Conditional random fields | We use CRF++ (Kudo, 2007) as our implementation of conditional random fields.
Conditional random fields | We adopt the default parameter settings of CRF++, so no development or tuning set is needed in our work.
Experimental design | These are learning experiments, so we also use tenfold cross-validation, in the same way as with CRF++.
Experimental results | Figure 1 shows how the error rate is affected by increasing the CRF probability threshold for each language. |
Experimental results | For English, the CRF using the Viterbi path has an overall error rate of 0.84%, compared to 6.81% for the TeX algorithm using American English patterns, roughly eight times higher.
Experimental results | However, the serious error rate for the CRF is worse: 0.41% compared to 0.24%.
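Experimental results | As a quick sanity check on the reported ratios (a minimal sketch; the percentages are the figures quoted above, not new measurements):

```python
# The error rates below are those quoted in the text (as percentages).
crf_overall = 0.84   # CRF (Viterbi path), overall error rate
tex_overall = 6.81   # TeX algorithm, American English patterns
crf_serious = 0.41   # CRF, serious error rate
tex_serious = 0.24   # TeX, serious error rate

print(f"overall: TeX is {tex_overall / crf_overall:.1f}x the CRF rate")  # ≈ 8.1
print(f"serious: CRF is {crf_serious / tex_serious:.1f}x the TeX rate")  # ≈ 1.7
```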
Abstract | This paper presents a joint optimization method of a two-step conditional random field (CRF) model for machine transliteration and a fast decoding algorithm for the proposed method.
Abstract | In the two-step CRF model, the first CRF segments an input word into chunks and the second one converts each chunk into one unit in the target language. |
Introduction | In the “NEWS 2009 Machine Transliteration Shared Task”, a new two-step CRF model for the transliteration task was proposed (Yang et al., 2009), in which the first step segments a word in the source language into character chunks and the second step performs a context-dependent mapping from each chunk to one written unit in the target language.
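Introduction | The two-step structure described above can be sketched as follows; the B/I chunk tags, the example word, and the conversion table are invented for illustration, and each step would in reality be the output of a trained CRF:

```python
# Illustrative sketch of the two-step pipeline (not the authors' code).
# Step 1: a segmentation CRF tags source characters with B/I chunk labels;
# step 2: a conversion CRF maps each chunk to one target-language unit.
# Here both steps are replaced by fixed toy outputs and a toy lookup table.

def chunks_from_tags(chars, tags):
    """Group characters into chunks using B/I tags (B starts a new chunk)."""
    chunks = []
    for ch, tag in zip(chars, tags):
        if tag == "B" or not chunks:
            chunks.append(ch)
        else:  # "I": extend the current chunk
            chunks[-1] += ch
    return chunks

# Pretend step-1 output for the source word "kyoto"
chars = list("kyoto")
tags = ["B", "I", "I", "B", "I"]
chunks = chunks_from_tags(chars, tags)  # ['kyo', 'to']

# Toy step-2 conversion table (hypothetical; a real system uses the second CRF)
convert = {"kyo": "キョ", "to": "ト"}
target = "".join(convert[c] for c in chunks)
print(chunks, target)  # ['kyo', 'to'] キョト
```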
Introduction | In this paper, we propose to jointly optimize a two-step CRF model. |
Introduction | The rest of this paper is organized as follows: Section 2 explains the two-step CRF method, followed by Section 3, which describes our joint optimization method and its fast decoding algorithm; Section 4 introduces a rapid implementation of a JSCM system in the weighted finite-state transducer (WFST) framework; and the last section reports the experimental results and conclusions.
Two-step CRF method | 2.1 CRF introduction |
Two-step CRF method | CRF training is usually performed through the L-BFGS algorithm (Wallach, 2002), and decoding is performed by the Viterbi algorithm.
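Two-step CRF method | Viterbi decoding, mentioned above, can be sketched as follows in log space; the toy states and scores are illustrative, not tied to any of the cited systems:

```python
import math

def viterbi(obs, states, log_start, log_trans, log_emit):
    """Viterbi decoding for a first-order sequence model, in log space."""
    V = [{s: log_start[s] + log_emit[s][obs[0]] for s in states}]
    back = []
    for o in obs[1:]:
        scores, ptrs = {}, {}
        for t in states:
            # Best previous state for each current state t
            prev = max(states, key=lambda s: V[-1][s] + log_trans[s][t])
            scores[t] = V[-1][prev] + log_trans[prev][t] + log_emit[t][o]
            ptrs[t] = prev
        V.append(scores)
        back.append(ptrs)
    # Backtrack from the best final state
    path = [max(states, key=lambda s: V[-1][s])]
    for ptrs in reversed(back):
        path.append(ptrs[path[-1]])
    return list(reversed(path))

# Toy two-tag model (scores are illustrative, not from any cited system)
states = ["B", "I"]
log_start = {"B": math.log(0.9), "I": math.log(0.1)}
log_trans = {"B": {"B": math.log(0.4), "I": math.log(0.6)},
             "I": {"B": math.log(0.5), "I": math.log(0.5)}}
log_emit = {"B": {"x": math.log(0.7), "y": math.log(0.3)},
            "I": {"x": math.log(0.2), "y": math.log(0.8)}}
print(viterbi(["x", "y", "y"], states, log_start, log_trans, log_emit))
# ['B', 'I', 'I']
```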
Two-step CRF method | We formalize machine transliteration as a CRF tagging problem, as shown in Figure 2. |
Base Models | Semi-CRFs segment and label the text simultaneously, whereas a linear-chain CRF will only label each word, and segmentation is implied by the labels assigned to the words. |
Base Models | When doing named entity recognition, a semi-CRF will have one node for each entity, unlike a regular CRF, which will have one node for each word. See Figure 3(a,b) for an example of a semi-CRF and a linear-chain CRF over the same sentence.
Base Models | Note that the entity Hilary Clinton has one node in the semi-CRF representation, but two nodes in the linear-chain CRF.
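Base Models | The relationship between the two representations can be illustrated by collapsing a linear-chain CRF's per-word BIO labels into the per-entity nodes a semi-CRF uses directly (a minimal sketch; the tags and tokens are invented):

```python
def bio_to_entities(tokens, tags):
    """Collapse BIO tags into (entity_text, type) spans: the nodes a
    semi-CRF would represent directly, one per entity."""
    entities, current, ctype = [], [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                entities.append((" ".join(current), ctype))
            current, ctype = [tok], tag[2:]
        elif tag.startswith("I-") and current:
            current.append(tok)
        else:  # "O" closes any open entity
            if current:
                entities.append((" ".join(current), ctype))
            current, ctype = [], None
    if current:
        entities.append((" ".join(current), ctype))
    return entities

tokens = ["Hilary", "Clinton", "visited", "Paris"]
tags = ["B-PER", "I-PER", "O", "B-LOC"]   # linear-chain CRF: one tag per word
print(bio_to_entities(tokens, tags))
# [('Hilary Clinton', 'PER'), ('Paris', 'LOC')]  -- semi-CRF: one node per entity
```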
Experiments and Discussion | For each section of the data (ABC, MNB, NBC, PRI, VOA) we ran experiments training a linear-chain CRF on only the named entity information, a CRF-CFG parser on only the parse information, a joint parser and named entity recognizer, and our hierarchical model. |
Hierarchical Joint Learning | Figure 3: A linear-chain CRF (a) labels each word, whereas a semi-CRF (b) labels entire entities. |
Extraction with Lexicons | Domain-independence requires access to an extremely large number of lists, but our tight integration of lexicon acquisition and CRF learning requires that relevant lists be accessed instantaneously. |
Extraction with Lexicons | While training a CRF extractor for a given relation, LUCHS uses its corpus of lists to automatically generate a set of semantic lexicons — specific to that relation. |
Extraction with Lexicons | The semantic lexicons are added as features to the CRF learning algorithm. |
Introduction | These lexicons form Boolean features which, along with lexical and dependency parser-based features, are used to produce a CRF extractor for each relation — one which performs much better than lexicon-free extraction on sparse training data. |
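Introduction | A minimal sketch of how lexicon membership becomes a Boolean token feature (the lexicon names and contents here are hypothetical, not LUCHS's actual lists):

```python
# Hypothetical semantic lexicons; LUCHS derives these automatically per relation.
lexicons = {
    "in_city_list": {"paris", "london", "seattle"},
    "in_person_list": {"hilary", "clinton"},
}

def lexicon_features(token):
    """One Boolean feature per lexicon: does the lowercased token appear in it?
    These would be added alongside lexical and parser-based features."""
    t = token.lower()
    return {name: (t in entries) for name, entries in lexicons.items()}

print(lexicon_features("Paris"))
# {'in_city_list': True, 'in_person_list': False}
```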
Learning Extractors | We use a linear-chain conditional random field (CRF), an undirected graphical model connecting a sequence of input and output random variables.
Learning Extractors | The CRF models are represented with a log-linear distribution |
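Learning Extractors | The equation this sentence introduces is missing from the excerpt; the conventional log-linear form of a linear-chain CRF, which it presumably states, is:

```latex
p(y \mid x) = \frac{1}{Z(x)} \exp\!\left( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y_{t-1}, y_t, x, t) \right),
\qquad
Z(x) = \sum_{y'} \exp\!\left( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y'_{t-1}, y'_t, x, t) \right)
```

where the $f_k$ are feature functions, the $\lambda_k$ their learned weights, and $Z(x)$ the partition function summing over all label sequences.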
Conclusion | WOE can run in two modes: a CRF extractor (WOEPOS) trained with shallow features like POS tags, and a pattern classifier (WOEparse) learned from dependency-path patterns.
Experiments | To compare with TextRunner, we tested four different ways to generate training examples from Wikipedia for learning a CRF extractor. |
Experiments | The CRF extractors are trained using the same learning algorithm and feature selection as TextRunner. |
Related Work | Wu and Weld proposed the KYLIN system (Wu and Weld, 2007; Wu et al., 2008) which has the same spirit of matching Wikipedia sentences with infoboxes to learn CRF extractors. |
Wikipedia-based Open IE | [Figure: WOE architecture. A primary-entity matcher and sentence matcher feed a learner, which produces a pattern classifier over parser features and a CRF extractor over shallow features.]
Wikipedia-based Open IE | In contrast, WOEPOS (like TextRunner) trains a conditional random field (CRF) to output certain text between noun phrases when the text denotes such a relation.
Wikipedia-based Open IE | Since high speed can be crucial when processing Web-scale corpora, we additionally learn a CRF extractor WOEPOS based on shallow features like POS-tags. |
Introduction | To address the task of open-domain predicate identification, we construct a Conditional Random Field (CRF) (Lafferty et al., 2001) model with target labels of B-Pred, I-Pred, and O-Pred (for the beginning, interior, and outside of a predicate).
Introduction | We use an open-source CRF software package to implement our CRF models. We use words, POS tags, chunk labels, and the predicate label at the preceding and following nodes as features for our Baseline system.
Introduction | We refer to the CRF model with these features as our Baseline SRL system; in what follows we extend the Baseline model with more sophisticated features. |
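Introduction | A hedged sketch of such a per-token feature function (the feature names and window layout are illustrative, not the authors' exact template):

```python
# Sketch of a Baseline-style feature set: word, POS tag, and chunk label for
# the current token, plus neighboring observations. The predicate labels of
# neighboring nodes would be captured by the CRF's transition features at
# decoding time rather than precomputed here.
def baseline_features(i, words, pos, chunks):
    feats = {
        "word": words[i],
        "pos": pos[i],
        "chunk": chunks[i],
    }
    if i > 0:
        feats["prev_word"] = words[i - 1]
    if i < len(words) - 1:
        feats["next_word"] = words[i + 1]
    return feats

words = ["He", "gave", "up"]
pos = ["PRP", "VBD", "RP"]
chunks = ["B-NP", "B-VP", "B-PRT"]
print(baseline_features(1, words, pos, chunks))
```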
Background | To this end, several popular machine-learning methods could be utilized, like the Bayesian classifier (BC) (Kupiec et al., 1999), Gaussian mixture model (GMM) (Fattah and Ren, 2009), hidden Markov model (HMM) (Conroy and O'Leary, 2001), support vector machine (SVM) (Kolcz et al., 2001), maximum entropy (ME) (Ferrier, 2001), and conditional random field (CRF) (Galley, 2006; Shen et al., 2007), to name a few.
Background | Although such supervised summarizers are effective, most of them (except the CRF) implicitly assume that sentences are independent of each other (the so-called “bag-of-sentences” assumption) and classify each sentence individually, without leveraging the relationships among sentences (Shen et al., 2007).
Experimental results and discussions 6.1 Baseline experiments | In the final set of experiments, we compare our proposed summarization methods with a few existing summarization methods that have been widely used in various summarization tasks, including LEAD, VSM, LexRank, and CRF; the corresponding results are shown in Table 5.
Experimental results and discussions 6.1 Baseline experiments | To our surprise, CRF does not provide superior results as compared to the other summarization methods. |
Experimental results and discussions 6.1 Baseline experiments | One possible explanation is that the structural evidence of the spoken documents in the test set is not strong enough for CRF to show its advantage of modeling the local structural information among sentences. |
Clustering-based word representations | However, the CRF chunker in Huang and Yates (2009), which uses their HMM word clusters as extra features, achieves F1 lower than a baseline CRF chunker (Sha & Pereira, 2003).
Supervised evaluation tasks | The linear CRF chunker of Sha and Pereira (2003) is a standard, near-state-of-the-art baseline chunker.
Supervised evaluation tasks | In fact, many off-the-shelf CRF implementations now replicate Sha and Pereira (2003), including their choice of feature set: |
Supervised evaluation tasks | - CRF++ by Taku Kudo (http://crfpp.
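Supervised evaluation tasks | As an illustration of how such a feature set is expressed in CRF++, a feature template in CRF++'s own format might look like the following; the window sizes and column layout are illustrative, not Sha and Pereira's exact configuration (column 0 holds the word, column 1 the POS tag):

```
# Unigram features: words in a +/-2 window, current POS, and a POS bigram
U00:%x[-2,0]
U01:%x[-1,0]
U02:%x[0,0]
U03:%x[1,0]
U04:%x[2,0]
U05:%x[0,1]
U06:%x[-1,1]/%x[0,1]

# Bigram template: adds features over pairs of adjacent output tags
B
```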
Conditional Random Fields | Hence, any numerical optimization strategy may be used and practical solutions include limited memory BFGS (L-BFGS) (Liu and Nocedal, 1989), which is used in the popular CRF++ (Kudo, 2005) and CRFsuite (Okazaki, 2007) packages; conjugate gradient (Nocedal and Wright, 2006) and Stochastic Gradient Descent (SGD) (Bottou, 2004; Vishwanathan et al., 2006), used in CRFsgd (Bottou, 2007). |
Conditional Random Fields | CRF++ and CRFsgd perform the forward-backward recursions in the log domain, which guarantees that numerical over- and underflows are avoided regardless of the sequence length T^(i).
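Conditional Random Fields | The log-domain trick can be sketched as follows: all quantities stay as log scores, and sums of probabilities are computed with a numerically stable log-sum-exp, so neither overflow nor underflow occurs however long the sequence (a generic illustration, not CRF++'s actual code):

```python
import math

def logsumexp(vals):
    """Numerically stable log(sum(exp(v) for v in vals))."""
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))

def log_forward(n_steps, states, log_start, log_trans, log_emit_seq):
    """Forward recursion kept entirely in the log domain; returns log Z.
    log_emit_seq[t][s] is the log emission/potential score at step t."""
    alpha = {s: log_start[s] + log_emit_seq[0][s] for s in states}
    for t in range(1, n_steps):
        alpha = {
            s2: logsumexp([alpha[s1] + log_trans[s1][s2] for s1 in states])
            + log_emit_seq[t][s2]
            for s2 in states
        }
    return logsumexp(list(alpha.values()))

# Toy uniform model: all mass sums to 1, so log Z should be ~0
ls = {"A": math.log(0.5), "B": math.log(0.5)}
lt = {s: {"A": math.log(0.5), "B": math.log(0.5)} for s in ("A", "B")}
le = [{"A": 0.0, "B": 0.0}] * 3
print(log_forward(3, ["A", "B"], ls, lt, le))  # ~0.0
```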
Conditional Random Fields | The CRF package developed in the course of this study implements many algorithmic optimizations and allows the design of innovative training strategies, such as the one presented in Section 5.4.