Index of papers in Proc. ACL 2010 that mention
  • CRF
Trogkanis, Nikolaos and Elkan, Charles
Conditional random fields
Training a CRF means
Conditional random fields
The software we use as an implementation of conditional random fields is named CRF++ (Kudo, 2007).
Conditional random fields
We adopt the default parameter settings of CRF++, so no development set or tuning set is needed in our work.
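CRF++ consumes training data in a simple column format (one token per line, a blank line between sequences). A minimal sketch of how hyphenation data might be encoded for it, assuming a per-letter tagging scheme (1 = a hyphen may follow this letter); the encoding is an illustrative choice, not necessarily the paper's exact one:

```python
# Sketch: write hyphenation examples in CRF++'s column format
# (one token per line, blank line between sequences). The per-letter
# "1 = hyphen may follow" tagging is an assumption for illustration.

def write_crfpp_data(words, path):
    """words: list of hyphenated spellings, e.g. 'hy-phen-a-tion'."""
    with open(path, "w") as f:
        for word in words:
            letters = word.replace("-", "")
            breaks, pos = set(), 0   # letter indices a hyphen follows
            for ch in word:
                if ch == "-":
                    breaks.add(pos - 1)
                else:
                    pos += 1
            for i, ch in enumerate(letters):
                f.write(f"{ch}\t{'1' if i in breaks else '0'}\n")
            f.write("\n")  # blank line ends one training sequence

write_crfpp_data(["hy-phen-a-tion", "con-di-tion-al"], "train.data")
# Training then uses CRF++'s command-line tool with a feature
# template file, e.g.:  crf_learn template train.data model
```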
Experimental design
These are learning experiments, so we also use tenfold cross-validation in the same way as with CRF++.
Experimental results
Figure 1 shows how the error rate is affected by increasing the CRF probability threshold for each language.
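The thresholding idea can be sketched directly: insert a hyphen only when the model's confidence in the hyphen tag exceeds a threshold t, so raising t trades missed hyphens for fewer wrongly inserted ones. The `marginals` input and the "1" tag name below are assumptions for illustration (e.g. the per-position output of a toolkit such as sklearn-crfsuite's `predict_marginals_single`):

```python
# Sketch of threshold decoding for hyphenation. `marginals` is assumed
# to be one {tag: probability} dict per letter; "1" is the hyphen tag.

def hyphenate(letters, marginals, t=0.5):
    out = []
    for ch, probs in zip(letters, marginals):
        out.append(ch)
        if probs.get("1", 0.0) >= t:   # only hyphenate when confident
            out.append("-")
    return "".join(out).rstrip("-")    # never end on a hyphen
```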
Experimental results
For the English language, the CRF using the Viterbi path has an overall error rate of 0.84%, compared to 6.81% for the TEX algorithm using American English patterns, which is eight times worse.
Experimental results
However, the serious error rate for the CRF is worse: 0.41% compared to 0.24%.
CRF is mentioned in 27 sentences in this paper.
Topics mentioned in this paper:
Yang, Dong and Dixon, Paul and Furui, Sadaoki
Abstract
This paper presents a joint optimization method of a two-step conditional random field (CRF) model for machine transliteration and a fast decoding algorithm for the proposed method.
Abstract
In the two-step CRF model, the first CRF segments an input word into chunks and the second one converts each chunk into one unit in the target language.
Introduction
In the “NEWS 2009 Machine Transliteration Shared Task”, a new two-step CRF model for the transliteration task was proposed (Yang et al., 2009), in which the first step is to segment a word in the source language into character chunks and the second step is to perform a context-dependent mapping from each chunk into one written unit in the target language.
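A minimal sketch of this two-step pipeline, where `seg_crf` and `conv_crf` stand in for two separately trained taggers (hypothetical names; an sklearn-crfsuite-style `predict_single` interface is assumed):

```python
# Sketch of the two-step transliteration pipeline described above.
# seg_crf and conv_crf are placeholders for two trained CRF taggers.

def transliterate(word, seg_crf, conv_crf):
    # Step 1: tag each source letter B (chunk start) or I (continuation).
    tags = seg_crf.predict_single([{"ch": c} for c in word])
    chunks, cur = [], ""
    for c, t in zip(word, tags):
        if t == "B" and cur:
            chunks.append(cur)
            cur = ""
        cur += c
    chunks.append(cur)
    # Step 2: map each chunk to one target-language unit, with the
    # chunk itself as the (simplified) feature.
    units = conv_crf.predict_single([{"chunk": ck} for ck in chunks])
    return "".join(units)
```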
Introduction
In this paper, we propose to jointly optimize a two-step CRF model.
Introduction
The rest of this paper is organized as follows: Section 2 explains the two-step CRF method, followed by Section 3, which describes our joint optimization method and its fast decoding algorithm; Section 4 introduces a rapid implementation of a JSCM system in the weighted finite state transducer (WFST) framework; and the last section reports the experimental results and conclusions.
Two-step CRF method
2.1 CRF introduction
Two-step CRF method
CRF training is usually performed through the L-BFGS algorithm (Wallach, 2002) and decoding is performed by the Viterbi algorithm.
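For reference, a compact NumPy sketch of the Viterbi recursion over log potentials; `emit` and `trans` are assumed to be precomputed emission and transition scores, not any particular toolkit's internals:

```python
import numpy as np

# Viterbi decoding for a linear-chain CRF: given per-position label
# scores emit[t, y] and transition scores trans[y_prev, y] (both log
# potentials), recover the highest-scoring label sequence.

def viterbi(emit, trans):
    T, K = emit.shape
    score = emit[0].copy()              # best score ending in each label
    back = np.zeros((T, K), dtype=int)  # backpointers
    for t in range(1, T):
        cand = score[:, None] + trans + emit[t][None, :]
        back[t] = cand.argmax(axis=0)   # best previous label per label
        score = cand.max(axis=0)
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):       # follow backpointers
        path.append(int(back[t][path[-1]]))
    return path[::-1]
```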
Two-step CRF method
We formalize machine transliteration as a CRF tagging problem, as shown in Figure 2.
CRF is mentioned in 34 sentences in this paper.
Topics mentioned in this paper:
Finkel, Jenny Rose and Manning, Christopher D.
Base Models
Semi-CRFs segment and label the text simultaneously, whereas a linear-chain CRF will only label each word, and segmentation is implied by the labels assigned to the words.
Base Models
When doing named entity recognition, a semi-CRF will have one node for each entity, unlike a regular CRF, which will have one node for each word. See Figure 3(a) and 3(b) for an example of a semi-CRF and a linear-chain CRF over the same sentence.
Base Models
Note that the entity Hilary Clinton has one node in the semi-CRF representation, but two nodes in the linear-chain CRF.
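The correspondence between the two representations is mechanical; a small sketch (illustrative only) that recovers semi-CRF-style segments from linear-chain BIO labels:

```python
# Sketch: convert per-word BIO labels (linear-chain CRF output) into
# labeled segments (the units a semi-CRF works with directly).

def bio_to_segments(words, labels):
    """[('Hilary','B-PER'),('Clinton','I-PER')] -> [(('Hilary','Clinton'),'PER')]"""
    segments, cur, cur_type = [], [], None
    for w, l in zip(words, labels):
        # a new B- tag, an O, or a type mismatch closes the open segment
        if l.startswith("B-") or (l == "O" and cur) or \
           (l.startswith("I-") and cur and cur_type != l[2:]):
            if cur:
                segments.append((tuple(cur), cur_type))
            cur, cur_type = [], None
        if l != "O":
            cur.append(w)
            cur_type = l[2:]
    if cur:
        segments.append((tuple(cur), cur_type))
    return segments
```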
Experiments and Discussion
For each section of the data (ABC, MNB, NBC, PRI, VOA) we ran experiments training a linear-chain CRF on only the named entity information, a CRF-CFG parser on only the parse information, a joint parser and named entity recognizer, and our hierarchical model.
Hierarchical Joint Learning
Figure 3: A linear-chain CRF (a) labels each word, whereas a semi-CRF (b) labels entire entities.
CRF is mentioned in 10 sentences in this paper.
Topics mentioned in this paper:
Hoffmann, Raphael and Zhang, Congle and Weld, Daniel S.
Extraction with Lexicons
Domain-independence requires access to an extremely large number of lists, but our tight integration of lexicon acquisition and CRF learning requires that relevant lists be accessed instantaneously.
Extraction with Lexicons
While training a CRF extractor for a given relation, LUCHS uses its corpus of lists to automatically generate a set of semantic lexicons — specific to that relation.
Extraction with Lexicons
The semantic lexicons are added as features to the CRF learning algorithm.
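A sketch of what such lexicon-membership features might look like in per-token feature dictionaries; the lexicon names and entries below are invented for illustration:

```python
# Sketch: Boolean lexicon-membership features alongside ordinary
# lexical features. Lexicon names and contents are hypothetical.

LEXICONS = {
    "lex:city": {"paris", "seattle", "tokyo"},
    "lex:person": {"raphael", "congle", "daniel"},
}

def token_features(tokens, i):
    w = tokens[i].lower()
    feats = {"word": w, "is_title": tokens[i].istitle()}
    for name, entries in LEXICONS.items():
        feats[name] = w in entries   # Boolean lexicon feature
    return feats

X = [[token_features(s, i) for i in range(len(s))]
     for s in [["Daniel", "visited", "Seattle"]]]
# X can be fed to a CRF trainer, e.g. sklearn_crfsuite.CRF().fit(X, y)
```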
Introduction
These lexicons form Boolean features which, along with lexical and dependency parser-based features, are used to produce a CRF extractor for each relation — one which performs much better than lexicon-free extraction on sparse training data.
Learning Extractors
We use a linear-chain conditional random field (CRF), an undirected graphical model connecting a sequence of input and output random variables, x = (x0, …).
Learning Extractors
The CRF models are represented with a log-linear distribution
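The excerpt breaks off before the formula; the standard linear-chain log-linear form it refers to (as in Lafferty et al., 2001) is:

```latex
p(\mathbf{y} \mid \mathbf{x}) =
  \frac{1}{Z(\mathbf{x})}
  \exp\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y_{t-1}, y_t, \mathbf{x}, t) \Big),
\qquad
Z(\mathbf{x}) = \sum_{\mathbf{y}'} \exp\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y'_{t-1}, y'_t, \mathbf{x}, t) \Big)
```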
CRF is mentioned in 10 sentences in this paper.
Topics mentioned in this paper:
Wu, Fei and Weld, Daniel S.
Conclusion
WOE can run in two modes: a CRF extractor (WOEPOS) trained with shallow features like POS tags, and a pattern classifier (WOEparse) learned from dependency path patterns.
Experiments
To compare with TextRunner, we tested four different ways to generate training examples from Wikipedia for learning a CRF extractor.
Experiments
The CRF extractors are trained using the same learning algorithm and feature selection as TextRunner.
Related Work
Wu and Weld proposed the KYLIN system (Wu and Weld, 2007; Wu et al., 2008), which in the same spirit matches Wikipedia sentences with infoboxes to learn CRF extractors.
Wikipedia-based Open IE
[Figure: WOE architecture. A matcher performs primary entity matching and sentence matching; a learner produces a pattern classifier over parser features and a CRF extractor over shallow features.]
Wikipedia-based Open IE
In contrast, WOEPOS (like TextRunner) trains a conditional random field (CRF) to output certain text between noun phrases when the text denotes such a relation.
Wikipedia-based Open IE
Since high speed can be crucial when processing Web-scale corpora, we additionally learn a CRF extractor WOEPOS based on shallow features like POS tags.
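A sketch of this tagging setup: for a candidate noun-phrase pair, the tokens between them get shallow features, and a trained CRF then tags which of them express the relation. The feature names are illustrative, not WOE's actual feature set:

```python
# Sketch: shallow per-token features for the text between two noun
# phrases, of the kind a WOEPOS-style CRF extractor could consume.

def between_features(tokens, pos_tags):
    """Feature dicts for the tokens between two noun phrases."""
    return [
        {"word": w.lower(), "pos": p,
         "prev_pos": pos_tags[i - 1] if i > 0 else "BOS"}
        for i, (w, p) in enumerate(zip(tokens, pos_tags))
    ]

# A trained CRF (e.g. sklearn-crfsuite) would then tag each token as
# part of the relation phrase or not:
#   labels = crf.predict_single(between_features(tokens, pos_tags))
```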
CRF is mentioned in 9 sentences in this paper.
Topics mentioned in this paper:
Huang, Fei and Yates, Alexander
Introduction
To address the task of open-domain predicate identification, we construct a Conditional Random Field (CRF) (Lafferty et al., 2001) model with target labels of B-Pred, I-Pred, and O-Pred (for the beginning, interior, and outside of a predicate).
Introduction
We use an open source CRF software package to implement our CRF models. We use words, POS tags, chunk labels, and the predicate label at the preceding and following nodes as features for our Baseline system.
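The excerpt does not name the package; a sketch of the Baseline setup using sklearn-crfsuite (one open-source option) as a stand-in. The preceding/following-label dependency is captured here by the CRF's transition weights rather than an explicit feature:

```python
import sklearn_crfsuite

# Sketch of the Baseline feature set named above: word, POS tag, and
# chunk label per token, with B-Pred/I-Pred/O-Pred target labels.

def features(sent, i):
    w, pos, chunk = sent[i]
    return {"word": w.lower(), "pos": pos, "chunk": chunk}

train = [[("Mary", "NNP", "B-NP"), ("runs", "VBZ", "B-VP")]]
y = [["O-Pred", "B-Pred"]]
X = [[features(s, i) for i in range(len(s))] for s in train]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, y)
print(crf.predict(X))   # tags each token B-/I-/O-Pred
```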
Introduction
We refer to the CRF model with these features as our Baseline SRL system; in what follows we extend the Baseline model with more sophisticated features.
CRF is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Lin, Shih-Hsiang and Chen, Berlin
Background
To this end, several popular machine-learning methods could be utilized, like Bayesian classifier (BC) (Kupiec et al., 1999), Gaussian mixture model (GMM) (Fattah and Ren, 2009), hidden Markov model (HMM) (Conroy and O'Leary, 2001), support vector machine (SVM) (Kolcz et al., 2001), maximum entropy (ME) (Ferrier, 2001), and conditional random field (CRF) (Galley, 2006; Shen et al., 2007), to name a few.
Background
Although such supervised summarizers are effective, most of them (except CRF) usually implicitly assume that sentences are independent of each other (the so-called “bag-of-sentences” assumption) and classify each sentence individually without leveraging the relationship among the sentences (Shen et al., 2007).
Experimental results and discussions 6.1 Baseline experiments
In the final set of experiments, we compare our proposed summarization methods with a few existing summarization methods that have been widely used in various summarization tasks, including LEAD, VSM, LexRank, and CRF; the corresponding results are shown in Table 5.
Experimental results and discussions 6.1 Baseline experiments
To our surprise, CRF does not provide superior results as compared to the other summarization methods.
Experimental results and discussions 6.1 Baseline experiments
One possible explanation is that the structural evidence of the spoken documents in the test set is not strong enough for CRF to show its advantage of modeling the local structural information among sentences.
CRF is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Turian, Joseph and Ratinov, Lev-Arie and Bengio, Yoshua
Clustering-based word representations
However, the CRF chunker in Huang and Yates (2009), which uses their HMM word clusters as extra features, achieves F1 lower than a baseline CRF chunker (Sha & Pereira, 2003).
Supervised evaluation tasks
The linear CRF chunker of Sha and Pereira (2003) is a standard near-state-of-the-art baseline chunker.
Supervised evaluation tasks
In fact, many off-the-shelf CRF implementations now replicate Sha and Pereira (2003), including their choice of feature set:
Supervised evaluation tasks
• CRF++ by Taku Kudo (http://crfpp.
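For a flavor of the Sha and Pereira (2003) feature set that these implementations replicate, a simplified sketch of word/POS window features (the actual set is larger):

```python
# Sketch: word and POS-tag features in a 5-token window around the
# current position, plus a POS bigram, in the style of Sha & Pereira
# (2003) chunking features. Simplified for illustration.

def chunk_features(words, tags, i):
    feats = {}
    for off in (-2, -1, 0, 1, 2):
        j = i + off
        if 0 <= j < len(words):
            feats[f"w[{off}]"] = words[j].lower()
            feats[f"t[{off}]"] = tags[j]
    if i >= 1:
        feats["t[-1]|t[0]"] = tags[i - 1] + "|" + tags[i]
    return feats
```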
CRF is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Lavergne, Thomas and Cappé, Olivier and Yvon, François
Conditional Random Fields
Hence, any numerical optimization strategy may be used and practical solutions include limited memory BFGS (L-BFGS) (Liu and Nocedal, 1989), which is used in the popular CRF++ (Kudo, 2005) and CRFsuite (Okazaki, 2007) packages; conjugate gradient (Nocedal and Wright, 2006) and Stochastic Gradient Descent (SGD) (Bottou, 2004; Vishwanathan et al., 2006), used in CRFsgd (Bottou, 2007).
Conditional Random Fields
CRF++ and CRFsgd perform the forward-backward recursions in the log-domain, which guarantees that numerical over/underflows are avoided no matter the length T(i) of the sequence.
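A minimal sketch of the log-domain trick: keep the forward variable in log space and combine terms with logsumexp, so the values never leave a numerically safe range however long the sequence is:

```python
import numpy as np
from scipy.special import logsumexp

# Log-domain forward recursion for a linear-chain CRF: emit[t, y] and
# trans[y_prev, y] are log potentials; returns log Z(x) without ever
# exponentiating into a range that could over/underflow.

def log_forward(emit, trans):
    log_alpha = emit[0].copy()
    for t in range(1, emit.shape[0]):
        # log_alpha[y] = logsumexp_{y'}(log_alpha[y'] + trans[y', y]) + emit[t, y]
        log_alpha = logsumexp(log_alpha[:, None] + trans, axis=0) + emit[t]
    return logsumexp(log_alpha)
```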
Conditional Random Fields
The CRF package developed in the course of this study implements many algorithmic optimizations and allows the design of innovative training strategies, such as the one presented in Section 5.4.
CRF is mentioned in 3 sentences in this paper.
Topics mentioned in this paper: