Index of papers in Proc. ACL that mention

CRF

Seen in text as:

CRF (655)
CRF++ (12)
+CRF (4)

Seen in 614 sentences in 58 papers.

1. Exploiting Feature Hierarchy for Transfer Learning in Named Entity Recognition

Arnold, Andrew and Nallapati, Ramesh and Cohen, William W.

In Proc. ACL 2008, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Introduction	§2 introduces the maximum entropy (maxent) and conditional random field ( CRF ) learning techniques employed, along with specifications for the design and training of our hierarchical prior.
Investigation	Specifically, we compared our approximate hierarchical prior model (HIER), implemented as a CRF, against three baselines: GAUSS: CRF model tuned on a single domain’s data, using a standard N(0, 1) prior; CAT: CRF model tuned on a concatenation of multiple domains’ data, using a N(0, 1) prior; CHELBA: CRF model tuned on one domain’s data, using a prior trained on a different, related domain’s data (cf.
Investigation	Line a shows the F1 performance of a CRF model tuned only on the target MUC6 domain (GAUSS) across a range of tuning data sizes.
Investigation	Line b shows the same experiment, but this time the CRF model has been tuned on a dataset comprised of a simple concatenation of the training MUC6 data from (a), along with a different training set from MUC7 (CAT).
Models considered 2.1 Basic Conditional Random Fields	The parametric form of the CRF for a sentence of length n is given as follows:
Models considered 2.1 Basic Conditional Random Fields	CRF learns a model consisting of a set of weights Λ = {λ_1 ... λ_F} over the features so as to maximize the conditional likelihood of the training data, p(Y_train | X_train), given the model p_Λ.
Models considered 2.1 Basic Conditional Random Fields	2.2 CRF with Gaussian priors

CRF is mentioned in 11 sentences in this paper.

Topics mentioned in this paper:
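
For reference, the quantities named in the sentences above (the parametric form, the weights Λ, and the N(0, 1) prior) are commonly written as follows. This is a sketch in standard textbook notation, not the paper's own equations: the feature functions f_j, the index conventions, and the variance symbol σ² are assumptions.

```latex
% Linear-chain CRF for a sentence of length n (assumed standard form)
p_\Lambda(Y \mid X) = \frac{1}{Z(X)}
  \exp\Big( \sum_{i=1}^{n} \sum_{j=1}^{F} \lambda_j \, f_j(y_{i-1}, y_i, X, i) \Big),
\qquad
Z(X) = \sum_{Y'} \exp\Big( \sum_{i=1}^{n} \sum_{j=1}^{F} \lambda_j \, f_j(y'_{i-1}, y'_i, X, i) \Big)

% Training with a zero-mean Gaussian prior on the weights;
% \sigma^2 = 1 corresponds to the N(0, 1) prior mentioned above
\hat{\Lambda} = \arg\max_{\Lambda} \;
  \log p_\Lambda(Y_{\mathrm{train}} \mid X_{\mathrm{train}})
  - \sum_{j=1}^{F} \frac{\lambda_j^2}{2\sigma^2}
```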

2. Combining Morpheme-based Machine Translation with Post-processing Morpheme Prediction

Clifton, Ann and Sarkar, Anoop

In Proc. ACL 2011, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Models 2.1 Baseline Models	A conditional random field ( CRF ) (Lafferty et al., 2001) defines the conditional probability as a linear score for each candidate y and a global normalization term:
Models 2.1 Baseline Models	However, the output y* from the CRF decoder is still only a sequence of abstract suffix tags.
Models 2.1 Baseline Models	The abstract suffix tags are extracted from the unsupervised morpheme learning process, and are carefully designed to enable CRF training and decoding.

CRF is mentioned in 12 sentences in this paper.

Topics mentioned in this paper:

translation model (21)
BLEU (17)
CRF (12)
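
The "linear score for each candidate y and a global normalization term" can be illustrated with a brute-force sketch. Everything here is hypothetical (the feature names, the toy tag set, the weights) and is not taken from the paper; enumerating every candidate is only feasible for tiny inputs, and real implementations compute the normalizer with dynamic programming.

```python
import itertools
import math


def score(weights, features, x, y):
    """Linear score w . f(x, y) for one candidate label sequence y."""
    return sum(weights.get(name, 0.0) * value
               for name, value in features(x, y).items())


def crf_probability(weights, features, x, y, labels):
    """p(y | x) = exp(score(x, y)) / Z(x), with Z(x) summed over all candidates."""
    z = sum(math.exp(score(weights, features, x, cand))
            for cand in itertools.product(labels, repeat=len(x)))
    return math.exp(score(weights, features, x, tuple(y))) / z


def toy_features(x, y):
    """Hypothetical features: (token, tag) emissions and (tag, tag) transitions."""
    feats = {}
    for i, (tok, tag) in enumerate(zip(x, y)):
        emit = ("emit", tok, tag)
        feats[emit] = feats.get(emit, 0.0) + 1.0
        if i > 0:
            trans = ("trans", y[i - 1], tag)
            feats[trans] = feats.get(trans, 0.0) + 1.0
    return feats
```

Because the normalizer sums over every candidate sequence, the probabilities of all candidates for a given input add up to one.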

3. Recognizing Named Entities in Tweets

LIU, Xiaohua and ZHANG, Shaodian and WEI, Furu and ZHOU, Ming

In Proc. ACL 2011, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	We propose to combine a K-Nearest Neighbors (KNN) classifier with a linear Conditional Random Fields ( CRF ) model under a semi-supervised learning framework to tackle these challenges.
Abstract	The KNN based classifier conducts pre-labeling to collect global coarse evidence across tweets while the CRF model conducts sequential labeling to capture fine-grained information encoded in a tweet.
Introduction	Following the two-stage prediction aggregation methods (Krishnan and Manning, 2006), such pre-labeled results, together with other conventional features used by the state-of-the-art NER systems, are fed into a linear Conditional Random Fields ( CRF ) (Lafferty et al., 2001) model, which conducts fine-grained tweet level NER.
Introduction	Furthermore, the KNN and CRF model are repeatedly retrained with an incrementally augmented training set, into which high confidently labeled tweets are added.
Introduction	Indeed, it is the combination of KNN and CRF under a semi-supervised learning framework that differentiates ours from the existing.
Related Work	(2010) use Amazon's Mechanical Turk service and CrowdFlower to annotate named entities in tweets and train a CRF model to evaluate the effectiveness of human labeling.
Related Work	To achieve this, a KNN classifier with a CRF model is combined to leverage cross tweets information, and the semi-supervised learning is adopted to leverage unlabeled tweets.
Related Work	(2005) use CRF to train a sequential NE labeler, in which the BIO (meaning Beginning, the Inside and the Outside of

CRF is mentioned in 35 sentences in this paper.

Topics mentioned in this paper:

CRF (35)
NER (32)
semi-supervised (24)

4. Event Discovery in Social Media Feeds

Benson, Edward and Haghighi, Aria and Barzilay, Regina

In Proc. ACL 2011, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Evaluation	[Figure legend: +LowThresh, +CRF, +List, Our Work, Our Work+Con]
Evaluation	The CRF and hard-constrained consensus lines terminate because of low record yield.
Evaluation Setup	[Figure legend: +LowThresh, +CRF, +List, OurWork]
Evaluation Setup	The CRF lines terminate because of low record yield.
Evaluation Setup	Our List Baseline labels messages by finding string overlaps against a list of musical artists and venues scraped from web data (the same lists used as features in our CRF component).
Inference	Since a uniform initialization of all factors is a saddle-point of the objective, we opt to initialize the q(y) factors with the marginals obtained using just the CRF parameters, accomplished by running forwards-backwards on all messages using only the
Inference	To do so, we run the CRF component of our model (φ_SEQ) over the corpus and extract, for each ℓ, all spans that have a token-level probability of being labeled ℓ greater than λ = 0.1.
Introduction	We bias local decisions made by the CRF to be consistent with canonical record values, thereby facilitating consistency within an event cluster.
Model	The sequence labeling factor is similar to a standard sequence CRF (Lafferty et al., 2001), where the potential over a message label sequence decomposes
Model	The weights of the CRF component of our model, θ_SEQ, are the only weights learned at training time, using a distant supervision process described in Section 6.

CRF is mentioned in 11 sentences in this paper.

Topics mentioned in this paper:
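
The token-level probabilities obtained by "running forwards-backwards" can be sketched for a generic linear-chain model as below. This is a minimal illustration, not the paper's model: the score tables `emissions` and `transitions` are hypothetical inputs holding log-potentials, and the code works in log space for numerical stability.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def token_marginals(emissions, transitions):
    """Return marginals[t][k] = p(y_t = k | x) for a linear-chain model.

    emissions[t][k]: log-potential of tag k at position t
    transitions[j][k]: log-potential of moving from tag j to tag k
    """
    n, n_tags = len(emissions), len(emissions[0])
    # Forward pass: alpha[t][k] sums over all prefixes ending in tag k at t.
    alpha = [list(emissions[0])]
    for t in range(1, n):
        alpha.append([
            emissions[t][k] + logsumexp(
                [alpha[t - 1][j] + transitions[j][k] for j in range(n_tags)])
            for k in range(n_tags)])
    # Backward pass: beta[t][j] sums over all suffixes following tag j at t.
    beta = [[0.0] * n_tags for _ in range(n)]
    for t in range(n - 2, -1, -1):
        beta[t] = [
            logsumexp([transitions[j][k] + emissions[t + 1][k] + beta[t + 1][k]
                       for k in range(n_tags)])
            for j in range(n_tags)]
    log_z = logsumexp(alpha[-1])
    return [[math.exp(alpha[t][k] + beta[t][k] - log_z) for k in range(n_tags)]
            for t in range(n)]
```

A span extractor of the kind described above could then keep the tokens whose marginal for a given label exceeds a threshold such as 0.1.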

5. Community Answer Summarization for Multi-Sentence Question with Group L1 Regularization

Chan, Wen and Zhou, Xiangdong and Wang, Wei and Chua, Tat-Seng

In Proc. ACL 2012, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	In order to automatically generate a novel and non-redundant community answer summary, we segment the complex original multi-sentence question into several sub questions and then propose a general Conditional Random Field ( CRF ) based answer summary method with group L1 regularization.
Introduction	We tackle the answer summary task as a sequential labeling process under the general Conditional Random Fields ( CRF ) framework: every answer sentence in the question thread is labeled as a summary sentence or non-summary sentence, and we concatenate the sentences with summary label to form the final summarized answer.
Introduction	First, we present a general CRF based framework
Introduction	Second, we propose a group L1-regularization approach in the CRF model for automatic optimal feature learning to unleash the potential of the features and enhance the performance of answer summarization.
The Summarization Framework	Then under CRF (Lafferty et al., 2001), the conditional probability of y given X obeys the following distribution: p(y|x) = [equation garbled in extraction]
The Summarization Framework	Therefore, to explore the optimal combination of these features, we propose a group L1 regularization term in the general CRF model (Section 3.3) for feature learning.
The Summarization Framework	These sentence-level features can be easily utilized in the CRF framework.

CRF is mentioned in 21 sentences in this paper.

Topics mentioned in this paper:

CRF (21)
sentence-level (6)
SVM (6)

6. Jointly Optimizing a Two-Step Conditional Random Field Model for Machine Transliteration and Its Fast Decoding Algorithm

Yang, Dong and Dixon, Paul and Furui, Sadaoki

In Proc. ACL 2010, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	This paper presents a joint optimization method of a two-step conditional random field ( CRF ) model for machine transliteration and a fast decoding algorithm for the proposed method.
Abstract	In the two-step CRF model, the first CRF segments an input word into chunks and the second one converts each chunk into one unit in the target language.
Introduction	In the “NEWS 2009 Machine Transliteration Shared Task”, a new two-step CRF model for transliteration task has been proposed (Yang et al., 2009), in which the first step is to segment a word in the source language into character chunks and the second step is to perform a context-dependent mapping from each chunk into one written unit in the target language.
Introduction	In this paper, we propose to jointly optimize a two-step CRF model.
Introduction	The rest of this paper is organized as follows: Section 2 explains the two-step CRF method, followed by Section 3 which describes our joint optimization method and its fast decoding algorithm; Section 4 introduces a rapid implementation of a JSCM system in the weighted finite state transducer (WFST) framework; and the last section reports the experimental results and conclusions.
Two-step CRF method	2.1 CRF introduction
Two-step CRF method	CRF training is usually performed through the L-BFGS algorithm (Wallach, 2002) and decoding is performed by the Viterbi algorithm.
Two-step CRF method	We formalize machine transliteration as a CRF tagging problem, as shown in Figure 2.

CRF is mentioned in 34 sentences in this paper.

Topics mentioned in this paper:
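
The Viterbi decoding step mentioned above can be sketched as follows. This is a generic illustration over hypothetical score tables, not the paper's two-step or joint decoder, and it omits training (the L-BFGS part) entirely.

```python
def viterbi(emissions, transitions):
    """Return the highest-scoring sequence of tag indices.

    emissions[t][k]: score of tag k at position t
    transitions[j][k]: score of moving from tag j to tag k
    """
    n_tags = len(emissions[0])
    best = list(emissions[0])  # best[k]: best score of a path ending in tag k
    backptrs = []              # backptrs[t - 1][k]: best predecessor of k at t
    for t in range(1, len(emissions)):
        new_best, ptrs = [], []
        for k in range(n_tags):
            prev = max(range(n_tags), key=lambda j: best[j] + transitions[j][k])
            ptrs.append(prev)
            new_best.append(best[prev] + transitions[prev][k] + emissions[t][k])
        best = new_best
        backptrs.append(ptrs)
    # Follow the back-pointers from the best final tag.
    path = [max(range(n_tags), key=lambda k: best[k])]
    for ptrs in reversed(backptrs):
        path.append(ptrs[path[-1]])
    path.reverse()
    return path
```

The running time is linear in the sequence length and quadratic in the number of tags.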

7. Sentence Dependency Tagging in Online Question Answering Forums

Qu, Zhonghua and Liu, Yang

In Proc. ACL 2012, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	We use linear-chain conditional random fields (CRF) for sentence type tagging, and a 2D CRF to label the dependency relation between sentences.
Introduction	We use linear-chain conditional random fields ( CRF ) to take advantage of many long-distance and nonlocal features.
Introduction	First each sentence is considered as a source, and we run a linear-chain CRF to label whether each of the other sentences is its target.
Introduction	Because multiple runs of separate linear-chain CRFs ignore the dependency between source sentences, the second approach we propose is to use a 2D CRF that models all pair relationships jointly.
Related Work	In (Ding et al., 2008), a two-pass approach was used to find relevant solutions for a given question, and a skip-chain CRF was adopted to model long range de-
Thread Structure Tagging	Linear-chain CRF is a special case of the general CRFs.
Thread Structure Tagging	In linear-chain CRF , cliques only involve two adjacent variables in the sequence.
Thread Structure Tagging	Figure 3 shows the graphical structure of a linear-chain CRF .

CRF is mentioned in 19 sentences in this paper.

Topics mentioned in this paper:

8. Conditional Random Fields for Word Hyphenation

Trogkanis, Nikolaos and Elkan, Charles

In Proc. ACL 2010, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Conditional random fields	Training a CRF means
Conditional random fields	The software we use as an implementation of conditional random fields is named CRF++ (Kudo, 2007).
Conditional random fields	We adopt the default parameter settings of CRF++ , so no development set or tuning set is needed in our work.
Experimental design	These are learning experiments so we also use tenfold cross validation in the same way as with CRF++ .
Experimental results	Figure 1 shows how the error rate is affected by increasing the CRF probability threshold for each language.
Experimental results	For the English language, the CRF using the Viterbi path has overall error rate of 0.84%, compared to 6.81% for the TEX algorithm using American English patterns, which is eight times worse.
Experimental results	However, the serious error rate for the CRF is less good: 0.41% compared to 0.24%.

CRF is mentioned in 27 sentences in this paper.

Topics mentioned in this paper:

error rate (28)
CRF (27)
word-level (5)

9. Semi-Supervised Semantic Tagging of Conversational Understanding using Markov Topic Regression

Celikyilmaz, Asli and Hakkani-Tur, Dilek and Tur, Gokhan and Sarikaya, Ruhi

In Proc. ACL 2013, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Introduction	It initially starts with training supervised Conditional Random Fields ( CRF ) (Lafferty et al., 2001) on the source training data which has been semantically tagged.
Introduction	Our SSL uses MTR to smooth the semantic tag posteriors on the unlabeled target data (decoded using the CRF model) and later obtains the best tag sequences.
Introduction	While retrospective learning iteratively trains CRF models with the automatically annotated target data (explained above), it keeps track of the errors of the previous iterations so as to carry the properties of both the source and target domains.
Markov Topic Regression - MTR	We use the word-tag posterior probabilities obtained from a CRF sequence model trained on labeled utterances as features.
Related Work and Motivation	We present a retrospective SSL for CRF , in that, the iterative learner keeps track of the errors of the previous iterations so as to carry the properties of both the source and target domains.
Semi-Supervised Semantic Labeling	4.1 Semi Supervised Learning (SSL) with CRF
Semi-Supervised Semantic Labeling	They decode unlabeled queries from target domain (t) using a CRF model trained on the POS-labeled newswire data (source domain (0)).
Semi-Supervised Semantic Labeling	Since CRF tagger only uses local features of the input to score tag pairs, they try to capture all the context with the graph with additional context features on types.

CRF is mentioned in 32 sentences in this paper.

Topics mentioned in this paper:

10. Conditional Random Fields for Responsive Surface Realisation using Global Features

Dethlefs, Nina and Hastie, Helen and Cuayáhuitl, Heriberto and Lemon, Oliver

In Proc. ACL 2013, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Cohesion across Utterances	This paper focuses on surface realisation from these trees using a CRF as shown in the surface realisation module.
Cohesion across Utterances	As shown in the architecture diagram in Figure 1, a CRF surface realiser takes a semantic tree as input.
Cohesion across Utterances	Figure 2: (a) Graphical representation of a linear-chain Conditional Random Field (CRF), where empty nodes correspond to the labelled sequence, shaded nodes to linguistic observations, and dark squares to feature functions between states and observations; (b) Example semantic trees that are updated at each time step in order to provide linguistic features to the CRF (only one possible surface realisation is shown and parse categories are omitted for brevity); (c) Finite state machine of phrases (labels) for this example.
Introduction	Our main hypothesis is that the use of global context in a CRF with semantic trees can lead to surface realisations that are better phrased, more natural and less repetitive than taking only local features into account.

CRF is mentioned in 27 sentences in this paper.

Topics mentioned in this paper:

11. Joint Word Alignment and Bilingual Named Entity Recognition Using Dual Decomposition

Wang, Mengqiu and Che, Wanxiang and Manning, Christopher D.

In Proc. ACL 2013, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Bilingual NER by Agreement	The English-side CRF model assigns the following probability for a tag sequence y^e:
Bilingual NER by Agreement	where V^e is the set of vertices in the CRF and D^e is the set of edges.
Bilingual NER by Agreement	ψ(v_i) and ω(v_i, v_j) are the node and edge clique potentials, and Z^e(e) is the partition function for input sequence e under the English CRF model.
Experimental Setup	We train the two CRF models on all portions of the OntoNotes corpus that are annotated with named entity tags, except the parallel-aligned portion which we reserve for development and test purposes.
Experimental Setup	The exact feature set and the CRF implementation

CRF is mentioned in 15 sentences in this paper.

Topics mentioned in this paper:

NER (41)
word alignment (20)
CRF (15)

12. Mining Informal Language from Chinese Microtext: Joint Word Recognition and Segmentation

Wang, Aobo and Kan, Min-Yen

In Proc. ACL 2013, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Experiment	To benchmark the improvement that the factorial CRF model has by doing the two tasks jointly, we compare with a LCRF solution that chains these two tasks together.
Experiment	We note that the CRF based models achieve much higher precision score than baseline systems, which means that the CRF based models can make accurate predictions without enlarging the scope of prospective informal words.
Experiment	Compared with the CRF based models, the SVM and DT both over-predict informal words, incurring a larger precision penalty.
Introduction	Our techniques significantly outperform both research and commercial state-of-the-art for these problems, including two-step linear CRF baselines which perform the two tasks sequentially.
Methodology	2.2.1 Linear-Chain CRF
Methodology	A linear-chain CRF (LCRF; Figure 2a) predicts the output label based on feature functions provided by the scientist on the input.
Methodology	2.2.2 Factorial CRF
Related Work	This is a weakness as their linear CRF model requires retraining.

CRF is mentioned in 11 sentences in this paper.

Topics mentioned in this paper:

SVM (13)
baseline systems (11)
CRF (11)

13. Joint Inference for Fine-grained Opinion Extraction

Yang, Bishan and Cardie, Claire

In Proc. ACL 2013, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Experiments	Each extracts opinion entities first using the same CRF employed in our approach, and then predicts opinion relations on the opinion entity candidates obtained from the CRF prediction.
Experiments	We report results using opinion entity candidates from the best CRF output and from the merged 10-best CRF output. The motivation of merging the 10-best output is to increase recall for the pipeline methods.
Model	We randomly split the training data into 10 parts and obtained the 50-best CRF predictions on each part for the generation of candidates.
Model	We also experimented with candidates generated from more CRF predictions, but did not find any performance improvement for the task.
Results	We compare our approach with the pipeline baselines and CRF (the first step of the pipeline).
Results	By adding the relation extraction step, the pipeline baselines are able to improve precision over the CRF but fail at recall.
Results	CRF+Syn and CRF+Adj provide the same performance as CRF , since the relation extraction step only affects the results of opinion arguments.

CRF is mentioned in 14 sentences in this paper.

Topics mentioned in this paper:

relation extraction (20)
CRF (14)
ILP (14)

14. Learning Soft Linear Constraints with Application to Citation Field Extraction

Anzaroot, Sam and Passos, Alexandre and Belanger, David and McCallum, Andrew

In Proc. ACL 2014, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Background	For this underlying model, we employ a chain-structured conditional random field (CRF), since CRFs have been shown to perform better than other simple unconstrained models like hidden Markov models for citation extraction (Peng and McCallum, 2004).
Background	The MAP inference task in a CRF can be expressed as an optimization problem with a lin-
Background	Since the log probability of some y in the CRF is proportional to the sum of the scores of all the factors, we can concatenate the indicator variables as a vector y and the scores as a vector w and write the MAP problem as
Citation Extraction Data	Here, y_k represents an output tag of the CRF, so if y_k^i = 1, then we have that y_k was given a label with index i.
Citation Extraction Data	We constrain the output labeling of the chain-structured CRF to be a valid BIO encoding.
Citation Extraction Data	Rather than enforcing these constraints using dual decomposition, they can be enforced directly when performing MAP inference in the CRF by modifying the dynamic program of the Viterbi algorithm to only allow valid pairs of adjacent labels.
Soft Constraints in Dual Decomposition	Since we truncate penalties at 0, this suggests that we will learn a penalty of 0 for constraints in three categories: constraints that do not hold in the ground truth, constraints that hold in the ground truth but are satisfied in practice by performing inference in the base CRF model, and constraints that are satisfied in practice as a side-effect of imposing nonzero penalties on some other constraints.

CRF is mentioned in 19 sentences in this paper.

Topics mentioned in this paper:
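
Restricting a Viterbi dynamic program "to only allow valid pairs of adjacent labels" can be sketched as a transition mask. The tag names and the exact validity rule below are assumptions for illustration (a common BIO convention in which I-X may only follow B-X or I-X), not the paper's label set.

```python
NEG_INF = float("-inf")


def is_valid_bio_pair(prev, curr):
    """Under the assumed convention, I-X may only follow B-X or I-X."""
    if curr.startswith("I-"):
        field = curr[2:]
        return prev in ("B-" + field, "I-" + field)
    return True


def bio_transition_mask(tags):
    """mask[j][k] is 0.0 if tags[j] -> tags[k] is valid, -inf otherwise."""
    return [[0.0 if is_valid_bio_pair(p, c) else NEG_INF for c in tags]
            for p in tags]


def is_valid_bio_sequence(seq):
    """Valid if the sequence does not open with I-X and every adjacent pair is valid."""
    return all(is_valid_bio_pair(p, c) for p, c in zip(["O"] + list(seq), seq))
```

Adding the mask to the transition scores before decoding gives every invalid pair a score of negative infinity, so no path through such a pair can be selected as long as at least one valid path exists.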

15. Distributional Representations for Handling Sparsity in Supervised Sequence-Labeling

Huang, Fei and Yates, Alexander

In Proc. ACL 2009, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Experiments	We investigate the use of smoothing in two test systems, conditional random field ( CRF ) models for POS tagging and chunking.
Experiments	Finally, we train the CRF model on the annotated training set and apply it to the test set.
Experiments	We use an open source CRF software package designed by Sunita Sarawagi and William W. Cohen to implement our CRF models. We use a set of boolean features listed in Table 1.

CRF is mentioned in 11 sentences in this paper.

Topics mentioned in this paper:

16. Semi-supervised Learning of Dependency Parsers using Generalized Expectation Criteria

Druck, Gregory and Mann, Gideon and McCallum, Andrew

In Proc. ACL 2009, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Experimental Comparison with Unsupervised Learning	We compare GE and supervised training of an edge-factored CRF with unsupervised learning of a DMV model (Klein and Manning, 2004) using EM and contrastive estimation (CE) (Smith and Eisner, 2005).
Experimental Comparison with Unsupervised Learning	We note that there are considerable differences between the DMV and CRF models.
Experimental Comparison with Unsupervised Learning	The DMV model is more expressive than the CRF because it can model the arity of a head as well as sibling relationships.
Generalized Expectation Criteria	In the following sections we apply GE to non-projective CRF dependency parsing.
Generalized Expectation Criteria	We first consider an arbitrarily structured conditional random field (Lafferty et al., 2001) p_λ(y|x). We describe the CRF for non-projective dependency parsing in Section 3.2.
Generalized Expectation Criteria	We now define a CRF p_λ(y|x) for unlabeled, non-projective dependency parsing.
Introduction	With GE we may add a term to the objective function that encourages a feature-rich CRF to match this expectation on unlabeled data, and in the process learn about related features.
Introduction	In this paper we use a non-projective dependency tree CRF (Smith and Smith, 2007).

CRF is mentioned in 30 sentences in this paper.

Topics mentioned in this paper:

17. A Linear-Time Bottom-Up Discourse Parser with Constraints and Post-Editing

Feng, Vanessa Wei and Hirst, Graeme

In Proc. ACL 2014, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	To enhance the accuracy of the pipeline, we add additional constraints in the Viterbi decoding of the first CRF .
Bottom-up tree-building	Secondly, as a joint model, it is mandatory to use a dynamic CRF , for which exact inference is usually intractable or slow.
Bottom-up tree-building	Figure 4a shows our intra-sentential structure model in the form of a linear-chain CRF .
Bottom-up tree-building	Thus, different CRF chains have to be formed for different pairs of constituents.
Features	In our local models, to encode two adjacent units, U_j and U_{j+1}, within a CRF chain, we use the following 10 sets of features, some of which are modified from Joty et al.’s model.
Introduction	Specifically, in the Viterbi decoding of the first CRF , we include additional constraints elicited from common sense, to make more effective local decisions.
Related work	(2013) approach the problem of text-level discourse parsing using a model trained by Conditional Random Fields ( CRF ).

CRF is mentioned in 14 sentences in this paper.

Topics mentioned in this paper:

18. Context-aware Learning for Sentence-level Sentiment Analysis with Posterior Regularization

Yang, Bishan and Cardie, Claire

In Proc. ACL 2014, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	The context-aware constraints provide additional power to the CRF model and can guide semi-supervised learning when labeled data is limited.
Approach	The CRF models the following conditional probabilities:
Approach	The objective function for a standard CRF is to maximize the log-likelihood over a collection of labeled doc-
Approach	Most of our constraints can be factorized in the same way as factorizing the model features in the first-order CRF model, and we can compute the expectations under q very efficiently using the forward-backward algorithm.
Experiments	We trained our model using a CRF incorporated with the proposed posterior constraints.
Experiments	For the CRF features, we include the tokens, the part-of-speech tags, the prior polarities of lexical patterns indicated by the opinion lexicon and the negator lexicon, the number of positive and negative tokens and the output of the voteflip algorithm (Choi and Cardie, 2009).
Experiments	We set the CRF regularization parameter σ = 1 and set the posterior regularization parameters β and γ (a tradeoff parameter we introduce to balance the supervised objective and the posterior regularizer in 2) by using grid search.
Introduction	Specifically, we use the Conditional Random Field (CRF) model as the learner for sentence-level sentiment classification, and incorporate rich discourse and lexical knowledge as soft constraints into the learning of CRF parameters via Posterior Regularization (PR) (Ganchev et al., 2010).
Introduction	Unlike most previous work, we explore a rich set of structural constraints that cannot be naturally encoded in the feature-label form, and show that such constraints can improve the performance of the CRF model.

CRF is mentioned in 22 sentences in this paper.

Topics mentioned in this paper:

19. Using Conditional Random Fields to Extract Contexts and Answers of Questions from Online Forums

Ding, Shilin and Cong, Gao and Lin, Chin-Yew and Zhu, Xiaoyan

In Proc. ACL 2008, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Context and Answer Detection	Finally, we will briefly introduce CRF models and the features that we used for CRF model.
Context and Answer Detection	A CRF is an undirected graphical model G of the conditional distribution P(Y\|X).
Context and Answer Detection	Linear CRF model has been successfully applied in NLP and text mining tasks (McCallum and Li, 2003; Sha and Pereira, 2003).
Introduction	To capture the dependency between contexts and answers, we introduce Skip-chain CRF model for answer detection.
Introduction	Experimental results show that 1) Linear CRFs outperform SVM and decision tree in both context and answer detection; 2) Skip-chain CRFs outperform Linear CRFs for answer finding, which demonstrates that context improves answer finding; 3) 2D CRF model improves the performance of Linear CRFs and the combination of 2D CRFs and Skip-chain CRFs achieves better performance for context detection.

CRF is mentioned in 22 sentences in this paper.

Topics mentioned in this paper:

CRFs (45)
CRF (22)
SVM (9)

20. Hierarchical Joint Learning: Improving Joint Parsing and Named Entity Recognition with Non-Jointly Labeled Data

Finkel, Jenny Rose and Manning, Christopher D.

In Proc. ACL 2010, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Base Models	Semi-CRFs segment and label the text simultaneously, whereas a linear-chain CRF will only label each word, and segmentation is implied by the labels assigned to the words.
Base Models	doing named entity recognition, a semi-CRF will have one node for each entity, unlike a regular CRF which will have one node for each word. See Figure 3ab for an example of a semi-CRF and a linear-chain CRF over the same sentence.
Base Models	Note that the entity Hilary Clinton has one node in the semi-CRF representation, but two nodes in the linear-chain CRF .
Experiments and Discussion	For each section of the data (ABC, MNB, NBC, PRI, VOA) we ran experiments training a linear-chain CRF on only the named entity information, a CRF-CFG parser on only the parse information, a joint parser and named entity recognizer, and our hierarchical model.
Hierarchical Joint Learning	Figure 3: A linear-chain CRF (a) labels each word, whereas a semi-CRF (b) labels entire entities.

CRF is mentioned in 10 sentences in this paper.

Topics mentioned in this paper:

21. Learning 5000 Relational Extractors

Hoffmann, Raphael and Zhang, Congle and Weld, Daniel S.

In Proc. ACL 2010, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Extraction with Lexicons	Domain-independence requires access to an extremely large number of lists, but our tight integration of lexicon acquisition and CRF learning requires that relevant lists be accessed instantaneously.
Extraction with Lexicons	While training a CRF extractor for a given relation, LUCHS uses its corpus of lists to automatically generate a set of semantic lexicons — specific to that relation.
Extraction with Lexicons	The semantic lexicons are added as features to the CRF learning algorithm.
Introduction	These lexicons form Boolean features which, along with lexical and dependency parser-based features, are used to produce a CRF extractor for each relation — one which performs much better than lexicon-free extraction on sparse training data.
Learning Extractors	We use a linear-chain conditional random field (CRF), an undirected graphical model connecting a sequence of input and output random variables, x = (x0, .
Learning Extractors	The CRF models are represented with a log-linear distribution

CRF is mentioned in 10 sentences in this paper.

Topics mentioned in this paper:

CRF (10)
F1 score (8)
overfitting (6)

22. Generating Synthetic Comparable Questions for News Articles

Rokhlenko, Oleg and Szpektor, Idan

In Proc. ACL 2013, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Comparable Question Mining	Therefore, we decided to employ a Conditional Random Fields (CRF) tagger (Lafferty et al., 2001) to the task, since CRF was shown to be state-of-the-art for sequential relation extraction (Mooney and Bunescu, 2005; Culotta et al., 2006; Jindal and Liu, 2006).
Comparable Question Mining	This transformation helps us to design a simpler CRF than that of (Jindal and Liu, 2006), since our CRF utilizes the known positions of the target entities in the text.
Related Work	Our extraction of comparable relations falls within the field of Relation Extraction, in which CRF is a state-of-the-art method (Mooney and Bunescu, 2005; Culotta et al., 2006).

CRF is mentioned in 10 sentences in this paper.

Topics mentioned in this paper:

23. A Class-Based Agreement Model for Generating Accurately Inflected Translations

Green, Spence and DeNero, John

In Proc. ACL 2012, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

A Class-based Model of Agreement	We treat segmentation as a character-level sequence modeling problem and train a linear-chain conditional random field ( CRF ) model (Lafferty et al., 2001).
A Class-based Model of Agreement	Class-based Agreement Model notation: t ∈ T, set of morpho-syntactic classes; s ∈ S, set of all word segments; θ_seg, learned weights for the CRF-based segmenter; θ_tag, learned weights for the CRF-based tagger; φ_o, φ_t, CRF potential functions (emission and transition).
A Class-based Model of Agreement	For this task we also train a standard CRF model on full sentences with gold classes and segmentation.
Conclusion and Outlook	The model can be implemented with a standard CRF package, trained on existing treebanks for many languages, and integrated easily with many MT feature APIs.

CRF is mentioned in 9 sentences in this paper.

Topics mentioned in this paper:

phrase-based (10)
CRF (9)
LM (9)

24. The Tradeoffs Between Open and Traditional Relation Extraction

Banko, Michele and Etzioni, Oren

In Proc. ACL 2008, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Hybrid Relation Extraction	Due to the sequential nature of our RE task, H-CRF employs a CRF as the meta-learner, as opposed to a decision tree or regression-based classifier.
Hybrid Relation Extraction	To obtain the probability at each position of a linear-chain CRF , the constrained forward-backward technique described in (Culotta and McCallum, 2004) is used.
Related Work	(2006) used a CRF for RE, yet their task differs greatly from open extraction.
Relation Extraction	Figure 1: Relation Extraction as Sequence Labeling: A CRF is used to identify the relationship, born in, between Kafka and Prague
Relation Extraction	The resulting set of labeled examples are described using features that can be extracted without syntactic or semantic analysis and used to train a CRF , a sequence model that learns to identify spans of tokens believed to indicate explicit mentions of relationships between entities.
Relation Extraction	The entity pair serves to anchor each end of a linear-chain CRF , and both entities in the pair are assigned a fixed label of ENT.

CRF is mentioned in 9 sentences in this paper.

Topics mentioned in this paper:

25. Open Information Extraction Using Wikipedia

Wu, Fei and Weld, Daniel S.

In Proc. ACL 2010, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Conclusion	WOE can run in two modes: a CRF extractor (WOEPOS) trained with shallow features like POS tags; a pattern classifier (WOEparse) learned from dependency path patterns.
Experiments	To compare with TextRunner, we tested four different ways to generate training examples from Wikipedia for learning a CRF extractor.
Experiments	The CRF extractors are trained using the same learning algorithm and feature selection as TextRunner.
Related Work	Wu and Weld proposed the KYLIN system (Wu and Weld, 2007; Wu et al., 2008) which has the same spirit of matching Wikipedia sentences with infoboxes to learn CRF extractors.
Wikipedia-based Open IE	[Figure labels: Primary Entity Matching, Sentence Matching (Matcher); Pattern Classifier over Parser Features, CRF Extractor over Shallow Features (Learner)]
Wikipedia-based Open IE	In contrast, WOEPOS (like TextRunner) trains a conditional random field ( CRF ) to output certain text between noun phrases when the text denotes such a relation.
Wikipedia-based Open IE	Since high speed can be crucial when processing Web-scale corpora, we additionally learn a CRF extractor WOEPOS based on shallow features like POS-tags.

CRF is mentioned in 9 sentences in this paper.

Topics mentioned in this paper:

26. Using subcategorization knowledge to improve case prediction for translation to German

Weller, Marion and Fraser, Alexander and Schulte im Walde, Sabine

In Proc. ACL 2013, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Previous work	For case prediction, they trained a CRF with access to lemmas and POS-tags within a given window.
Previous work	While the CRF used for case prediction in Fraser et al.
Translation pipeline	Morphological features are predicted on four separate CRF models, one for each feature.
Translation pipeline	Instead, we limit the power of the CRF model through experimenting with the removal of features, until we had a system that was robust to this problem.
Using subcategorization information	subject, object or modifier) are passed on to the system at the level of nouns and integrated into the CRF through the derived probabilities.
Using subcategorization information	The classification task of the CRF consists in predicting a sequence of labels: case values for NPs/DPs or no value otherwise, cf.
Using subcategorization information	In addition to the probability/frequency of the respective functions, we also provide the CRF with bigrams containing the two parts of the tuple,

CRF is mentioned in 8 sentences in this paper.

Topics mentioned in this paper:

27. Learning to lemmatise Polish noun phrases

Radziszewski, Adam

In Proc. ACL 2013, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

CRF and features	The choice of CRF for sequence labelling was mainly influenced by its successful application to chunking of Polish (Radziszewski and Pawlaczek, 2012).
Evaluation	The performed evaluation assumed training of the CRF on the whole development set annotated with the induced transformations and then applying the trained model to tag the evaluation part with transformations.
Evaluation	We decided to implement both baseline algorithms using the same CRF model but trained on fabricated data.
Evaluation	Also, it turns out that the variation of the matching procedure using the ‘lem’ transformation (row labelled CRF lem) performs slightly worse than the procedure without this transformation (row CRF nolem).
Introduction	In this paper we present a novel approach to noun phrase lemmatisation where the main phase is cast as a tagging problem and tackled using a method devised for such problems, namely Conditional Random Fields ( CRF ).
Phrase lemmatisation as a tagging problem	Our idea is simple: by expressing phrase lemmatisation in terms of word-level transformations we can reduce the task to a tagging problem and apply well-known Machine Learning techniques that have been devised for solving such problems (e.g. CRF ).

CRF is mentioned in 8 sentences in this paper.

Topics mentioned in this paper:

28. Robust Approach to Abbreviating Terms: A Discriminative Latent Variable Model with Global Information

Sun, Xu and Okazaki, Naoaki and Tsujii, Jun'ichi

In Proc. ACL 2009, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abbreviator with Nonlocal Information	The DPLVM is a natural extension of the CRF model (see Figure 2), which is a special case of the DPLVM, with only one latent variable assigned for each label.
Introduction	Figure 2: CRF vs. DPLVM.
Recognition as a Generation Task	As can be seen in Table 6, using the latent variables significantly improved the performance (see DPLVM vs. CRF), and using the GI encoding improved the performance of both the DPLVM and the CRF .
Recognition as a Generation Task	Table 7 shows that the back-off method further improved the performance of both the DPLVM and the CRF model.
Results and Discussion	special case of the DPLVM is exactly the CRF (see Section 2.1), this case is hereinafter denoted as the CRF .
Results and Discussion	The results revealed that the latent variable model significantly improved the performance over the CRF model.
Results and Discussion	All of its top-1, top-2, and top-3 accuracies were consistently better than those of the CRF model.

CRF is mentioned in 7 sentences in this paper.

Topics mentioned in this paper:

29. Open-Domain Semantic Role Labeling by Modeling Word Spans

Huang, Fei and Yates, Alexander

In Proc. ACL 2010, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Introduction	To address the task of open-domain predicate identification, we construct a Conditional Random Field ( CRF ) (Lafferty et al., 2001) model with target labels of B-Pred, I-Pred, and O-Pred (for the beginning, interior, and outside of a predicate).
Introduction	We use an open source CRF software package to implement our CRF models. We use words, POS tags, chunk labels, and the predicate label at the preceding and following nodes as features for our Baseline system.
Introduction	We refer to the CRF model with these features as our Baseline SRL system; in what follows we extend the Baseline model with more sophisticated features.
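
The B-Pred / I-Pred / O-Pred labelling scheme quoted above can be illustrated with a short sketch. This is only an illustration of the encoding, not the authors' code; the function name `predicate_labels` and the half-open `(start, end)` span convention are assumptions made here.

```python
def predicate_labels(n_tokens, spans):
    """Encode predicate spans as B-Pred / I-Pred / O-Pred labels.

    `spans` is a list of half-open (start, end) token ranges; this span
    convention is an assumption of the sketch, not taken from the paper.
    """
    labels = ["O-Pred"] * n_tokens          # every token starts outside a predicate
    for start, end in spans:
        if not 0 <= start < end <= n_tokens:
            raise ValueError(f"span {(start, end)} is out of range")
        labels[start] = "B-Pred"            # first token of the predicate
        for i in range(start + 1, end):
            labels[i] = "I-Pred"            # remaining tokens of the predicate
    return labels
```

A sequence labeller such as a CRF would then be trained to predict one of these three labels per token.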

CRF is mentioned in 7 sentences in this paper.

Topics mentioned in this paper:

30. Distortion Model Considering Rich Context for Statistical Machine Translation

Goto, Isao and Utiyama, Masao and Sumita, Eiichiro and Tamura, Akihiro and Kurohashi, Sadao

In Proc. ACL 2013, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Proposed Method	We use a sequence discrimination technique based on CRF (Lafferty et al., 2001) to identify the label sequence that corresponds to the NP.
Proposed Method	There are two differences between our task and the CRF task.
Proposed Method	One difference is that CRF discriminates label sequences that consist of labels from all of the label candidates, whereas we constrain the label sequences to sequences where the label at the CP is C, the label at an NPC is N, and the labels between the CP and the NPC are I.

CRF is mentioned in 7 sentences in this paper.

Topics mentioned in this paper:

31. Phrase Clustering for Discriminative Learning

Lin, Dekang and Wu, Xiaoyun

In Proc. ACL 2009, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Named Entity Recognition	Conditional Random Fields ( CRF ) (Lafferty et.
Named Entity Recognition	We employed a linear chain CRF with L2 regularization as the baseline algorithm to which we added phrase cluster features.
Named Entity Recognition	The features in our baseline CRF classifier are a subset of the conventional features.

CRF is mentioned in 7 sentences in this paper.

Topics mentioned in this paper:

32. Discriminative Pronunciation Modeling: A Large-Margin, Feature-Rich Approach

Tang, Hao and Keshet, Joseph and Livescu, Karen

In Proc. ACL 2012, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Algorithm	(2) under the log-loss results in a probabilistic model commonly known as a conditional random field ( CRF ) (Lafferty et al., 2001).
Discussion	Large-margin learning, using the Passive-Aggressive and Pegasos algorithms, has benefits over CRF learning for our task: It produces sparser models, is faster, and produces better lexical access results.
Experiments	We use the term “CRF” since the learning algorithm corresponds to CRF learning, although the task is multiclass classification rather than a sequence or structure prediction task.
Experiments	CRF learning with the same features performs about 6% worse than the corresponding PA and Pegasos models.
Experiments	The single-threaded running time for PA/DP+ and Pegasos/DP+ is about 40 minutes per epoch, measured on a dual-core AMD 2.4GHz CPU with 8GB of memory; for CRF, it takes about 100 minutes for each epoch, which is almost entirely because the weight vector θ is less sparse with CRF learning.

CRF is mentioned in 7 sentences in this paper.

Topics mentioned in this paper:

33. Low-Resource Semantic Role Labeling

Gormley, Matthew R. and Mitchell, Margaret and Van Durme, Benjamin and Dredze, Mark

In Proc. ACL 2014, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Approaches	We define a conditional random field ( CRF ) (Lafferty et al., 2001) for this task.
Approaches	This model extends the CRF model in Section 3.1 to include the projective syntactic dependency parse for a sentence.
Approaches	train our CRF models by maximizing conditional log-likelihood using stochastic gradient descent with an adaptive learning rate (AdaGrad) (Duchi et al., 2011) over mini-batches.
Experiments	Similarly, improving the non-convex optimization of our latent-variable CRF (Marginalized) may offer further gains.
Introduction	Simpler joint CRF for syntactic and semantic dependency parsing than previously reported.
Introduction	The joint models use a non-loopy conditional random field ( CRF ) with a global factor constraining latent syntactic edge variables to form a tree.
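
The training recipe quoted above (conditional log-likelihood optimized with AdaGrad over mini-batches) rests on a per-coordinate update that can be sketched in a few lines. This is a generic AdaGrad step, not the authors' implementation: `adagrad_step`, `lr`, and `eps` are hypothetical names, and `grad` is assumed to be the gradient of the negative log-likelihood on one mini-batch.

```python
import math

def adagrad_step(weights, grad, accum, lr=0.1, eps=1e-8):
    """Apply one in-place AdaGrad update.

    weights, grad, accum are equal-length lists of floats. `accum` holds the
    running sum of squared gradients per coordinate and is updated in place.
    """
    for i, g in enumerate(grad):
        accum[i] += g * g
        # Coordinates with a large gradient history take smaller steps.
        weights[i] -= lr * g / (math.sqrt(accum[i]) + eps)
```

In a training loop, `accum` would be initialized to zeros once and carried across all mini-batches.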

CRF is mentioned in 7 sentences in this paper.

Topics mentioned in this paper:

34. A Risk Minimization Framework for Extractive Speech Summarization

Lin, Shih-Hsiang and Chen, Berlin

In Proc. ACL 2010, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Background	To this end, several popular machine-learning methods could be utilized, like Bayesian classifier (BC) (Kupiec et al., 1999), Gaussian mixture model (GMM) (Fattah and Ren, 2009), hidden Markov model (HMM) (Conroy and O'Leary, 2001), support vector machine (SVM) (Kolcz et al., 2001), maximum entropy (ME) (Ferrier, 2001), conditional random field ( CRF ) (Galley, 2006; Shen et al., 2007), to name a few.
Background	Although such supervised summarizers are effective, most of them (except CRF ) usually implicitly assume that sentences are independent of each other (the so-called “bag-of-sentences” assumption) and classify each sentence individually without leveraging the relationship among the sentences (Shen et al., 2007).
Experimental results and discussions 6.1 Baseline experiments	In the final set of experiments, we compare our proposed summarization methods with a few existing summarization methods that have been widely used in various summarization tasks, including LEAD, VSM, LexRank and CRF ; the corresponding results are shown in Table 5.
Experimental results and discussions 6.1 Baseline experiments	To our surprise, CRF does not provide superior results as compared to the other summarization methods.
Experimental results and discussions 6.1 Baseline experiments	One possible explanation is that the structural evidence of the spoken documents in the test set is not strong enough for CRF to show its advantage of modeling the local structural information among sentences.

CRF is mentioned in 6 sentences in this paper.

Topics mentioned in this paper:

loss function (14)
LM (8)
CRF (6)

35. Semantic Tagging of Web Search Queries

Manshadi, Mehdi and Li, Xiao

In Proc. ACL 2009, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Evaluation	As seen in this plot, when there is a small amount of training data, the parser performs better than the CRF module and parser+SVM module performs better than the other two.
Evaluation	With a large amount of training data, the CRF and parser almost have the same performance.
Evaluation	The CRF module also uses the lexical resources and regular expressions.
Introduction	MEMM and CRF are discriminative models; hence they are highly dependent on the training data.
Summary	Test set (N0 = 3000), columns P / R / F / Q. CRF: 0.815 / 0.812 / 0.813 / 0.509. Parser: 0.808 / 0.814 / 0.811 / 0.494.

CRF is mentioned in 6 sentences in this paper.

Topics mentioned in this paper:

36. Named Entity Recognition using Cross-lingual Resources: Arabic as an Example

Darwish, Kareem

In Proc. ACL 2013, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Baseline Arabic NER System	For the baseline system, we used the CRF++ implementation of CRF sequence labeling with default parameters.
Cross-lingual Features	translation was capitalized or not respectively; and the weights were binned because CRF++ only takes nominal features.
Introduction	Conditional Random Fields ( CRF )) can often identify such indicative words.
Related Work	Benajiba and Rosso (2008) used CRF sequence labeling and incorporated many language specific features, namely POS tagging, base-phrase chunking, Arabic tokenization, and adjectives indicating nationality.
Related Work	(2008), they examined the same feature set on the Automatic Content Extraction (ACE) datasets using CRF
Related Work	The use of CRF sequence labeling for NER has shown success (McCallum and Li, 2003; Nadeau and Sekine, 2009; Benajiba and Rosso, 2008).

CRF is mentioned in 6 sentences in this paper.

Topics mentioned in this paper:

37. Fast Online Training with Frequency-Adaptive Learning Rates for Chinese Word Segmentation and New Word Detection

Sun, Xu and Wang, Houfeng and Li, Wenjie

In Proc. ACL 2012, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

System Architecture	Based on our CRF word segmentation system, we can compute a probability for each segment.
System Architecture	Note that, although our model is a Markov CRF model, we can still use word features to learn word information in the training data.
System Architecture	For traditional implementation of CRF systems (e.g., the HCRF package), usually the edges features contain only the information of y_{i-1} and y_i, and without the information of

CRF is mentioned in 6 sentences in this paper.

Topics mentioned in this paper:

38. Word Representations: A Simple and General Method for Semi-Supervised Learning

Turian, Joseph and Ratinov, Lev-Arie and Bengio, Yoshua

In Proc. ACL 2010, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Clustering-based word representations	However, the CRF chunker in Huang and Yates (2009), which uses their HMM word clusters as extra features, achieves F1 lower than a baseline CRF chunker (Sha & Pereira, 2003).
Supervised evaluation tasks	The linear CRF chunker of Sha and Pereira (2003) is a standard near-state-of-the-art baseline chunker.
Supervised evaluation tasks	In fact, many off-the-shelf CRF implementations now replicate Sha and Pereira (2003), including their choice of feature set:
Supervised evaluation tasks	CRF++ by Taku Kudo (http://crfpp.

CRF is mentioned in 6 sentences in this paper.

Topics mentioned in this paper:

39. Max-Margin Tensor Neural Network for Chinese Word Segmentation

Pei, Wenzhe and Ge, Tao and Chang, Baobao

In Proc. ACL 2014, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Experiment	Columns P / R / F / OOV. CRF: 87.8 / 85.7 / 86.7 / 57.1. NN: 92.4 / 92.2 / 92.3 / 60.0. NN+Tag Embed: 93.0 / 92.7 / 92.9 / 61.0. MMTNN: 93.7 / 93.4 / 93.5 / 64.2.
Experiment	We also compare our model with the CRF model (Lafferty et al., 2001), which is a widely used log-linear model for Chinese word segmentation.
Experiment	The input feature to the CRF model is simply the context characters (unigram feature) without any additional feature engineering.

CRF is mentioned in 6 sentences in this paper.

Topics mentioned in this paper:

40. Learning to Predict Distributions of Words Across Domains

Bollegala, Danushka and Weir, David and Carroll, John

In Proc. ACL 2014, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Domain Adaptation	Next, we train a CRF model using all features (i.e.
Domain Adaptation	Finally, the trained CRF model is applied to a target domain test sentence.
Experiments and Results	The L-BFGS (Liu and Nocedal, 1989) method is used to train the CRF and logistic regression models.
Experiments and Results	Specifically, in POS tagging, a CRF trained on source domain labeled sentences is applied to target domain test sentences, whereas in sentiment classification, a logistic regression classifier trained using source domain labeled reviews is applied to the target domain test reviews.
Related Work	Huang and Yates (2009) train a Conditional Random Field ( CRF ) tagger with features retrieved from a smoothing model trained using both source and target domain unlabeled data.

CRF is mentioned in 5 sentences in this paper.

Topics mentioned in this paper:

41. Combining Intra- and Multi-sentential Rhetorical Parsing for Document-level Discourse Analysis

Joty, Shafiq and Carenini, Giuseppe and Ng, Raymond and Mehdad, Yashar

In Proc. ACL 2013, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Conclusion	In this paper, we have presented a novel discourse parser that applies an optimal parsing algorithm to probabilities inferred from two CRF models: one for intra-sentential parsing and the other for multi-sentential parsing.
Introduction	The CRF models effectively represent the structure and the label of a DT constituent jointly, and whenever possible, capture the sequential dependencies between the constituents.
Introduction	To cope with this limitation, our CRF models support a probabilistic bottom-up parsing
Parsing Models and Parsing Algorithm	Figure 6: A CRF as a multi-sentential parsing model.
Parsing Models and Parsing Algorithm	It becomes a CRF if we directly model the hidden (output) variables by conditioning its clique potential (or factor) φ on the observed (input) variables:

CRF is mentioned in 5 sentences in this paper.

Topics mentioned in this paper:

42. Automatic Coupling of Answer Extraction and Information Retrieval

Yao, Xuchen and Van Durme, Benjamin and Clark, Peter

In Proc. ACL 2013, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Experiments	This is because the weights of unigram to trigram features in a log-linear CRF model are a balanced consequence for maximization.
Introduction	The QA system employs a linear chain Conditional Random Field ( CRF ) (Lafferty et al., 2001) and tags each token as either an answer (ANS) or not (O).
Introduction	With weights optimized by CRF training (Table 1), we can learn how answer features are correlated with question features.
Introduction	These features, whose weights are optimized by the CRF training, directly reflect what the most important answer types associated with each question type are.

CRF is mentioned in 5 sentences in this paper.

Topics mentioned in this paper:

43. Discriminative Strategies to Integrate Multiword Expression Recognition and Parsing

Constant, Matthieu and Sigogne, Anthony and Watrin, Patrick

In Proc. ACL 2012, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Evaluation	We first tested a standalone MWE recognizer based on CRF .
Evaluation	The CRF recognizer relies on the software Wapiti (Lavergne et al., 2010) to train and apply the model, and on the software Unitex (Paumier, 2011) to apply lexical resources.
Evaluation	Table 3: MWE identification with CRF : base are the features corresponding to token properties and word n-grams.
MWE-dedicated Features	In order to deal with unknown words and special tokens, we incorporate standard tagging features in the CRF : lowercase forms of the words, word prefixes of length 1 to 4, word suffixes of length 1 to 4, whether the word is capitalized, whether the token has a digit, whether it is a hyphen.
Two strategies, two discriminative models	For such a task, we used Linear chain Conditional Random Fields ( CRF ) that are discriminative prob-
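
The standard tagging features listed under "MWE-dedicated Features" above (lowercase form, prefixes and suffixes of length 1 to 4, capitalization, digit, hyphen) can be sketched as a per-token feature function. This is an illustrative reading of that list, not the authors' feature templates; the name `token_features` and the feature keys are assumptions.

```python
def token_features(token):
    """Return surface features for one token as a dict.

    Feature names are hypothetical; the feature set follows the list quoted
    in the snippet above.
    """
    feats = {
        "lower": token.lower(),
        "is_capitalized": token[:1].isupper(),
        "has_digit": any(ch.isdigit() for ch in token),
        "is_hyphen": token == "-",
    }
    # Prefixes and suffixes of length 1 to 4, skipped when the token is shorter.
    for n in range(1, 5):
        if len(token) >= n:
            feats[f"prefix{n}"] = token[:n]
            feats[f"suffix{n}"] = token[-n:]
    return feats
```

How such dicts are serialized depends on the CRF toolkit; that step is left out here.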

CRF is mentioned in 5 sentences in this paper.

Topics mentioned in this paper:

44. Stochastic Gradient Descent Training for L1-regularized Log-linear Models with Cumulative Penalty

Tsuruoka, Yoshimasa and Tsujii, Jun'ichi and Ananiadou, Sophia

In Proc. ACL 2009, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Log-Linear Models	If the structure is a sequence, the model is called a linear-chain CRF model, and the marginal probabilities of the features and the partition function can be efficiently computed by using the forward-backward algorithm.
Log-Linear Models	If the structure is a tree, the model is called a tree CRF model, and the marginal probabilities can be computed by using the inside-outside algorithm.
Log-Linear Models	We evaluate the effectiveness of our training algorithm using linear-chain CRF models and three NLP tasks: text chunking, named entity recognition, and POS tagging.
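
The statement above that a linear-chain CRF's partition function and marginals can be computed with the forward-backward algorithm can be made concrete with a small sketch. It assumes a simplified parameterization chosen for illustration (per-position label scores `emit[t][y]` and a label-to-label score matrix `trans[a][b]`, both log-potentials); it is not the paper's code.

```python
import math

def forward_backward(emit, trans):
    """Return (Z, marginals) for a linear-chain model.

    emit[t][y]  : log-potential of label y at position t (assumed form)
    trans[a][b] : log-potential of moving from label a to label b
    marginals[t][y] is the posterior probability of label y at position t.
    """
    T, K = len(emit), len(emit[0])
    psi_e = [[math.exp(s) for s in row] for row in emit]
    psi_t = [[math.exp(s) for s in row] for row in trans]

    # Forward pass: alpha[t][y] sums the scores of all prefixes ending in y.
    alpha = [psi_e[0][:]]
    for t in range(1, T):
        alpha.append([
            psi_e[t][y] * sum(alpha[t - 1][a] * psi_t[a][y] for a in range(K))
            for y in range(K)
        ])

    # Backward pass: beta[t][y] sums the scores of all suffixes following y.
    beta = [[1.0] * K for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [
            sum(psi_t[y][b] * psi_e[t + 1][b] * beta[t + 1][b] for b in range(K))
            for y in range(K)
        ]

    Z = sum(alpha[T - 1])
    marginals = [[alpha[t][y] * beta[t][y] / Z for y in range(K)] for t in range(T)]
    return Z, marginals
```

The cost is O(T·K²), compared with K^T for summing over every label sequence explicitly.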

CRF is mentioned in 5 sentences in this paper.

Topics mentioned in this paper:

45. Joint Annotation of Search Queries

Bendersky, Michael and Croft, W. Bruce and Smith, David A.

In Proc. ACL 2011, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Experiments	The CRF model training in line (6) of the algorithm is implemented using the CRF++ toolkit.
Joint Query Annotation	Accordingly, we can directly use a supervised sequential probabilistic model such as CRF (Lafferty
Joint Query Annotation	In this CRF
Joint Query Annotation	It then produces a set of independent annotation estimates, which are jointly used, together with the ground truth annotations, to learn a CRF model for each annotation type.

CRF is mentioned in 5 sentences in this paper.

Topics mentioned in this paper:

46. In-domain Relation Discovery with Meta-constraints via Posterior Regularization

Chen, Harr and Benson, Edward and Naseem, Tahira and Barzilay, Regina

In Proc. ACL 2011, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Results	Comparison against Supervised CRF Our final set of experiments compares a semi-supervised version of our model against a conditional random field ( CRF ) model.
Results	The CRF model was trained using the same features as our model’s argument features.
Results	At the sentence level, our model compares very favorably to the supervised CRF .

CRF is mentioned in 5 sentences in this paper.

Topics mentioned in this paper:

47. Creating a manually error-tagged and shallow-parsed learner corpus

Nagata, Ryo and Whittaker, Edward and Sheinman, Vera

In Proc. ACL 2011, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

UK and XP stand for unknown and X phrase, respectively.	“CRFTagger: CRF English POS Tagger,” Xuan-Hieu Phan, http://crftagger.
UK and XP stand for unknown and X phrase, respectively.	Columns Method / Native Corpus / Learner Corpus. CRF: 0.970 / 0.932. HMM: 0.887 / 0.926.
UK and XP stand for unknown and X phrase, respectively.	HMM CRF POS Freq.

CRF is mentioned in 4 sentences in this paper.

Topics mentioned in this paper:

48. Less Grammar, More Features

Hall, David and Durrett, Greg and Klein, Dan

In Proc. ACL 2014, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Introduction	Formally, our model is a CRF where the features factor over anchored rules of a small backbone grammar, as shown in Figure 1.
Parsing Model	All of these past CRF parsers do also exploit span features, as did the structured margin parser of Taskar et al.
Surface Feature Framework	Recall that our CRF factors over anchored rules r, where each r has identity rule(r) and anchoring span(r).
Surface Feature Framework	As far as we can tell, all past CRF parsers have used “positive” features only.

CRF is mentioned in 4 sentences in this paper.

Topics mentioned in this paper:

49. A Discriminative Hierarchical Model for Fast Coreference at Large Scale

Wick, Michael and Singh, Sameer and McCallum, Andrew

In Proc. ACL 2012, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Background: Pairwise Coreference	For higher accuracy, a graphical model such as a conditional random field ( CRF ) is constructed from the compatibility functions to jointly reason about the pairwise decisions (McCallum and Wellner, 2004).
Background: Pairwise Coreference	We now describe the pairwise CRF for coreference as a factor graph.
Background: Pairwise Coreference	Given the pairwise CRF , the problem of coreference is then solved by searching for the setting of the coreference decision variables that has the highest probability according to Equation 1 subject to the
Conclusion	Indeed, inference in the hierarchy is orders of magnitude faster than a pairwise CRF , allowing us to infer accurate coreference on

CRF is mentioned in 4 sentences in this paper.

Topics mentioned in this paper:

50. Joint Inference of Named Entity Recognition and Normalization for Tweets

Liu, Xiaohua and Zhou, Ming and Zhou, Xiangyang and Fu, Zhongyang and Wei, Furu

In Proc. ACL 2012, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Introduction	(2011) develop a system that exploits a CRF model to segment named
Related Work	A linear CRF model
Related Work	(2010) use Amazon's Mechanical Turk service and CrowdFlower to annotate named entities in tweets and train a CRF model to evaluate the effectiveness of human labeling.
Related Work	(2011) rebuild the NLP pipeline for tweets beginning with POS tagging, through chunking, to NER, which first exploits a CRF model to segment named entities and then uses a distantly supervised approach based on LabeledLDA to classify named entities.

CRF is mentioned in 4 sentences in this paper.

Topics mentioned in this paper:

51. Cross-Domain Co-Extraction of Sentiment and Topic Lexicons

Li, Fangtao and Pan, Sinno Jialin and Jin, Ou and Yang, Qiang and Zhu, Xiaoyan

In Proc. ACL 2012, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Introduction	Our work is similar to Jakob and Gurevych (2010) which proposed a Conditional Random Field ( CRF ) for cross-domain topic word extraction.
Introduction	We denote iSVM and iCRF the in-domain SVM and CRF classifiers in experiments, and compare our proposed methods,
Introduction	Cross-Domain CRF (Cross-CRF) we implement a cross-domain CRF algorithm proposed by (Jakob and Gurevych, 2010).

CRF is mentioned in 4 sentences in this paper.

Topics mentioned in this paper:

52. A Sentence Compression Based Framework to Query-Focused Multi-Document Summarization

Wang, Lu and Raghavan, Hema and Castelli, Vittorio and Florian, Radu and Cardie, Claire

In Proc. ACL 2013, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Experimental Setup	The dataset from Clarke and Lapata (2008) is used to train the CRF and MaxEnt classifiers (Section 4).
Sentence Compression	The CRF model is built using the features shown in Table 3.
Sentence Compression	During inference, we find the maximally likely sequence Y according to a CRF with parameter θ (Y = arg max_{Y′} P(Y′ | X; θ)), while simultaneously enforcing the rules of Table 2 to reduce the hypothesis space and encourage grammatical compression.
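
The arg max in the inference step above is typically computed with Viterbi decoding. The sketch below shows one way to combine it with hard constraints by masking disallowed labels per position. It is an illustration under assumptions, not the authors' decoder: the `emit`/`trans` log-score parameterization and the `allowed` argument (a list of permitted label sets, standing in for the rule set the snippet mentions) are hypothetical.

```python
import math

def viterbi(emit, trans, allowed=None):
    """Return the highest-scoring label sequence as a list of label indices.

    emit[t][y]  : log-score of label y at position t (assumed form)
    trans[a][b] : log-score of moving from label a to label b
    allowed[t]  : optional set of labels permitted at position t
    """
    T, K = len(emit), len(emit[0])

    def ok(t, y):
        return allowed is None or y in allowed[t]

    score = [emit[0][y] if ok(0, y) else -math.inf for y in range(K)]
    back = []  # back[t-1][y] = best previous label for label y at position t
    for t in range(1, T):
        new_score, ptr = [], []
        for y in range(K):
            best_prev = max(range(K), key=lambda a: score[a] + trans[a][y])
            s = score[best_prev] + trans[best_prev][y] + emit[t][y]
            new_score.append(s if ok(t, y) else -math.inf)
            ptr.append(best_prev)
        score = new_score
        back.append(ptr)

    y = max(range(K), key=lambda k: score[k])
    if score[y] == -math.inf:
        raise ValueError("constraints admit no label sequence")
    path = [y]
    for ptr in reversed(back):  # follow back-pointers from the last position
        y = ptr[y]
        path.append(y)
    return path[::-1]
```

Masking with negative infinity keeps the search exact over the sequences that satisfy the constraints.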

CRF is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

53. Sentiment Relevance

Scheible, Christian and Schütze, Hinrich

In Proc. ACL 2013, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Distant Supervision	Following this assumption, we train MaxEnt and Conditional Random Field ( CRF , (McCallum, 2002)) classifiers on the k% of documents that have the lowest maximum flow values f, where k is a parameter which we optimize using the run count method introduced in Section 4.
Distant Supervision	Conditions 4-8 train supervised classifiers based on the labels from DSlabels+MinCut: (4) MaxEnt with named entities (NE); (5) MaxEnt with NE and semantic (SEM) features; (6) CRF with NE; (7) MaxEnt with NE and sequential (SQ) features; (8) MaxEnt with NE, SQ, and SEM.
Distant Supervision	sequence model, we train a CRF (line 6); however, the improvement over MaxEnt (line 4) is not significant.

CRF is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

54. Practical Very Large Scale CRFs

Lavergne, Thomas and Cappé, Olivier and Yvon, François

In Proc. ACL 2010, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Conditional Random Fields	Hence, any numerical optimization strategy may be used and practical solutions include limited memory BFGS (L-BFGS) (Liu and Nocedal, 1989), which is used in the popular CRF++ (Kudo, 2005) and CRFsuite (Okazaki, 2007) packages; conjugate gradient (Nocedal and Wright, 2006) and Stochastic Gradient Descent (SGD) (Bottou, 2004; Vishwanathan et al., 2006), used in CRFsgd (Bottou, 2007).
Conditional Random Fields	CRF++ and CRFsgd perform the forward-backward recursions in the log-domain, which guarantees that numerical over/underflows are avoided no matter the length T^(i) of the sequence.
Conditional Random Fields	The CRF package developed in the course of this study implements many algorithmic optimizations and allows the design of innovative training strategies, such as the one presented in section 5.4.
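
The log-domain recursion mentioned in the excerpts above can be sketched in a few lines. This is a minimal illustration of the general technique, assuming a linear-chain model with per-position label scores and a single transition matrix; it is not the code of CRF++, CRFsgd, or the package from this paper, and the names `logsumexp`, `log_partition`, `unary`, and `trans` are hypothetical.

```python
import math


def logsumexp(values):
    """Compute log(sum(exp(v))) stably by factoring out the maximum."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_partition(unary, trans):
    """Forward recursion in the log domain for a linear-chain CRF.

    unary[t][y] : score (log-potential) of label y at position t
    trans[a][b] : score of moving from label a to label b
    Returns log Z, the log of the sum over all label sequences
    of exp(total sequence score).
    """
    n_labels = len(trans)
    alpha = list(unary[0])  # log forward scores at position 0
    for t in range(1, len(unary)):
        alpha = [
            unary[t][b]
            + logsumexp([alpha[a] + trans[a][b] for a in range(n_labels)])
            for b in range(n_labels)
        ]
    return logsumexp(alpha)
```

Because only differences from the running maximum are ever exponentiated, the forward scores stay representable however long the sequence is, whereas multiplying raw potentials would overflow or underflow.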

CRF is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

55. Inducing Gazetteers for Named Entity Recognition by Large-Scale Clustering of Dependency Relations

Kazama, Jun'ichi and Torisawa, Kentaro

In Proc. ACL 2008, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Experiments	We generated the node and the edge features of a CRF model as described in Table 3 using these atomic features.
Experiments	To train CRF models, we used Taku Kudo’s CRF++ (ver.
Using Gazetteers as Features of NER	These annotated IOB tags can be used in the same way as other features in a CRF tagger.

CRF is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

56. Bootstrapping via Graph Propagation

Whitney, Max and Sarkar, Anoop

In Proc. ACL 2012, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Existing algorithms 3.1 Yarowsky	Note that this is a linear model similar to a conditional random field ( CRF ) (Lafferty et al., 2001) for unstructured multiclass problems.
Existing algorithms 3.1 Yarowsky	It uses a CRF (Lafferty et al., 2001) as the underlying supervised learner.
Existing algorithms 3.1 Yarowsky	It differs significantly from Yarowsky in two other ways: First, instead of only training a CRF it also uses a step of graph propagation between distributions over the n-grams in the data.

CRF is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

57. Word Segmentation of Informal Arabic with Domain Adaptation

Monroe, Will and Green, Spence and Manning, Christopher D.

In Proc. ACL 2014, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Arabic Word Segmentation Model	A CRF model (Lafferty et al., 2001) defines a distribution p(Y\|X; θ), where X = {x_1, .
Arabic Word Segmentation Model	The model of Green and DeNero is a third-order (i.e., 4-gram) Markov CRF , employing the following indicator features:
Introduction	The model is an extension of the character-level conditional random field ( CRF ) model of Green and DeNero (2012).

CRF is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

58. Efficient, Feature-based, Conditional Random Field Parsing

Finkel, Jenny Rose and Kleeman, Alex and Manning, Christopher D.

In Proc. ACL 2008, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Introduction	For example, in (Lafferty et al., 2001), when switching from a generatively trained hidden Markov model (HMM) to a discriminatively trained, linear chain, conditional random field ( CRF ) for part-of-speech tagging, their error drops from 5.7% to 5.6%.
Introduction	When they add in only a small set of orthographic features, their CRF error rate drops considerably more to 4.3%, and their out-of-vocabulary error rate drops by more than half.
The Model	We then define a conditional probability distribution over entire trees, using the standard CRF distribution, shown in (1).

CRF is mentioned in 3 sentences in this paper.

Topics mentioned in this paper: