Index of papers in Proc. ACL 2013 that mention
  • CRF
Celikyilmaz, Asli and Hakkani-Tur, Dilek and Tur, Gokhan and Sarikaya, Ruhi
Introduction
It starts with training supervised Conditional Random Fields (CRF) (Lafferty et al., 2001) on the source training data, which has been semantically tagged.
Introduction
Our SSL uses MTR to smooth the semantic tag posteriors on the unlabeled target data (decoded using the CRF model) and later obtains the best tag sequences.
Introduction
While retrospective learning iteratively trains CRF models with the automatically annotated target data (explained above), it keeps track of the errors of the previous iterations so as to carry the properties of both the source and target domains.
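The three excerpts above describe the procedure only in prose. The following minimal sketch shows a generic CRF self-training loop of the same shape (train on tagged source data, decode unlabeled target data, retrain on the union). It assumes the sklearn-crfsuite toolkit and toy data, and it does not reproduce the paper's MTR posterior smoothing or its retrospective tracking of errors across iterations:

# Minimal self-training sketch with a CRF tagger. Assumes the sklearn-crfsuite
# package and toy data; the paper's MTR-based smoothing of tag posteriors and
# its retrospective error tracking are NOT reproduced here.
import sklearn_crfsuite

def featurize(tokens):
    # Toy per-token features; a real system would use richer context.
    return [{"lower": t.lower(), "is_title": t.istitle(), "suffix2": t[-2:]}
            for t in tokens]

# Semantically tagged source-domain utterances.
source_sents = [["show", "flights", "to", "boston"],
                ["list", "hotels", "in", "paris"]]
source_tags = [["O", "O", "O", "B-city"],
               ["O", "O", "O", "B-city"]]

# Unlabeled target-domain utterances.
target_sents = [["find", "flights", "to", "denver"],
                ["book", "hotels", "in", "rome"]]

X_src = [featurize(s) for s in source_sents]
X_tgt = [featurize(s) for s in target_sents]

X_train, y_train = list(X_src), list(source_tags)
for _ in range(3):
    crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=50)
    crf.fit(X_train, y_train)
    auto_tags = crf.predict(X_tgt)   # decode the unlabeled target data
    # (The paper smooths these tag posteriors with MTR and re-selects the
    #  best sequences before retraining; that step is omitted in this sketch.)
    X_train, y_train = X_src + X_tgt, source_tags + auto_tags

print(crf.predict([featurize(["show", "hotels", "in", "denver"])]))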
Markov Topic Regression - MTR
We use the word-tag posterior probabilities obtained from a CRF sequence model trained on labeled utterances as features.
Related Work and Motivation
We present a retrospective SSL for CRF, in which the iterative learner keeps track of the errors of the previous iterations so as to carry the properties of both the source and target domains.
Semi-Supervised Semantic Labeling
4.1 Semi-Supervised Learning (SSL) with CRF
Semi-Supervised Semantic Labeling
They decode unlabeled queries from target domain (t) using a CRF model trained on the POS-labeled newswire data (source domain (0)).
Semi-Supervised Semantic Labeling
Since the CRF tagger only uses local features of the input to score tag pairs, they try to capture all the context with the graph, using additional context features on types.
CRF is mentioned in 32 sentences in this paper.
Dethlefs, Nina and Hastie, Helen and Cuayáhuitl, Heriberto and Lemon, Oliver
Cohesion across Utterances
This paper focuses on surface realisation from these trees using a CRF as shown in the surface realisation module.
Cohesion across Utterances
As shown in the architecture diagram in Figure 1, a CRF surface realiser takes a semantic tree as input.
Cohesion across Utterances
Figure 2: (a) Graphical representation of a linear-chain Conditional Random Field (CRF), where empty nodes correspond to the labelled sequence, shaded nodes to linguistic observations, and dark squares to feature functions between states and observations; (b) Example semantic trees that are updated at each time step in order to provide linguistic features to the CRF (only one possible surface realisation is shown and parse categories are omitted for brevity); (c) Finite state machine of phrases (labels) for this example.
Introduction
Our main hypothesis is that the use of global context in a CRF with semantic trees can lead to surface realisations that are better phrased, more natural and less repetitive than taking only local features into account.
CRF is mentioned in 27 sentences in this paper.
Wang, Mengqiu and Che, Wanxiang and Manning, Christopher D.
Bilingual NER by Agreement
The English-side CRF model assigns the following probability for a tag sequence ye:
Bilingual NER by Agreement
where Ve is the set of vertices in the CRF and De is the set of edges.
Bilingual NER by Agreement
w(vi) and w(vi, vj) are the node and edge clique potentials, and Ze(e) is the partition function for input sequence e under the English CRF model.
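The excerpt announces the probability formula, but the equation itself is not quoted. A generic pairwise-CRF factorization consistent with the notation above (vertices Ve, edges De, partition function Ze(e), with the dependence of the potentials on the input sequence e left implicit) would look as follows; the exact parameterization in the paper may differ:

\[
P_{\mathrm{CRF}}(y^{e} \mid e) \;=\; \frac{1}{Z_{e}(e)}
\prod_{v_{i} \in V_{e}} w(v_{i})
\prod_{(v_{i}, v_{j}) \in D_{e}} w(v_{i}, v_{j})
\]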
Experimental Setup
We train the two CRF models on all portions of the OntoNotes corpus that are annotated with named entity tags, except the parallel-aligned portion which we reserve for development and test purposes.
Experimental Setup
The exact feature set and the CRF implementation
CRF is mentioned in 15 sentences in this paper.
Wang, Aobo and Kan, Min-Yen
Experiment
To benchmark the improvement that the factorial CRF model gains by doing the two tasks jointly, we compare with an LCRF solution that chains these two tasks together.
Experiment
We note that the CRF-based models achieve much higher precision scores than the baseline systems, which means that the CRF-based models can make accurate predictions without enlarging the scope of prospective informal words.
Experiment
Compared with the CRF-based models, the SVM and DT both over-predict informal words, incurring a larger precision penalty.
Introduction
Our techniques significantly outperform both research and commercial state-of-the-art for these problems, including two-step linear CRF baselines which perform the two tasks sequentially.
Methodology
2.2.1 Linear-Chain CRF
Methodology
A linear-chain CRF (LCRF; Figure 2a) predicts the output label based on feature functions provided by the scientist on the input.
Methodology
2.2.2 Factorial CRF
Related Work
This is a weakness as their linear CRF model requires retraining.
CRF is mentioned in 11 sentences in this paper.
Yang, Bishan and Cardie, Claire
Experiments
Each extracts opinion entities first using the same CRF employed in our approach, and then predicts opinion relations on the opinion entity candidates obtained from the CRF prediction.
Experiments
We report results using opinion entity candidates from the best CRF output and from the merged 10-best CRF output. The motivation of merging the 10-best output is to increase recall for the pipeline methods.
Model
We randomly split the training data into 10 parts and obtained the 50-best CRF predictions on each part for the generation of candidates.
Model
We also experimented with candidates generated from more CRF predictions, but did not find any performance improvement for the task.
Results
We compare our approach with the pipeline baselines and CRF (the first step of the pipeline).
Results
By adding the relation extraction step, the pipeline baselines are able to improve precision over the CRF but fail at recall.
Results
CRF+Syn and CRF+Adj provide the same performance as CRF, since the relation extraction step only affects the results of opinion arguments.
CRF is mentioned in 14 sentences in this paper.
Rokhlenko, Oleg and Szpektor, Idan
Comparable Question Mining
Therefore, we decided to employ a Conditional Random Fields (CRF) tagger (Lafferty et al., 2001) for the task, since CRF was shown to be state-of-the-art for sequential relation extraction (Mooney and Bunescu, 2005; Culotta et al., 2006; Jindal and Liu, 2006).
Comparable Question Mining
This transformation helps us to design a simpler CRF than that of (Jindal and Liu, 2006), since our CRF utilizes the known positions of the target entities in the text.
Related Work
Our extraction of comparable relations falls within the field of Relation Extraction, in which CRF is a state-of-the-art method (Mooney and Bunescu, 2005; Culotta et al., 2006).
CRF is mentioned in 10 sentences in this paper.
Radziszewski, Adam
CRF and features
The choice of CRF for sequence labelling was mainly influenced by its successful application to chunking of Polish (Radziszewski and Pawlaczek, 2012).
Evaluation
The evaluation consisted of training the CRF on the whole development set annotated with the induced transformations and then applying the trained model to tag the evaluation part with transformations.
Evaluation
We decided to implement both baseline algorithms using the same CRF model but trained on fabricated data.
Evaluation
Also, it turns out that the variation of the matching procedure using the ‘lem’ transformation (row labelled CRF lem) performs slightly worse than the procedure without this transformation (row CRF nolem).
Introduction
In this paper we present a novel approach to noun phrase lemmatisation where the main phase is cast as a tagging problem and tackled using a method devised for such problems, namely Conditional Random Fields (CRF).
Phrase lemmatisation as a tagging problem
Our idea is simple: by expressing phrase lemmatisation in terms of word-level transformations we can reduce the task to a tagging problem and apply well-known Machine Learning techniques that have been devised for solving such problems (e.g., CRF).
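As a concrete illustration of reducing phrase lemmatisation to tagging, the sketch below encodes each word's lemmatising edit as a transformation tag (delete a suffix, then append one). The tag encoding and the example phrase are invented for illustration and are not the transformations induced in the paper:

# Illustrative only: word-level transformation "tags" for phrase lemmatisation.
# The tag encoding (delete a suffix, then append one) and the example phrase
# are toy assumptions, not the transformation inventory used in the paper.
def apply_transformation(word, tag):
    # A tag such as "del3+e" means: delete the last 3 characters, append "e".
    if tag == "keep":
        return word
    delete, _, add = tag.partition("+")
    n = int(delete[3:])                 # "del3" -> 3
    return (word[:-n] if n else word) + add

# A tagger such as a CRF would predict one transformation tag per word;
# here the tags are written by hand for a toy inflected Polish phrase.
words = ["czerwonych", "jabłek"]        # "(of) red apples", genitive plural
tags = ["del3+e", "del2+ko"]            # -> "czerwone jabłko"

print(" ".join(apply_transformation(w, t) for w, t in zip(words, tags)))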
CRF is mentioned in 8 sentences in this paper.
Weller, Marion and Fraser, Alexander and Schulte im Walde, Sabine
Previous work
For case prediction, they trained a CRF with access to lemmas and POS-tags within a given window.
Previous work
While the CRF used for case prediction in Fraser et al.
Translation pipeline
Morphological features are predicted with four separate CRF models, one for each feature.
Translation pipeline
Instead, we limited the power of the CRF model by experimenting with the removal of features until we had a system that was robust to this problem.
Using subcategorization information
subject, object or modifier) are passed on to the system at the level of nouns and integrated into the CRF through the derived probabilities.
Using subcategorization information
The classification task of the CRF consists in predicting a sequence of labels: case values for NPs/DPs, or no value otherwise, cf.
Using subcategorization information
In addition to the probability/frequency of the respective functions, we also provide the CRF with bigrams containing the two parts of the tuple,
CRF is mentioned in 8 sentences in this paper.
Goto, Isao and Utiyama, Masao and Sumita, Eiichiro and Tamura, Akihiro and Kurohashi, Sadao
Proposed Method
We use a sequence discrimination technique based on CRF (Lafferty et al., 2001) to identify the label sequence that corresponds to the NP.
Proposed Method
There are two differences between our task and the CRF task.
Proposed Method
One difference is that CRF discriminates label sequences that consist of labels from all of the label candidates, whereas we constrain the label sequences to sequences where the label at the CP is C, the label at an NPC is N, and the labels between the CP and the NPC are I.
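A small illustration of the constraint described above: for a sentence of n words and a given CP, there is exactly one admissible label sequence per candidate NPC (C at the CP, N at the NPC, I in between). The "O" label used for all remaining positions is an assumption of this sketch, not taken from the paper:

# Illustration of the constrained label space: one candidate sequence per
# possible NPC position, with C at the CP, N at the NPC and I in between.
def constrained_sequences(n_words, cp):
    for npc in range(n_words):
        if npc == cp:
            continue
        labels = ["O"] * n_words        # "O" elsewhere is an assumption here
        labels[cp] = "C"
        labels[npc] = "N"
        lo, hi = sorted((cp, npc))
        for k in range(lo + 1, hi):
            labels[k] = "I"
        yield npc, labels

for npc, seq in constrained_sequences(6, cp=1):
    print(npc, seq)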
CRF is mentioned in 7 sentences in this paper.
Darwish, Kareem
Baseline Arabic NER System
For the baseline system, we used the CRF++ implementation of CRF sequence labeling with default parameters.
Cross-lingual Features
translation was capitalized or not respectively; and the weights were binned because CRF++ only takes nominal features.
Introduction
Conditional Random Fields (CRF) can often identify such indicative words.
Related Work
Benajiba and Rosso (2008) used CRF sequence labeling and incorporated many language specific features, namely POS tagging, base-phrase chunking, Arabic tokenization, and adjectives indicating nationality.
Related Work
(2008), they examined the same feature set on the Automatic Content Extraction (ACE) datasets using CRF
Related Work
The use of CRF sequence labeling for NER has shown success (McCallum and Li, 2003; Nadeau and Sekine, 2009; Benajiba and Rosso, 2008).
CRF is mentioned in 6 sentences in this paper.
Joty, Shafiq and Carenini, Giuseppe and Ng, Raymond and Mehdad, Yashar
Conclusion
In this paper, we have presented a novel discourse parser that applies an optimal parsing algorithm to probabilities inferred from two CRF models: one for intra-sentential parsing and the other for multi-sentential parsing.
Introduction
The CRF models effectively represent the structure and the label of a DT constituent jointly, and whenever possible, capture the sequential dependencies between the constituents.
Introduction
To cope with this limitation, our CRF models support a probabilistic bottom-up parsing
Parsing Models and Parsing Algorithm
Figure 6: A CRF as a multi-sentential parsing model.
Parsing Models and Parsing Algorithm
It becomes a CRF if we directly model the hidden (output) variables by conditioning its clique potential (or factor) φ on the observed (input) variables:
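The equation this excerpt introduces is not quoted. In standard form, conditioning the clique potentials on the observed inputs gives a distribution of the following shape (generic notation, which may differ from the paper's):

\[
P(\mathbf{y} \mid \mathbf{x}) \;=\; \frac{1}{Z(\mathbf{x})} \prod_{c} \phi_{c}(\mathbf{y}_{c}, \mathbf{x}),
\qquad
Z(\mathbf{x}) \;=\; \sum_{\mathbf{y}'} \prod_{c} \phi_{c}(\mathbf{y}'_{c}, \mathbf{x})
\]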
CRF is mentioned in 5 sentences in this paper.
Yao, Xuchen and Van Durme, Benjamin and Clark, Peter
Experiments
This is because the weights of unigram-to-trigram features in a log-linear CRF model are a balanced consequence of maximization.
Introduction
The QA system employs a linear chain Conditional Random Field (CRF) (Lafferty et al., 2001) and tags each token as either an answer (ANS) or not (O).
Introduction
With weights optimized by CRF training (Table 1), we can learn how answer features are correlated with question features.
Introduction
These features, whose weights are optimized by the CRF training, directly reflect what the most important answer types associated with each question type are.
CRF is mentioned in 5 sentences in this paper.
Scheible, Christian and Schütze, Hinrich
Distant Supervision
Following this assumption, we train MaxEnt and Conditional Random Field (CRF; McCallum, 2002) classifiers on the k% of documents that have the lowest maximum flow values f, where k is a parameter which we optimize using the run count method introduced in Section 4.
Distant Supervision
Conditions 4-8 train supervised classifiers based on the labels from DSlabels+MinCut: (4) MaxEnt with named entities (NE); (5) MaxEnt with NE and semantic (SEM) features; (6) CRF with NE; (7) MaxEnt with NE and sequential (SQ) features; (8) MaxEnt with NE, SQ, and SEM.
Distant Supervision
sequence model, we train a CRF (line 6); however, the improvement over MaxEnt (line 4) is not significant.
CRF is mentioned in 3 sentences in this paper.
Wang, Lu and Raghavan, Hema and Castelli, Vittorio and Florian, Radu and Cardie, Claire
Experimental Setup
The dataset from Clarke and Lapata (2008) is used to train the CRF and MaxEnt classifiers (Section 4).
Sentence Compression
The CRF model is built using the features shown in Table 3.
Sentence Compression
During inference, we find the maximally likely sequence Y according to a CRF with parameter θ (Y = argmax_{Y'} P(Y'|X; θ)), while simultaneously enforcing the rules of Table 2 to reduce the hypothesis space and encourage grammatical compression.
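The rules of Table 2 are not quoted in this index, but the general mechanism of enforcing hard constraints during argmax decoding can be sketched as Viterbi over per-token CRF scores with forbidden label transitions masked out. Everything below (scores, labels, the single illustrative rule) is invented for illustration and is not the paper's model:

# Generic Viterbi decoding over per-token label scores with hard transition
# constraints folded in (a stand-in for "enforcing the rules of Table 2";
# the actual rules and CRF scores from the paper are not reproduced here).
import numpy as np

def constrained_viterbi(emissions, transitions, allowed):
    # emissions:   (T, L) per-token label scores (e.g. CRF log-potentials)
    # transitions: (L, L) label-to-label scores
    # allowed:     (L, L) boolean mask; False transitions are forbidden
    T, L = emissions.shape
    trans = np.where(allowed, transitions, -np.inf)
    score = emissions[0].copy()
    back = np.zeros((T, L), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + trans          # rows: previous label, cols: current label
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + emissions[t]
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]

# Toy example with labels 0 = "delete", 1 = "keep"; as an illustrative hard
# rule we simply ban the delete -> delete transition.
emissions = np.array([[0.2, 1.0], [1.5, 0.1], [1.2, 0.3], [0.1, 0.9]])
transitions = np.zeros((2, 2))
allowed = np.array([[False, True], [True, True]])
print(constrained_viterbi(emissions, transitions, allowed))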
CRF is mentioned in 3 sentences in this paper.