Introduction | It starts by training supervised Conditional Random Fields (CRF) (Lafferty et al., 2001) on the semantically tagged source training data.
Introduction | Our SSL method uses MTR to smooth the semantic tag posteriors on the unlabeled target data (decoded using the CRF model) and then obtains the best tag sequences.
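The smoothing step above can be sketched minimally: interpolate each token's CRF tag posterior with an external prior and renormalize, then take the best tag per token. All names, the prior distribution, and the interpolation weight `alpha` are illustrative assumptions, not the paper's actual MTR formulation (which would also re-decode to obtain a full best tag sequence):

```python
# Sketch: smooth per-token CRF tag posteriors with an external prior
# (e.g. from a topic model), then pick the best tag per token.
# The tag names, prior, and alpha are illustrative assumptions.

def smooth_posteriors(crf_posteriors, prior, alpha=0.7):
    """Interpolate each token's CRF posterior with a prior, renormalize."""
    smoothed = []
    for dist in crf_posteriors:
        mixed = {tag: alpha * p + (1 - alpha) * prior.get(tag, 0.0)
                 for tag, p in dist.items()}
        z = sum(mixed.values())
        smoothed.append({tag: p / z for tag, p in mixed.items()})
    return smoothed

def best_tags(posteriors):
    """Per-token argmax over the smoothed distributions."""
    return [max(d, key=d.get) for d in posteriors]

# Toy example: two tokens, two tags; the prior pulls both towards "O".
crf_post = [{"B-date": 0.6, "O": 0.4}, {"B-date": 0.3, "O": 0.7}]
prior = {"B-date": 0.1, "O": 0.9}
print(best_tags(smooth_posteriors(crf_post, prior)))  # -> ['O', 'O']
```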
Introduction | While retrospective learning iteratively trains CRF models with the automatically annotated target data (explained above), it keeps track of the errors of the previous iterations so as to carry the properties of both the source and target domains.
Markov Topic Regression - MTR | We use the word-tag posterior probabilities obtained from a CRF sequence model trained on labeled utterances as features. |
Related Work and Motivation | We present a retrospective SSL method for CRF, in which the iterative learner keeps track of the errors of previous iterations so as to carry the properties of both the source and target domains.
Semi-Supervised Semantic Labeling | 4.1 Semi-Supervised Learning (SSL) with CRF
Semi-Supervised Semantic Labeling | They decode unlabeled queries from the target domain (t) using a CRF model trained on the POS-labeled newswire data (the source domain).
Semi-Supervised Semantic Labeling | Since the CRF tagger only uses local features of the input to score tag pairs, they try to capture wider context with the graph, via additional context features defined over types.
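One step of this graph-based idea can be sketched as label propagation over a similarity graph of word types: each type's tag distribution is mixed with the average of its neighbours'. The graph, neighbour lists, tag distributions, and mixing weight below are toy assumptions, not the actual graph construction:

```python
# Sketch: one step of label propagation over a type-similarity graph,
# a stand-in for the graph-based SSL described above. The graph and
# distributions are toy assumptions.

def propagate(dists, graph, mu=0.5):
    """Mix each type's tag distribution with its neighbours' average."""
    new = {}
    for node, dist in dists.items():
        nbrs = graph.get(node, [])
        if not nbrs:
            new[node] = dict(dist)
            continue
        # Average the neighbours' tag distributions.
        avg = {}
        for n in nbrs:
            for tag, p in dists[n].items():
                avg[tag] = avg.get(tag, 0.0) + p / len(nbrs)
        # Interpolate with the node's own distribution and renormalize.
        mixed = {tag: mu * dist.get(tag, 0.0) + (1 - mu) * avg.get(tag, 0.0)
                 for tag in set(dist) | set(avg)}
        z = sum(mixed.values())
        new[node] = {tag: p / z for tag, p in mixed.items()}
    return new

# Toy example: "bank" borrows some verb mass from its neighbour "run".
dists = {"bank": {"NN": 1.0}, "run": {"VB": 0.5, "NN": 0.5}}
graph = {"bank": ["run"], "run": ["bank"]}
out = propagate(dists, graph)
```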
Cohesion across Utterances | This paper focuses on surface realisation from these trees using a CRF as shown in the surface realisation module. |
Cohesion across Utterances | As shown in the architecture diagram in Figure 1, a CRF surface realiser takes a semantic tree as input. |
Cohesion across Utterances | Figure 2: (a) Graphical representation of a linear-chain Conditional Random Field (CRF), where empty nodes correspond to the labelled sequence, shaded nodes to linguistic observations, and dark squares to feature functions between states and observations; (b) Example semantic trees that are updated at each time step in order to provide linguistic features to the CRF (only one possible surface realisation is shown and parse categories are omitted for brevity); (c) Finite state machine of phrases (labels) for this example. |
Introduction | Our main hypothesis is that the use of global context in a CRF with semantic trees can lead to surface realisations that are better phrased, more natural and less repetitive than taking only local features into account. |
Bilingual NER by Agreement | The English-side CRF model assigns the following probability to a tag sequence y^e:
Bilingual NER by Agreement | P(y^e | e) = (1 / Z_e(e)) ∏_{v_i ∈ V_e} φ(v_i) ∏_{(v_i, v_j) ∈ D_e} ψ(v_i, v_j)
Bilingual NER by Agreement | where V_e is the set of vertices in the CRF and D_e is the set of edges; φ(v_i) and ψ(v_i, v_j) are the node and edge clique potentials, and Z_e(e) is the partition function for input sequence e under the English CRF model.
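These quantities can be made concrete with a toy linear-chain model: the probability of a tag sequence is its product of node and edge potentials, divided by the partition function Z computed with the forward recursion. All potentials and tag names below are illustrative, not the paper's model:

```python
# Sketch: probability of a tag sequence under a toy linear-chain CRF,
# with node potentials phi[t][tag] and edge potentials psi[prev][tag].
# The partition function Z is computed with the forward algorithm.

def sequence_probability(phi, psi, tags, y):
    # Unnormalized potential product of the labelled sequence y.
    score = phi[0][y[0]]
    for t in range(1, len(phi)):
        score *= psi[y[t - 1]][y[t]] * phi[t][y[t]]
    # Partition function Z via the forward recursion.
    alpha = {tag: phi[0][tag] for tag in tags}
    for t in range(1, len(phi)):
        alpha = {tag: phi[t][tag] * sum(alpha[prev] * psi[prev][tag]
                                        for prev in tags)
                 for tag in tags}
    z = sum(alpha.values())
    return score / z

# Toy two-token, two-tag example.
tags = ["O", "PER"]
phi = [{"O": 1.0, "PER": 2.0}, {"O": 3.0, "PER": 1.0}]
psi = {"O": {"O": 2.0, "PER": 1.0}, "PER": {"O": 1.0, "PER": 2.0}}
p = sequence_probability(phi, psi, tags, ["PER", "O"])
print(round(p, 4))  # -> 0.3529  (i.e. 6/17)
```

As a sanity check, the probabilities of all possible tag sequences sum to one.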
Experimental Setup | We train the two CRF models on all portions of the OntoNotes corpus that are annotated with named entity tags, except the parallel-aligned portion which we reserve for development and test purposes. |
Experimental Setup | The exact feature set and the CRF implementation
Experiment | To benchmark the improvement that the factorial CRF model has by doing the two tasks jointly, we compare with a LCRF solution that chains these two tasks together. |
Experiment | We note that the CRF based models achieve much higher precision score than baseline systems, which means that the CRF based models can make accurate predictions without enlarging the scope of prospective informal words. |
Experiment | Compared with the CRF based models, the SVM and DT both over-predict informal words, incurring a larger precision penalty. |
Introduction | Our techniques significantly outperform both research and commercial state-of-the-art for these problems, including two-step linear CRF baselines which perform the two tasks sequentially. |
Methodology | 2.2.1 Linear-Chain CRF |
Methodology | A linear-chain CRF (LCRF; Figure 2a) predicts the output label based on feature functions provided by the scientist on the input. |
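Such feature-function-driven prediction can be sketched with Viterbi decoding over a few hand-written indicator features. The features, weights, and label set below are made up for illustration; they are not any paper's actual feature set:

```python
# Sketch: Viterbi decoding for a linear-chain model whose local scores
# come from hand-written feature functions with illustrative weights.

def viterbi(tokens, tags, score):
    """score(prev_tag, tag, tokens, t) -> local score; returns best path."""
    best = [{tag: (score(None, tag, tokens, 0), [tag]) for tag in tags}]
    for t in range(1, len(tokens)):
        col = {}
        for tag in tags:
            prev_tag, (s, path) = max(
                best[-1].items(),
                key=lambda kv: kv[1][0] + score(kv[0], tag, tokens, t))
            col[tag] = (s + score(prev_tag, tag, tokens, t), path + [tag])
        best.append(col)
    return max(best[-1].values(), key=lambda sp: sp[0])[1]

# Toy feature functions: capitalized tokens prefer PER; lowercase tokens
# prefer O; PER -> PER transitions get a small bonus.
def score(prev, tag, tokens, t):
    s = 0.0
    if tokens[t][0].isupper() and tag == "PER":
        s += 2.0
    if tag == "O" and not tokens[t][0].isupper():
        s += 1.0
    if prev == "PER" and tag == "PER":
        s += 0.5
    return s

print(viterbi(["John", "Smith", "runs"], ["O", "PER"], score))
# -> ['PER', 'PER', 'O']
```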
Methodology | 2.2.2 Factorial CRF |
Related Work | This is a weakness as their linear CRF model requires retraining. |
Experiments | Each extracts opinion entities first using the same CRF employed in our approach, and then predicts opinion relations on the opinion entity candidates obtained from the CRF prediction. |
Experiments | We report results using opinion entity candidates from the best CRF output and from the merged 10-best CRF output. The motivation for merging the 10-best output is to increase recall for the pipeline methods.
Model | 1We randomly split the training data into 10 parts and obtained the 50-best CRF predictions on each part for the generation of candidates. |
Model | We also experimented with candidates generated from more CRF predictions, but did not find any performance improvement for the task. |
Results | We compare our approach with the pipeline baselines and CRF (the first step of the pipeline). |
Results | by adding the relation extraction step, the pipeline baselines are able to improve precision over the CRF but fail at recall. |
Results | CRF+Syn and CRF+Adj provide the same performance as CRF, since the relation extraction step only affects the results of opinion arguments.
Comparable Question Mining | Therefore, we decided to employ a Conditional Random Fields (CRF) tagger (Lafferty et al., 2001) for the task, since CRF was shown to be state-of-the-art for sequential relation extraction (Mooney and Bunescu, 2005; Culotta et al., 2006; Jindal and Liu, 2006).
Comparable Question Mining | This transformation helps us to design a simpler CRF than that of (Jindal and Liu, 2006), since our CRF utilizes the known positions of the target entities in the text. |
Related Work | Our extraction of comparable relations falls within the field of Relation Extraction, in which CRF is a state-of-the-art method (Mooney and Bunescu, 2005; Culotta et al., 2006). |
CRF and features | The choice of CRF for sequence labelling was mainly influenced by its successful application to chunking of Polish (Radziszewski and Pawlaczek, 2012). |
Evaluation | The performed evaluation assumed training of the CRF on the whole development set annotated with the induced transformations and then applying the trained model to tag the evaluation part with transformations. |
Evaluation | We decided to implement both baseline algorithms using the same CRF model but trained on fabricated data. |
Evaluation | Also, it turns out that the variation of the matching procedure using the ‘lem’ transformation (row labelled CRF lem) performs slightly worse than the procedure without this transformation (row CRF nolem). |
Introduction | In this paper we present a novel approach to noun phrase lemmatisation where the main phase is cast as a tagging problem and tackled using a method devised for such problems, namely Conditional Random Fields (CRF).
Phrase lemmatisation as a tagging problem | Our idea is simple: by expressing phrase lemmatisation in terms of word-level transformations, we can reduce the task to a tagging problem and apply well-known Machine Learning techniques that have been devised for solving such problems (e.g. CRF).
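The reduction can be illustrated with a toy transformation scheme: a per-word tag records how many trailing characters to strip and which suffix to append to recover the lemma. The tag format `del<k>+<suffix>` is an illustrative assumption, not the paper's exact transformation inventory:

```python
# Sketch: encode word -> lemma mappings as transformation tags that a
# sequence tagger could predict. Tag "del<k>+<suffix>" means: drop the
# last k characters, then append <suffix>. Format is illustrative only.

def derive_tag(word, lemma):
    """Derive a transformation tag from the longest common prefix."""
    i = 0
    while i < min(len(word), len(lemma)) and word[i] == lemma[i]:
        i += 1
    return "del%d+%s" % (len(word) - i, lemma[i:])

def apply_tag(word, tag):
    """Apply a transformation tag to a word to produce its lemma."""
    k, suffix = tag[3:].split("+", 1)
    k = int(k)
    return (word[:-k] if k else word) + suffix

tag = derive_tag("dogs", "dog")          # -> "del1+"
assert apply_tag("dogs", tag) == "dog"
# Irregular forms degenerate to full replacement:
assert apply_tag("went", derive_tag("went", "go")) == "go"
```

Because the tag set is small and shared across words, a tagger such as a CRF can predict these transformations from context.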
Previous work | For case prediction, they trained a CRF with access to lemmas and POS-tags within a given window. |
Previous work | While the CRF used for case prediction in Fraser et al. |
Translation pipeline | Morphological features are predicted by four separate CRF models, one for each feature.
Translation pipeline | Instead, we limit the power of the CRF model through experimenting with the removal of features, until we had a system that was robust to this problem. |
Using subcategorization information | subject, object or modifier) are passed on to the system at the level of nouns and integrated into the CRF through the derived probabilities. |
Using subcategorization information | The classification task of the CRF consists in predicting a sequence of labels: case values for NPs/DPs or no value otherwise, cf.
Using subcategorization information | In addition to the probability/frequency of the respective functions, we also provide the CRF with bigrams containing the two parts of the tuple, |
Proposed Method | We use a sequence discrimination technique based on CRF (Lafferty et al., 2001) to identify the label sequence that corresponds to the NP.
Proposed Method | There are two differences between our task and the CRF task. |
Proposed Method | One difference is that CRF discriminates label sequences that consist of labels from all of the label candidates, whereas we constrain the label sequences to sequences where the label at the CP is C, the label at an NPC is N, and the labels between the CP and the NPC are I. |
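This constraint can be sketched by enumerating only the allowed sequences, one per NP-candidate (NPC) position, and scoring each. The positions, the use of O outside the span, and the scorer below are illustrative assumptions:

```python
# Sketch: enumerate only the constrained label sequences described above.
# For a clue phrase (CP) at position cp, each NP candidate at position
# npc (npc < cp here) yields exactly one sequence: N at npc, I between,
# C at cp. Labelling positions outside the span O is an illustrative choice.

def candidate_sequence(length, npc, cp):
    seq = ["O"] * length
    seq[npc] = "N"
    for t in range(npc + 1, cp):
        seq[t] = "I"
    seq[cp] = "C"
    return seq

def best_candidate(length, cp, npcs, seq_score):
    """Score each allowed sequence and return the best one."""
    return max((candidate_sequence(length, npc, cp) for npc in npcs),
               key=seq_score)

# Toy scorer that prefers shorter N..C spans:
print(best_candidate(5, 4, [0, 2], lambda seq: -seq.count("I")))
# -> ['O', 'O', 'N', 'I', 'C']
```

In the real model the scorer would be the CRF's sequence score rather than this toy span-length heuristic.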
Baseline Arabic NER System | For the baseline system, we used the CRF++ implementation of CRF sequence labeling with default parameters.
Cross-lingual Features | translation was capitalized or not respectively; and the weights were binned because CRF++ only takes nominal features. |
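Binning a real-valued score into a nominal feature value, as needed when a toolkit such as CRF++ accepts only categorical features, can be sketched as follows; the bin boundaries are illustrative:

```python
# Sketch: quantize a real-valued weight into a nominal bin label, so it
# can be used as a categorical feature. Boundaries are illustrative.

def bin_weight(w, bounds=(0.1, 0.3, 0.5, 0.7, 0.9)):
    for i, b in enumerate(bounds):
        if w < b:
            return "BIN_%d" % i
    return "BIN_%d" % len(bounds)

print(bin_weight(0.42))  # -> BIN_2
```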
Introduction | Conditional Random Fields (CRF) can often identify such indicative words.
Related Work | Benajiba and Rosso (2008) used CRF sequence labeling and incorporated many language specific features, namely POS tagging, base-phrase chunking, Arabic tokenization, and adjectives indicating nationality. |
Related Work | (2008), they examined the same feature set on the Automatic Content Extraction (ACE) datasets using CRF.
Related Work | The use of CRF sequence labeling for NER has shown success (McCallum and Li, 2003; Nadeau and Sekine, 2009; Benajiba and Rosso, 2008). |
Conclusion | In this paper, we have presented a novel discourse parser that applies an optimal parsing algorithm to probabilities inferred from two CRF models: one for intra-sentential parsing and the other for multi-sentential parsing. |
Introduction | The CRF models effectively represent the structure and the label of a DT constituent jointly, and whenever possible, capture the sequential dependencies between the constituents. |
Introduction | To cope with this limitation, our CRF models support a probabilistic bottom-up parsing |
Parsing Models and Parsing Algorithm | Figure 6: A CRF as a multi-sentential parsing model.
Parsing Models and Parsing Algorithm | It becomes a CRF if we directly model the hidden (output) variables by conditioning its clique potential (or factor) φ on the observed (input) variables:
Experiments | This is because the weights of unigram to trigram features in a log-linear CRF model are a balanced consequence of maximization.
Introduction | The QA system employs a linear-chain Conditional Random Field (CRF) (Lafferty et al., 2001) and tags each token as either an answer (ANS) or not (O).
Introduction | With weights optimized by CRF training (Table 1), we can learn how answer features are correlated with question features. |
Introduction | These features, whose weights are optimized by the CRF training, directly reflect what the most important answer types associated with each question type are. |
Distant Supervision | Following this assumption, we train MaxEnt and Conditional Random Field (CRF; McCallum, 2002) classifiers on the k% of documents that have the lowest maximum flow values f, where k is a parameter which we optimize using the run count method introduced in Section 4.
Distant Supervision | Conditions 4-8 train supervised classifiers based on the labels from DSlabels+MinCut: (4) MaxEnt with named entities (NE); (5) MaxEnt with NE and semantic (SEM) features; (6) CRF with NE; (7) MaxEnt with NE and sequential (SQ) features; (8) MaxEnt with NE, SQ, and SEM.
Distant Supervision | As a sequence model, we train a CRF (line 6); however, the improvement over MaxEnt (line 4) is not significant.
Experimental Setup | The dataset from Clarke and Lapata (2008) is used to train the CRF and MaxEnt classifiers (Section 4). |
Sentence Compression | The CRF model is built using the features shown in Table 3. |
Sentence Compression | During inference, we find the most likely sequence Y according to a CRF with parameters θ (Y = argmax_{Y'} P(Y'|X; θ)), while simultaneously enforcing the rules of Table 2 to reduce the hypothesis space and encourage grammatical compressions.
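Enforcing hard rules during decoding can be sketched by pruning disallowed transitions in the Viterbi search. The KEEP/DEL label set and the forbidden transition below are illustrative, not the actual rules of Table 2:

```python
# Sketch: constrained Viterbi decoding where forbidden (prev, cur) label
# transitions are pruned from the hypothesis space. Labels and the rule
# set here are illustrative assumptions.

NEG_INF = float("-inf")

def constrained_viterbi(emit, trans, forbidden):
    """emit: list of {tag: log-score}; trans: {(prev, cur): log-score};
    forbidden: set of (prev, cur) pairs ruled out entirely."""
    tags = list(emit[0])
    best = {tag: (emit[0][tag], [tag]) for tag in tags}
    for t in range(1, len(emit)):
        nxt = {}
        for cur in tags:
            cands = [(s + trans.get((prev, cur), 0.0) + emit[t][cur],
                      path + [cur])
                     for prev, (s, path) in best.items()
                     if (prev, cur) not in forbidden]
            nxt[cur] = max(cands) if cands else (NEG_INF, [])
        best = nxt
    return max(best.values())[1]

# Toy example: without the rule, DEL DEL would win; forbidding the
# DEL -> DEL transition forces a different hypothesis.
emit = [{"KEEP": 0.0, "DEL": 2.0}, {"KEEP": 0.0, "DEL": 1.0}]
print(constrained_viterbi(emit, {}, {("DEL", "DEL")}))  # -> ['DEL', 'KEEP']
```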