Experimental Comparison with Unsupervised Learning | We compare GE and supervised training of an edge-factored CRF with unsupervised learning of a DMV model (Klein and Manning, 2004) using EM and contrastive estimation (CE) (Smith and Eisner, 2005). |
Experimental Comparison with Unsupervised Learning | We note that there are considerable differences between the DMV and CRF models. |
Experimental Comparison with Unsupervised Learning | The DMV model is more expressive than the CRF because it can model the arity of a head as well as sibling relationships. |
Generalized Expectation Criteria | In the following sections we apply GE to non-projective CRF dependency parsing. |
Generalized Expectation Criteria | We first consider an arbitrarily structured conditional random field (Lafferty et al., 2001) $p_\lambda(\mathbf{y} \mid \mathbf{x})$. We describe the CRF for non-projective dependency parsing in Section 3.2.
Generalized Expectation Criteria | We now define a CRF $p_\lambda(\mathbf{y} \mid \mathbf{x})$ for unlabeled, non-projective dependency parsing.
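Generalized Expectation Criteria | For concreteness, the log-linear form such a CRF takes, sketched in our own notation (the paper's exact parameterization may differ):
$$p_\lambda(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z_\lambda(\mathbf{x})} \exp\Big(\sum_k \lambda_k f_k(\mathbf{x}, \mathbf{y})\Big), \qquad Z_\lambda(\mathbf{x}) = \sum_{\mathbf{y}'} \exp\Big(\sum_k \lambda_k f_k(\mathbf{x}, \mathbf{y}')\Big).$$
For edge-factored dependency parsing the features decompose over head-modifier edges, and for non-projective trees $Z_\lambda(\mathbf{x})$ can be computed with the Matrix-Tree theorem (Smith and Smith, 2007).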
Introduction | With GE we may add a term to the objective function that encourages a feature-rich CRF to match this expectation on unlabeled data, and in the process learn about related features. |
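Introduction | A minimal sketch of such a GE term, in our own notation rather than the paper's: given a target expectation $\tilde{g}$ for a constraint feature $g$ and unlabeled data $U$, one adds
$$O_{GE}(\lambda) = -\Delta\Big(\tilde{g},\; \tfrac{1}{|U|} \textstyle\sum_{\mathbf{x} \in U} E_{p_\lambda(\mathbf{y} \mid \mathbf{x})}\big[g(\mathbf{x}, \mathbf{y})\big]\Big),$$
where $\Delta$ is a divergence such as squared error or KL. Maximizing the objective pulls the model expectation of $g$ toward $\tilde{g}$, and features correlated with $g$ are adjusted along with it.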
Introduction | In this paper we use a non-projective dependency tree CRF (Smith and Smith, 2007). |
Experiments | We investigate the use of smoothing in two test systems: conditional random field (CRF) models for POS tagging and chunking.
Experiments | Finally, we train the CRF model on the annotated training set and apply it to the test set. |
Experiments | We use an open source CRF software package designed by Sunita Sarawagi and William W. Cohen to implement our CRF models. We use a set of boolean features listed in Table 1.
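Experiments | As an illustration, a minimal sketch of boolean token features feeding a linear-chain CRF. This uses the open-source sklearn-crfsuite package rather than the package cited above, and the feature set is hypothetical, not the paper's Table 1:

    import sklearn_crfsuite

    def token_features(sent, i):
        # Boolean-style indicator features for token i (illustrative only).
        word = sent[i]
        feats = {
            "bias": True,
            "word.lower": word.lower(),   # string values become indicator features
            "word.istitle": word.istitle(),
            "word.isdigit": word.isdigit(),
        }
        if i > 0:
            feats["prev.word.lower"] = sent[i - 1].lower()
        else:
            feats["BOS"] = True
        if i < len(sent) - 1:
            feats["next.word.lower"] = sent[i + 1].lower()
        else:
            feats["EOS"] = True
        return feats

    # Toy stand-in for the annotated training set; labels here are chunk tags.
    sentences = [["John", "lives", "in", "Paris"]]
    labels = [["B-NP", "B-VP", "B-PP", "B-NP"]]

    X = [[token_features(s, i) for i in range(len(s))] for s in sentences]
    crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c2=1.0)  # c2: L2 penalty
    crf.fit(X, labels)
    print(crf.predict(X))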
Named Entity Recognition | Conditional Random Fields (CRF) (Lafferty et al., 2001).
Named Entity Recognition | We employed a linear-chain CRF with L2 regularization as the baseline algorithm, to which we added phrase cluster features.
Named Entity Recognition | The features in our baseline CRF classifier are a subset of the conventional features. |
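Named Entity Recognition | A sketch of how cluster features can be layered on top of baseline token features. The `cluster_of` lookup and the prefix lengths are hypothetical, and the Brown-style bit-string prefix encoding shown here is one common scheme, not necessarily the paper's phrase-cluster encoding:

    def add_cluster_features(feats, sent, i, cluster_of):
        # cluster_of: hypothetical lookup from a lowercased word/phrase to a
        # hierarchical cluster id encoded as a bit string.
        cluster = cluster_of.get(sent[i].lower())
        if cluster is not None:
            # Prefixes of the id give cluster features at several granularities.
            for p in (4, 8, 12):
                feats["cluster.prefix%d" % p] = cluster[:p]
        return feats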
Abbreviator with Nonlocal Information | The DPLVM is a natural extension of the CRF model (see Figure 2); the CRF is a special case of the DPLVM in which only one latent variable is assigned to each label.
Introduction | Figure 2: CRF vs. DPLVM. |
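Introduction | A sketch of the DPLVM decomposition, in our own notation: each label $y_j$ is assigned a disjoint set of latent variables, and
$$p_\theta(\mathbf{y} \mid \mathbf{x}) = \sum_{\mathbf{h} \in H_{\mathbf{y}}} p_\theta(\mathbf{h} \mid \mathbf{x}),$$
where $H_{\mathbf{y}}$ is the set of latent-variable sequences consistent with $\mathbf{y}$ and $p_\theta(\mathbf{h} \mid \mathbf{x})$ is a log-linear (CRF-style) model over latent states. With exactly one latent variable per label, the sum has a single term and the model reduces to the CRF.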
Recognition as a Generation Task | As can be seen in Table 6, using the latent variables significantly improved the performance (see DPLVM vs. CRF), and using the GI encoding improved the performance of both the DPLVM and the CRF.
Recognition as a Generation Task | Table 7 shows that the back-off method further improved the performance of both the DPLVM and the CRF model. |
Results and Discussion | A special case of the DPLVM is exactly the CRF (see Section 2.1); this case is hereinafter denoted as the CRF.
Results and Discussion | The results revealed that the latent variable model significantly improved the performance over the CRF model. |
Results and Discussion | All of its top-1, top-2, and top-3 accuracies were consistently better than those of the CRF model.
Evaluation | As seen in this plot, when there is a small amount of training data, the parser performs better than the CRF module, and the parser+SVM module performs better than the other two.
Evaluation | With a large amount of training data, the CRF and the parser have almost the same performance.
Evaluation | The CRF module also uses lexical resources and regular expressions.
Introduction | MEMMs and CRFs are discriminative models; hence they are highly dependent on the training data.
Summary | Test (N0 = 3000): CRF: P 0.815, R 0.812, F 0.813, Q 0.509; Parser: P 0.808, R 0.814, F 0.811, Q 0.494.
Log-Linear Models | If the structure is a sequence, the model is called a linear-chain CRF model, and the marginal probabilities of the features and the partition function can be efficiently computed by using the forward-backward algorithm. |
Log-Linear Models | If the structure is a tree, the model is called a tree CRF model, and the marginal probabilities can be computed by using the inside-outside algorithm. |
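Log-Linear Models | A minimal NumPy sketch of the forward-backward computation for a linear-chain model, working in log space; the interface and variable names are our own, not a particular package's API:

    import numpy as np
    from scipy.special import logsumexp

    def forward_backward(log_phi):
        # log_phi: (T, K, K) log-potentials for moving from label i at position
        # t-1 to label j at position t; log_phi[0, 0, :] holds the start scores.
        T, K, _ = log_phi.shape
        alpha = np.zeros((T, K))
        alpha[0] = log_phi[0, 0]
        for t in range(1, T):
            # alpha[t, j] = logsumexp_i( alpha[t-1, i] + log_phi[t, i, j] )
            alpha[t] = logsumexp(alpha[t - 1][:, None] + log_phi[t], axis=0)
        beta = np.zeros((T, K))
        for t in range(T - 2, -1, -1):
            beta[t] = logsumexp(log_phi[t + 1] + beta[t + 1][None, :], axis=1)
        log_Z = logsumexp(alpha[-1])          # log partition function
        log_marginals = alpha + beta - log_Z  # log p(y_t = k | x)
        return log_marginals, log_Z

    # Sanity check: per-position marginals sum to one.
    log_m, _ = forward_backward(np.random.default_rng(0).normal(size=(5, 3, 3)))
    assert np.allclose(logsumexp(log_m, axis=1), 0.0)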
Log-Linear Models | We evaluate the effectiveness of our training algorithm using linear-chain CRF models and three NLP tasks: text chunking, named entity recognition, and POS tagging.