Introduction | §2 introduces the maximum entropy (maxent) and conditional random field (CRF) learning techniques employed, along with specifications for the design and training of our hierarchical prior. |
Investigation | Specifically, we compared our approximate hierarchical prior model (HIER), implemented as a CRF, against three baselines: (1) GAUSS: a CRF model tuned on a single domain’s data, using a standard N(0, 1) prior; (2) CAT: a CRF model tuned on a concatenation of multiple domains’ data, using a N(0, 1) prior; (3) CHELBA: a CRF model tuned on one domain’s data, using a prior trained on a different, related domain’s data (cf.
Investigation | Line (a) shows the F1 performance of a CRF model tuned only on the target MUC6 domain (GAUSS) across a range of tuning data sizes. |
Investigation | Line (b) shows the same experiment, but this time the CRF model has been tuned on a dataset comprising a simple concatenation of the MUC6 training data from (a) with a different training set from MUC7 (CAT). |
Models considered 2.1 Basic Conditional Random Fields | The parametric form of the CRF for a sentence of length n is given as follows: |
Models considered 2.1 Basic Conditional Random Fields | A CRF learns a model consisting of a set of weights Λ = {λ1, ..., λF} over the features so as to maximize the conditional likelihood of the training data, p(Y_train | X_train), given the model p_Λ.
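The parametric form referred to above is not reproduced in this excerpt; for reference, the standard linear-chain CRF distribution over a label sequence y for a sentence x of length n, with weights λk and feature functions fk, is:

```latex
p_\Lambda(y \mid x) \;=\; \frac{1}{Z(x)} \exp\!\Big( \sum_{i=1}^{n} \sum_{k} \lambda_k\, f_k(y_{i-1}, y_i, x, i) \Big),
\qquad
Z(x) \;=\; \sum_{y'} \exp\!\Big( \sum_{i=1}^{n} \sum_{k} \lambda_k\, f_k(y'_{i-1}, y'_i, x, i) \Big)
```

Here Z(x) is the partition function summing over all candidate label sequences y'.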
Models considered 2.1 Basic Conditional Random Fields | 2.2 CRF with Gaussian priors |
Context and Answer Detection | Finally, we will briefly introduce CRF models and the features that we used for the CRF model.
Context and Answer Detection | A CRF is an undirected graphical model G of the conditional distribution P(Y|X). |
Context and Answer Detection | The linear CRF model has been successfully applied to NLP and text mining tasks (McCallum and Li, 2003; Sha and Pereira, 2003).
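The conditional distribution P(Y|X) of a linear-chain CRF can be computed exactly with the forward algorithm. The sketch below is a minimal illustration with toy potentials (the names `emit`, `trans`, and the random scores are assumptions for illustration, not from any of the cited papers), verified against a brute-force sum over all label sequences:

```python
from itertools import product
import numpy as np

# Minimal sketch of a linear-chain CRF: p(y | x) = exp(score(x, y)) / Z(x).
# `emit[i, y]` plays the role of the per-position feature score, and
# `trans[y', y]` the label-transition score; both are toy values here.

def score(emit, trans, y):
    """Unnormalized log-score of a label sequence y."""
    s = emit[0, y[0]]
    for i in range(1, len(y)):
        s += trans[y[i - 1], y[i]] + emit[i, y[i]]
    return s

def log_partition(emit, trans):
    """log Z(x) via the forward algorithm, in log space for stability."""
    alpha = emit[0]                      # log-forward scores at position 0
    for i in range(1, emit.shape[0]):
        # alpha'[y] = logsumexp_{y'}(alpha[y'] + trans[y', y]) + emit[i, y]
        alpha = np.logaddexp.reduce(alpha[:, None] + trans, axis=0) + emit[i]
    return np.logaddexp.reduce(alpha)

rng = np.random.default_rng(0)
n, L = 4, 3                              # sentence length, label-set size
emit = rng.normal(size=(n, L))
trans = rng.normal(size=(L, L))

logZ = log_partition(emit, trans)
# Brute-force check: summing exp(score) over all L**n sequences gives Z(x).
brute = np.logaddexp.reduce([score(emit, trans, y)
                             for y in product(range(L), repeat=n)])
assert np.isclose(logZ, brute)
```

The forward recursion costs O(n·L²) versus O(Lⁿ) for the brute-force sum, which is what makes exact training and decoding of linear-chain CRFs tractable.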
Introduction | To capture the dependency between contexts and answers, we introduce the Skip-chain CRF model for answer detection.
Introduction | Experimental results show that 1) Linear CRFs outperform SVM and decision tree models in both context and answer detection; 2) Skip-chain CRFs outperform Linear CRFs for answer finding, which demonstrates that context improves answer finding; 3) the 2D CRF model improves on the performance of Linear CRFs, and the combination of 2D CRFs and Skip-chain CRFs achieves better performance for context detection.
Hybrid Relation Extraction | Due to the sequential nature of our RE task, H-CRF employs a CRF as the meta-learner, as opposed to a decision tree or regression-based classifier.
Hybrid Relation Extraction | To obtain the probability at each position of a linear-chain CRF, the constrained forward-backward technique described in (Culotta and McCallum, 2004) is used.
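In the spirit of the constrained forward-backward idea, the probability of a positional constraint under a linear-chain CRF can be obtained as a ratio of partition functions, Z_constrained(x) / Z(x). The sketch below is an illustrative assumption-laden toy (the potentials and the single-position constraint form are made up for this example, not taken from Culotta and McCallum's implementation):

```python
import numpy as np

# Sketch: p(y_i = l | x) = Z_constrained(x) / Z(x), where the constrained
# forward pass zeroes out (sets to -inf in log space) every path that
# violates the constraint (i, l). Potentials are toy values.

def log_forward(emit, trans, constraint=None):
    """log Z(x); optionally restrict position i to label l via constraint=(i, l)."""
    def mask(i, alpha):
        if constraint is not None and constraint[0] == i:
            kept = np.full_like(alpha, -np.inf)
            kept[constraint[1]] = alpha[constraint[1]]
            return kept
        return alpha

    alpha = mask(0, emit[0])
    for i in range(1, emit.shape[0]):
        alpha = np.logaddexp.reduce(alpha[:, None] + trans, axis=0) + emit[i]
        alpha = mask(i, alpha)
    return np.logaddexp.reduce(alpha)

rng = np.random.default_rng(1)
emit, trans = rng.normal(size=(5, 3)), rng.normal(size=(3, 3))

logZ = log_forward(emit, trans)
# Marginal p(y_2 = l | x) for each label l; the three marginals sum to 1.
marginals = [np.exp(log_forward(emit, trans, constraint=(2, l)) - logZ)
             for l in range(3)]
assert np.isclose(sum(marginals), 1.0)
```

These per-position marginals are exactly the confidence scores a meta-learner such as H-CRF can consume as features.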
Related Work | (2006) used a CRF for RE, yet their task differs greatly from open extraction. |
Relation Extraction | Figure 1: Relation Extraction as Sequence Labeling: A CRF is used to identify the relationship, born in, between Kafka and Prague |
Relation Extraction | The resulting set of labeled examples is described using features that can be extracted without syntactic or semantic analysis, and is used to train a CRF, a sequence model that learns to identify spans of tokens believed to indicate explicit mentions of relationships between entities.
Relation Extraction | The entity pair serves to anchor each end of a linear-chain CRF, and both entities in the pair are assigned a fixed label of ENT.
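The labeling scheme this describes can be sketched concretely. The tokens, the REL label name, and the exact label inventory below are illustrative assumptions; only the fixed ENT anchors on the entity pair come from the text:

```python
# Sketch of relation extraction as sequence labeling (cf. Figure 1):
# the entity pair anchors the ends of the chain with fixed ENT labels,
# and tokens in between are labeled as relation span or O.

tokens = ["Kafka", "was", "born", "in", "Prague"]
e1, e2 = 0, 4                      # positions of the anchoring entity pair

labels = ["O"] * len(tokens)
labels[e1] = labels[e2] = "ENT"    # fixed anchor labels on both entities
labels[2] = labels[3] = "REL"      # gold relation span "born in" (hypothetical label name)

print(list(zip(tokens, labels)))
# [('Kafka', 'ENT'), ('was', 'O'), ('born', 'REL'), ('in', 'REL'), ('Prague', 'ENT')]
```

At inference time the ENT labels are clamped (the CRF is not allowed to relabel them), and the model predicts only the labels of the intervening tokens.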
Introduction | For example, in (Lafferty et al., 2001), when switching from a generatively trained hidden Markov model (HMM) to a discriminatively trained, linear-chain conditional random field (CRF) for part-of-speech tagging, their error drops from 5.7% to 5.6%.
Introduction | When they add in only a small set of orthographic features, their CRF error rate drops considerably more to 4.3%, and their out-of-vocabulary error rate drops by more than half. |
The Model | We then define a conditional probability distribution over entire trees, using the standard CRF distribution, shown in (1). |
Experiments | We generated the node and edge features of a CRF model from these atomic features, as described in Table 3.
Experiments | To train CRF models, we used Taku Kudo’s CRF++ (ver. |
Using Gazetteers as Features of NER | These annotated IOB tags can be used in the same way as other features in a CRF tagger. |
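One common way to realize this is to match gazetteer entries against the token sequence and emit the resulting IOB tags as an extra feature column per token. The gazetteer contents, feature names, and sentence below are made-up examples, not from the cited work:

```python
# Sketch: turn gazetteer matches into IOB-tag features for a CRF tagger.
# Tokens inside a matched entry get B-/I- tags; all others get O.

gazetteer = {("new", "york"), ("london",)}   # toy location gazetteer

def gazetteer_iob(tokens):
    tags = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    for i in range(len(tokens)):
        # Prefer the longest gazetteer entry starting at position i.
        for n in range(len(tokens) - i, 0, -1):
            if tuple(lowered[i:i + n]) in gazetteer:
                tags[i] = "B-GAZ"
                for j in range(i + 1, i + n):
                    tags[j] = "I-GAZ"
                break
    return tags

tokens = ["She", "flew", "from", "New", "York", "to", "London"]
print(gazetteer_iob(tokens))
# ['O', 'O', 'O', 'B-GAZ', 'I-GAZ', 'O', 'B-GAZ']
```

Each tag is then paired with its token as one more feature column, alongside word identity, capitalization, and similar atomic features.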