Experimental Comparison with Unsupervised Learning | Figure 2: Comparison of GE training of the restricted and full CRFs with unsupervised learning of DMV. |
Generalized Expectation Criteria | GE has been applied to logistic regression models (Mann and McCallum, 2007; Druck et al., 2008) and linear chain CRFs (Mann and McCallum, 2008). |
Generalized Expectation Criteria | 3.1 GE in General CRFs |
Generalized Expectation Criteria | 3.2 Non-Projective Dependency Tree CRFs |
Introduction | Generalized expectation (GE) (Mann and McCallum, 2008; Druck et al., 2008) is a recently proposed framework for incorporating prior knowledge into the learning of conditional random fields (CRFs) (Lafferty et al., 2001).
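Since GE is only named in this snippet, a one-line sketch of its objective may help; the notation below is a hedged reconstruction of the usual formulation, not a formula quoted from the cited papers.

```latex
% Hedged sketch of a GE-augmented training objective: conditional
% log-likelihood plus a term scoring how closely the model expectation of
% a constraint feature f matches a target expectation \tilde{f} supplied
% as prior knowledge (all notation here is illustrative).
\mathcal{O}(\theta) \;=\; \sum_{i} \log p_\theta(y_i \mid x_i)
  \;-\; \lambda \, \Delta\!\Big(\tilde{f},\;
  \mathbb{E}_{x \sim \tilde{p}(x)}\big[\, \mathbb{E}_{p_\theta(y \mid x)}[\, f(x, y) \,] \big]\Big)
```

where $\Delta$ is a divergence such as squared distance or KL, computed over unlabeled data drawn from the empirical distribution $\tilde{p}(x)$.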
The Chunking-based Segmentation for Chinese ONs | 4.3 The CRFs Model for Chunking |
The Chunking-based Segmentation for Chinese ONs | Conditional Random Fields (CRFs) [J. Lafferty et al., 2001], a discriminative probabilistic model that jointly labels a sequence and allows flexible fusion of overlapping features, are widely regarded as among the best probabilistic models for sequence labeling tasks.
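Since the passage leans on CRF-style sequence labeling for chunking, a minimal decoding sketch may help. The Viterbi dynamic program below uses toy additive scores; the tag set, score values, and function names are illustrative assumptions, not taken from the cited system.

```python
# Minimal Viterbi decoding for a linear-chain CRF-style model.
# Emission and transition scores are log-potentials; missing entries
# default to 0.0. Everything here (tags, scores) is a toy illustration.

def viterbi(tokens, tags, emit, trans):
    """Return the highest-scoring tag sequence for `tokens`."""
    # best[tag] = (score of best path ending in tag, backpointer to previous tag)
    best = {t: (emit.get((tokens[0], t), 0.0), None) for t in tags}
    history = []
    for tok in tokens[1:]:
        history.append(best)
        best = {}
        for t in tags:
            best[t] = max(
                (score + trans.get((p, t), 0.0) + emit.get((tok, t), 0.0), p)
                for p, (score, _) in history[-1].items()
            )
    # Backtrace through the stored backpointers.
    tag = max(best, key=lambda t: best[t][0])
    path, ptr = [tag], best[tag][1]
    for step in reversed(history):
        path.append(ptr)
        ptr = step[ptr][1]
    path.reverse()
    return path

# Toy BIO chunking example with hand-set scores.
emit = {("Bank", "B"): 2.0, ("of", "I"): 1.0, ("China", "I"): 2.0}
trans = {("B", "I"): 1.0, ("I", "I"): 1.0}
print(viterbi(["Bank", "of", "China"], ["B", "I", "O"], emit, trans))
# → ['B', 'I', 'I']
```

The same dynamic program, with max replaced by log-sum-exp, computes the partition function needed for CRF training.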
The Chunking-based Segmentation for Chinese ONs | So the CRFs model is employed for chunking. |
The Framework of Our System | CRFs Chunking Model
Introduction | We evaluate the effectiveness of our method by using linear-chain conditional random fields (CRFs) and three traditional NLP tasks, namely, text chunking (shallow parsing), named entity recognition, and POS tagging.
Log-Linear Models | CRF++ version 0.50, a popular CRF library developed by Taku Kudo, is reported to take 4,021 seconds on a Xeon 3.0 GHz processor to train the model using a richer feature set. CRFsuite version 0.4, a much faster library for CRFs, is reported to take 382 seconds on a Xeon 3.0 GHz processor using the same feature set as ours; that library uses the OWL-QN algorithm for optimization.
Log-Linear Models | (2006) report an f-score of 71.48 on the same data, using semi-Markov CRFs.
Log-Linear Models | We have conducted experiments using CRFs and three NLP tasks, and demonstrated empirically that our training algorithm can produce compact and accurate models much more quickly than a state-of-the-art quasi-Newton method for L1-regularization. |
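The snippet attributes the compactness of the learned models to L1-regularized training. The cited algorithm's details are not given here, so the generic proximal-gradient (soft-thresholding) step below is only an illustration of why L1 drives weights exactly to zero; the function names and constants are assumptions, not the cited method.

```python
# Why L1-regularization yields compact (sparse) models: after each
# gradient step, every weight is shrunk toward zero by the scaled
# regularization strength and clipped at zero (soft-thresholding).
# This is a generic proximal-gradient sketch, not the cited algorithm.

def soft_threshold(w, penalty):
    """Proximal operator of the L1 norm: shrink w toward 0, clip at 0."""
    if w > penalty:
        return w - penalty
    if w < -penalty:
        return w + penalty
    return 0.0

def l1_sgd_step(weights, grads, lr, c1):
    """One SGD step followed by the L1 proximal update (strength c1)."""
    return [soft_threshold(w - lr * g, lr * c1) for w, g in zip(weights, grads)]

# Small weights are zeroed outright, which prunes features from the model.
print(l1_sgd_step([0.5, -0.2, 0.05], [0.1, -0.1, 0.0], lr=0.1, c1=1.0))
```

Any weight whose gradient-updated value falls inside the threshold band (here, |w| ≤ 0.1) is set exactly to 0.0, so the corresponding feature drops out of the model.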
Introduction | The most widely used approaches to these problems have been sequential models including hidden Markov models (HMMs), maximum entropy Markov models (MEMMs) (McCallum, 2000), and conditional random fields (CRFs) (Lafferty et al., 2001).
Introduction | Because of this limitation, Viola and Narasimhan (2007) use a discriminative context-free (phrase structure) grammar for extracting information from semi-structured data and report higher performance than CRFs.
Introduction | Contextual information often plays a big role in resolving tagging ambiguities and is one of the key benefits of discriminative models such as CRFs . |