Index of papers in Proc. ACL that mention
  • CRFs
Zeng, Xiaodong and Chao, Lidia S. and Wong, Derek F. and Trancoso, Isabel and Tian, Liang
Abstract
We propose dealing with the induced word boundaries as soft constraints to bias the continuous learning of a supervised CRFs model, trained by the treebank data (labeled), on the bilingual data (unlabeled).
Introduction
Crucially, the GP expression with the bilingual knowledge is then used as side information to regularize a CRFs (conditional random fields) model’s learning over treebank and bitext data, based on the posterior regularization (PR) framework (Ganchev et al., 2010).
Introduction
This constrained learning amounts to a joint coupling of GP and CRFs, i.e., integrating GP into the estimation of a parametric structural model.
Methodology
Algorithm (excerpted): Require labeled treebank data and unlabeled bitext; Ensure θ, the CRFs model parameters. Steps: char_align_bitext (character-align the bitext), learn_word_bound (derive word boundary information from the alignments), encode_graph_constraint (encode it as the graph constraint Q), and pr_crf_graph (estimate θ by posterior-regularized CRF training).
Methodology
The GP expression will be defined as a PR constraint in Section 3.3 that reflects the interactions between the graph and the CRFs model.
Methodology
Supervised linear-chain CRFs can be modeled in a standard conditional log-likelihood objective with a Gaussian prior:
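The excerpt ends just before the objective itself; written out generically (standard notation, not necessarily the paper's own), the penalized conditional log-likelihood for a linear-chain CRF is
$$\mathcal{L}(\theta) = \sum_{n=1}^{N} \log p_\theta\big(\mathbf{y}^{(n)} \mid \mathbf{x}^{(n)}\big) - \frac{\lVert\theta\rVert_2^2}{2\sigma^2},
\qquad
p_\theta(\mathbf{y}\mid\mathbf{x}) = \frac{1}{Z_\theta(\mathbf{x})} \exp\Big(\sum_{t}\sum_{k} \theta_k f_k(y_{t-1}, y_t, \mathbf{x}, t)\Big),$$
where the second term of $\mathcal{L}$ is the Gaussian (L2) prior with variance $\sigma^2$.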
Related Work
(2008) enhanced a CRFs segmentation model in MT tasks by tuning the word granularity and improving segmentation consistency.
Related Work
Rather than making "hard" use of the bilingual segmentation knowledge, i.e., directly merging "char-to-word" alignments into words as supervision, this study extracts word boundary information of characters from the alignments as soft constraints to regularize a CRFs model's learning.
Related Work
(2014) proposed GP for inferring the label information of unlabeled data, and then leveraging these GP outcomes to learn a scalable semi-supervised model (e.g., CRFs).
CRFs is mentioned in 30 sentences in this paper.
Topics mentioned in this paper:
Nomoto, Tadashi
A Sentence Trimmer with CRFs
Our idea on how to make CRFs comply with grammar is quite simple: we focus on only those label sequences that are associated with grammatically correct compressions, by making CRFs look at only those that comply with some grammatical constraints G, and ignore others, regardless of how probable they are. But how do we find compressions that are grammatical?
A Sentence Trimmer with CRFs
Assume as usual that CRFs take the form $p(\mathbf{y}\mid\mathbf{x}) \propto \exp\big(\sum_{j}\lambda_j f_j(y_k, y_{k-1}, \mathbf{x}) + \sum_{i}\mu_i g_i(x_k, y_k, \mathbf{x})\big)$
A Sentence Trimmer with CRFs
In (1), $f_j$ and $g_i$ are 'features' associated with edges and vertices, respectively, and $k \in C$, where $C$ denotes the set of cliques in CRFs.
Abstract
The paper presents a novel sentence trimmer in Japanese, which combines a non-statistical yet generic tree generation model and Conditional Random Fields ( CRFs ), to address improving the grammaticality of compression while retaining its relevance.
Features in CRFs
We use an array of features in CRFs which are either derived or borrowed from the taxonomy that a Japanese tokenizer called JUMAN and KNP, a Japanese dependency parser (aka Kurohashi-Nagao Parser), make use of in characterizing the output they produce: both JUMAN and KNP are part of the compression model we build.
Introduction
What sets this work apart from them, however, is a novel use we make of Conditional Random Fields ( CRFs ) to select among possible compressions (Lafferty et al., 2001; Sutton and McCallum, 2006).
Introduction
An obvious benefit of using CRFs for sentence compression is that the model provides a general (and principled) probabilistic framework which permits information from various sources to be integrated towards compressing a sentence, a property K&M do not share.
Introduction
Nonetheless, there is some cost that comes with the straightforward use of CRFs as a discriminative classifier in sentence compression; its outputs are often ungrammatical and it allows no control over the length of the compressions it generates (Nomoto, 2007).
CRFs is mentioned in 15 sentences in this paper.
Topics mentioned in this paper:
Feng, Vanessa Wei and Hirst, Graeme
Abstract
Our model adopts a greedy bottom-up approach, with two linear-chain CRFs applied in cascade as local classifiers.
Bottom-up tree-building
4.1 Linear-chain CRFs as Local models
Bottom-up tree-building
While our bottom-up tree-building shares the greedy framework with HILDA, unlike HILDA, our local models are implemented using CRFs .
Bottom-up tree-building
Therefore, our model incorporates the strengths of both HILDA and Joty et al.’s model, i.e., the efficiency of a greedy parsing algorithm, and the ability to incorporate sequential information with CRFs .
Experiments
For local models, our structure models are trained using MALLET (McCallum, 2002) to include constraints over transitions between adjacent labels, and our relation models are trained using CRFSuite (Okazaki, 2007), which is a fast implementation of linear-chain CRFs .
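Purely as an illustration of the kind of setup described here, a minimal sketch using the python-crfsuite bindings to CRFsuite; the feature strings, labels, and model file name are invented for the example and are not taken from the paper:

```python
import pycrfsuite

# Toy training data: one sequence of per-token feature lists plus its label sequence.
X_train = [[["word=the", "pos=DT"], ["word=cat", "pos=NN"], ["word=sat", "pos=VBD"]]]
y_train = [["B-NP", "I-NP", "B-VP"]]

trainer = pycrfsuite.Trainer(verbose=False)
for xseq, yseq in zip(X_train, y_train):
    trainer.append(xseq, yseq)

# L-BFGS training with L2 regularization (CRFsuite's default algorithm).
trainer.set_params({"c2": 1.0, "max_iterations": 100})
trainer.train("toy_model.crfsuite")

# Decode a sequence with the trained model.
tagger = pycrfsuite.Tagger()
tagger.open("toy_model.crfsuite")
print(tagger.tag(X_train[0]))
```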
Introduction
DCRF (Dynamic Conditional Random Fields) is a generalization of linear-chain CRFs , in which each time slice contains a set of state variables and edges (Sutton et al., 2007).
Introduction
Second, by using two linear-chain CRFs to label a sequence of discourse constituents, we can incorporate contextual information in a more natural way, compared to using traditional discriminative classifiers, such as SVMs.
Post-editing
is almost identical to their counterparts in the bottom-up tree-building, except that the linear-chain CRFs in post-editing include additional features to represent information from constituents on higher levels (to be introduced in Section 7).
CRFs is mentioned in 12 sentences in this paper.
Topics mentioned in this paper:
Zeng, Xiaodong and Wong, Derek F. and Chao, Lidia S. and Trancoso, Isabel
Abstract
The derived label distributions are regarded as virtual evidences to regularize the learning of linear conditional random fields ( CRFs ) on unlabeled data.
Abstract
Empirical results on the Chinese Treebank (CTB-7) and Microsoft Research corpora (MSR) reveal that the proposed model can yield better results than the supervised baselines and other competitive semi-supervised CRFs in this task.
Background
The first-order CRFs model (Lafferty et al., 2001) has been the most common one in this task.
Background
The goal is to learn a CRFs model in the form,
Introduction
The derived label distributions are regarded as prior knowledge to regularize the learning of a sequential model, conditional random fields ( CRFs ) in this case, on both
Introduction
Section 3 reviews the background, including supervised character-based joint S&T model based on CRFs and graph-based label propagation.
Related Work
Sun and Xu (2011) enhanced a CWS model by interpolating statistical features of unlabeled data into the CRFs model.
Related Work
also differs from other semi-supervised CRFs algorithms.
Related Work
(2006), extended by Mann and McCallum (2007), reported a semi-supervised CRFs model which aims to guide the learning by minimizing the conditional entropy of unlabeled data.
CRFs is mentioned in 30 sentences in this paper.
Topics mentioned in this paper:
Feng, Minwei and Peter, Jan-Thorsten and Ney, Hermann
Experiments
In this section, we describe the baseline setup, the CRFs training results, the RNN training results
Experiments
• The Wapiti toolkit (Lavergne et al., 2010) is used for CRFs; the RNN is built with the RNNLIB toolkit.
Experiments
5.2 CRFs Training Results
Tagging-style Reordering Model
For this supervised learning task, we choose conditional random fields (CRFs) (Lafferty et al., 2001; Sutton and McCallum, 2006; Lavergne et al., 2010) and the recurrent neural network (RNN) (Elman, 1990; Jordan, 1990; Lang et al., 1990).
Tagging-style Reordering Model
For the first method, we adopt the linear-chain CRFs .
Tagging-style Reordering Model
However, even for the simple linear-chain CRFs, the complexity of learning and inference grows quadratically with respect to the number of output labels and the number of structural features defined over adjacent pairs of labels.
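As a rough accounting of this quadratic growth (a generic estimate, not figures from the paper): for a sequence of length $T$ and $L$ output labels, Viterbi decoding and forward-backward inference in a linear-chain CRF cost on the order of
$$O(T \cdot L^2)$$
per sequence, and every feature template defined over adjacent label pairs instantiates on the order of $L^2$ parameters, so both per-iteration training time and model size scale quadratically in $L$.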
CRFs is mentioned in 19 sentences in this paper.
Topics mentioned in this paper:
Lavergne, Thomas and Cappé, Olivier and Yvon, François
Abstract
Conditional Random Fields ( CRFs ) are a widely-used approach for supervised sequence labelling, notably due to their ability to handle large description spaces and to integrate structural dependency between labels.
Abstract
In this paper, we address the issue of training very large CRFs, containing up to hundreds of output labels and several billion features.
Abstract
Our experiments demonstrate that very large CRFs can be trained efficiently and that very large models are able to improve the accuracy, while delivering compact parameter sets.
Introduction
Conditional Random Fields ( CRFs ) (Lafferty et al., 2001; Sutton and McCallum, 2006) constitute a widely-used and effective approach for supervised structure learning tasks involving the mapping between complex objects such as strings and trees.
Introduction
An important property of CRFs is their ability to handle large and redundant feature sets and to integrate structural dependency between output labels.
Introduction
However, even for simple linear-chain CRFs, the complexity of learning and inference grows quadratically with respect to the number of output labels.
CRFs is mentioned in 22 sentences in this paper.
Topics mentioned in this paper:
Ding, Shilin and Cong, Gao and Lin, Chin-Yew and Zhu, Xiaoyan
Abstract
In this paper, we propose a general framework based on Conditional Random Fields ( CRFs ) to detect the contexts and answers of questions from forum threads.
Abstract
We improve the basic framework by Skip-chain CRFs and 2D CRFs to better accommodate the features of forums for better performance.
Context and Answer Detection
We first discuss using Linear CRFs for context and answer detection, and then extend the basic framework to Skip-chain CRFs and 2D CRFs to better model our problem.
Context and Answer Detection
3.1 Using Linear CRFs
Context and Answer Detection
For ease of presentation, we focus on detecting contexts using Linear CRFs .
Introduction
First, we employ Linear Conditional Random Fields ( CRFs ) to identify contexts and answers, which can capture the relationships between contiguous sentences.
Introduction
We also extend the basic model to 2D CRFs to model dependency between contiguous questions in a forum thread for context and answer identification.
Introduction
Experimental results show that 1) Linear CRFs outperform SVM and decision tree in both context and answer detection; 2) Skip-chain CRFs outperform Linear CRFs for answer finding, which demonstrates that context improves answer finding; 3) 2D CRF model improves the performance of Linear CRFs and the combination of 2D CRFs and Skip-chain CRFs achieves better performance for context detection.
CRFs is mentioned in 45 sentences in this paper.
Topics mentioned in this paper:
Qu, Zhonghua and Liu, Yang
Introduction
Because multiple runs of separate linear-chain CRFs ignore the dependency between source sentences, the second approach we propose is to use a 2D CRF that models all pair relationships jointly.
Introduction
Our experimental results show that our proposed sentence type tagging method works very well, even for the minority categories, and that using 2D CRF further improves performance over linear-chain CRFs for identifying dependency relation between sentences.
Introduction
In Section 3, we introduce the use of CRFs for sentence type and dependency tagging.
Related Work
Our study is different in several aspects: we are using forum domains, unlike most work of DA tagging on conversational speech; we use CRFs for sentence type tagging; and more importantly, we also propose to use different CRFs for sentence relation detection.
Thread Structure Tagging
To automatically label sentences in a thread with their types, we adopt a sequence labeling approach, specifically linear-chain conditional random fields (CRFs), which have shown good performance in many other tasks (Lafferty et al., 2001).
Thread Structure Tagging
Linear-chain CRFs are a type of undirected graphical model.
Thread Structure Tagging
CRFs are a special case of undirected graphical models in which the potential functions ψ are log-linear:
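Stated generically (the excerpt stops before the formula; the notation below is standard rather than the paper's own), the log-linear potentials over cliques $c$ are
$$\psi_c(\mathbf{y}_c, \mathbf{x}) = \exp\Big(\sum_k \lambda_k f_k(\mathbf{y}_c, \mathbf{x})\Big), \qquad p(\mathbf{y}\mid\mathbf{x}) = \frac{1}{Z(\mathbf{x})}\prod_c \psi_c(\mathbf{y}_c, \mathbf{x}).$$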
CRFs is mentioned in 24 sentences in this paper.
Topics mentioned in this paper:
Zeng, Xiaodong and Wong, Derek F. and Chao, Lidia S. and Trancoso, Isabel
Experiment
The feature templates in (Zhao et al., 2006) and (Zhang and Clark, 2007) are used in training the CRFs model and Perceptrons model, respectively.
Introduction
Sun and Xu (2011) enhanced the segmentation results by interpolating the statistics-based features derived from unlabeled data to a CRFs model.
Segmentation Models
This section briefly reviews two supervised models in these categories, a character-based CRFs model, and a word-based Perceptrons model, which are used in our approach.
Segmentation Models
2.1 Character-based CRFs Model
Segmentation Models
Xue (2003) first proposed the use of CRFs model (Lafferty et al., 2001) in character-based CWS.
Semi-supervised Learning via Co-regularizing Both Models
The model induction process is described in Algorithm 1: given a labeled dataset D_l and an unlabeled dataset D_u, the first two steps train a CRFs (character-based) and a Perceptrons (word-based) model on the labeled data D_l, respectively.
Semi-supervised Learning via Co-regularizing Both Models
Afterwards, the agreements A are used as a set of constraints to bias the learning of CRFs (§ 3.2) and Perceptron (§ 3.3) on the unlabeled data.
Semi-supervised Learning via Co-regularizing Both Models
3.2 CRFs with Constraints
CRFs is mentioned in 10 sentences in this paper.
Topics mentioned in this paper:
Dethlefs, Nina and Hastie, Helen and Cuayáhuitl, Heriberto and Lemon, Oliver
Abstract
We formulate surface realisation as a sequence labelling task and combine the use of conditional random fields ( CRFs ) with semantic trees.
Abstract
Due to their extended notion of context, CRFs are able to take the global utterance context into account and are less constrained by local features than other realisers.
Cohesion across Utterances
This grammar defines the surface realisation space for the CRFs .
Conclusion and Future Directions
We have argued that CRFs are well suited for this task because they are not restricted by independence assumptions.
Conclusion and Future Directions
In addition, we may compare different sequence labelling algorithms for surface realisation (Nguyen and Guo, 2007) or segmented CRFs (Sarawagi and Cohen, 2005) and apply our method to more complex surface realisation domains such as text generation or summarisation.
Evaluation
CRFs and other state-of-the-art methods, we also compare our system to two other baselines:
Incremental Surface Realisation
Since CRFs are not restricted by the Markov condition, they are less constrained by local context than other models and can take nonlocal dependencies into account.
Incremental Surface Realisation
While their extended context awareness can often make CRFs slow to train, they are fast at execution and therefore very applicable to the incremental scenario.
Introduction
to surface realisation within incremental systems, because CRFs are able to model context across full as well as partial generator inputs which may undergo modifications during generation.
Related Work
(2009) who also use CRFs to find the best surface realisation from a semantic tree.
CRFs is mentioned in 10 sentences in this paper.
Topics mentioned in this paper:
Sun, Xu and Wang, Houfeng and Li, Wenjie
Introduction
While most of the state-of-the-art CWS systems used semi-Markov conditional random fields or latent variable conditional random fields, we simply use a single first-order conditional random fields ( CRFs ) for the joint modeling.
Introduction
The semi-Markov CRFs and latent variable CRFs relax the Markov assumption of CRFs to express more complicated dependencies, and therefore to achieve higher disambiguation power.
Introduction
Alternatively, our plan is not to relax Markov assumption of CRFs , but to exploit more complicated dependencies via using refined high-dimensional features.
System Architecture
3.1 A Joint Model Based on CRFs
System Architecture
First, we briefly review CRFs .
System Architecture
CRFs are proposed as a method for structured classification by solving “the label bias problem” (Lafferty et al., 2001).
CRFs is mentioned in 9 sentences in this paper.
Topics mentioned in this paper:
Huang, Zhiheng and Chang, Yi and Long, Bo and Crespo, Jean-Francois and Dong, Anlei and Keerthi, Sathiya and Wu, Su-Lin
Introduction
Sequence tagging algorithms including HMMs (Rabiner, 1989), CRFs (Lafferty et al., 2001), and Collins’s perceptron (Collins, 2002) have been widely employed in NLP applications.
Problem formulation
In this section, we formulate the sequential decoding problem in the context of perceptron algorithm (Collins, 2002) and CRFs (Lafferty et al., 2001).
Problem formulation
and a CRFs model is
Problem formulation
For CRFs, Z(x) is an instance-specific normalization function
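Written out generically (the excerpt ends before the definition; standard notation, not necessarily the paper's own), the instance-specific normalizer sums over all label sequences:
$$Z(\mathbf{x}) = \sum_{\mathbf{y}} \exp\Big(\sum_{t}\sum_{k} \theta_k f_k(y_{t-1}, y_t, \mathbf{x}, t)\Big).$$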
Related work
A similar idea was applied to CRFs as well (Cohn, 2006; Jeong, 2009).
CRFs is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Druck, Gregory and Mann, Gideon and McCallum, Andrew
Experimental Comparison with Unsupervised Learning
Figure 2: Comparison of GE training of the restricted and full CRFs with unsupervised learning of DMV.
Generalized Expectation Criteria
GE has been applied to logistic regression models (Mann and McCallum, 2007; Druck et al., 2008) and linear chain CRFs (Mann and McCallum, 2008).
Generalized Expectation Criteria
3.1 GE in General CRFs
Generalized Expectation Criteria
3.2 Non-Projective Dependency Tree CRFs
Introduction
Generalized expectation (GE) (Mann and McCallum, 2008; Druck et al., 2008) is a recently proposed framework for incorporating prior knowledge into the learning of conditional random fields ( CRFs ) (Lafferty et al., 2001).
CRFs is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Yang, Fan and Zhao, Jun and Liu, Kang
The Chunking-based Segmentation for Chinese ONs
4.3 The CRFs Model for Chunking
The Chunking-based Segmentation for Chinese ONs
Considered as a discriminative probabilistic model for joint sequence labeling and with the advantage of flexible feature fusion, Conditional Random Fields (CRFs) [J. Lafferty et al., 2001] are believed to be one of the best probabilistic models for sequence labeling tasks.
The Chunking-based Segmentation for Chinese ONs
So the CRFs model is employed for chunking.
The Framework of Our System
CRFs Chunking Model
CRFs is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Anzaroot, Sam and Passos, Alexandre and Belanger, David and McCallum, Andrew
Background
For this underlying model, we employ a chain-structured conditional random field (CRF), since CRFs have been shown to perform better than other simple unconstrained models like hidden Markov models for citation extraction (Peng and McCallum, 2004).
Background
The algorithms we present in later sections for handling soft global constraints and for learning the penalties of these constraints can be applied to general structured linear models, not just CRFs , provided we have an available algorithm for performing MAP inference.
Citation Extraction Data
Later, CRFs were shown to perform better on CORA, improving the results from the HMM's token-level F1 of 86.6 to 91.5 with a CRF (Peng and McCallum, 2004).
Citation Extraction Data
This approach is limited in its use of an HMM as an underlying model, as it has been shown that CRFs perform significantly better, achieving 95.37 token-level accuracy on CORA (Peng and McCallum, 2004).
Introduction
Hidden Markov models and linear-chain conditional random fields ( CRFs ) have previously been applied to citation extraction (Hetzner, 2008; Peng and McCallum, 2004) .
CRFs is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Trogkanis, Nikolaos and Elkan, Charles
Conclusions
application of CRFs, which are a major advance of recent years in machine learning.
Conclusions
A third contribution of our work is a demonstration that current CRF methods can be used straightforwardly for an important application and outperform state-of-the-art commercial and open-source software; we hope that this demonstration accelerates the widespread use of CRFs.
Introduction
Research on structured learning has been highly successful, with sequence classification as its most important and successful subfield, and with conditional random fields ( CRFs ) as the most influential approach to learning sequence classifiers.
Introduction
we show that CRFs can achieve extremely good performance on the hyphenation task.
CRFs is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Oh, Jong-Hoon and Torisawa, Kentaro and Hashimoto, Chikara and Sano, Motoki and De Saeger, Stijn and Ohtake, Kiyonori
Causal Relations for Why-QA
We regard this task as a sequence labeling problem and use Conditional Random Fields (CRFs) (Lafferty et al., 2001) as a machine learning framework.
Causal Relations for Why-QA
In our task, CRFs take three sentences of a causal relation candidate as input and generate their cause-effect annotations with a set of possible cause-effect IOB labels, including Begin-Cause (BC), Inside-Cause (IC), Begin-Effect (BE), Inside-Effect (IE), and Outside (O).
Causal Relations for Why-QA
We used the three types of feature sets in Table 3 for training the CRFs, where j is in the range i − 4 ≤ j ≤ i + 4 for the current position i in a causal relation candidate.
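As a sketch of how such a windowed feature template might be realized in code (the feature naming, the toy sentence, and the clipping at sequence boundaries are illustrative assumptions, not the paper's actual Table 3 templates):

```python
def window_features(tokens, i, width=4):
    """Emit token features for positions j in the window i-width <= j <= i+width."""
    feats = []
    for j in range(max(0, i - width), min(len(tokens), i + width + 1)):
        offset = j - i
        feats.append("word[%+d]=%s" % (offset, tokens[j]))
    return feats

# Example: the feature set for position 2 of a toy causal-relation candidate.
sentence = "heavy rain caused severe flooding downtown".split()
print(window_features(sentence, 2))
```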
CRFs is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Tsuruoka, Yoshimasa and Tsujii, Jun'ichi and Ananiadou, Sophia
Introduction
We evaluate the effectiveness of our method by using linear-chain conditional random fields ( CRFs ) and three traditional NLP tasks, namely, text chunking (shallow parsing), named entity recognition, and POS tagging.
Log-Linear Models
The CRF++ version 0.50, a popular CRF library developed by Taku Kudo, is reported to take 4,021 seconds on Xeon 3.0GHz processors to train the model using a richer feature set. CRFsuite version 0.4, a much faster library for CRFs, is reported to take 382 seconds on Xeon 3.0GHz, using the same feature set as ours. Their library uses the OWL-QN algorithm for optimization.
Log-Linear Models
(2006) report an f-score of 71.48 on the same data, using semi-Markov CRFs .
Log-Linear Models
We have conducted experiments using CRFs and three NLP tasks, and demonstrated empirically that our training algorithm can produce compact and accurate models much more quickly than a state-of-the-art quasi-Newton method for L1-regularization.
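For reference, the L1-regularized training objective referred to here takes the generic form
$$\max_{\theta}\; \sum_{n} \log p_\theta\big(\mathbf{y}^{(n)} \mid \mathbf{x}^{(n)}\big) \;-\; C\,\lVert\theta\rVert_1,$$
whose non-differentiability at zero drives many weights exactly to zero (hence the compact models) and is the case that quasi-Newton variants such as OWL-QN are designed to handle.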
CRFs is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Kazama, Jun'ichi and Torisawa, Kentaro
Experiments
We think this shows one of the strengths of machine learning methods such as CRFs .
Gazetteer Induction 2.1 Induction by MN Clustering
We expect that tagging models ( CRFs in our case) can learn an appropriate weight for each gazetteer match regardless of whether it is an NE or not.
Related Work and Discussion
Using models such as Semi-Markov CRFs (Sarawagi and Cohen, 2004), which handle the features on overlapping regions, is one possible direction.
Using Gazetteers as Features of NER
The NER task is then treated as a tagging task, which assigns IOB tags to each character in a sentence. We use Conditional Random Fields (CRFs) (Lafferty et al., 2001) to perform this tagging.
CRFs is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Wang, Aobo and Kan, Min-Yen
Methodology
Given their general performance and discriminative framework, Conditional Random Fields (CRFs) (Lafferty et al., 2001) are a suitable framework for tackling sequence labeling problems.
Methodology
CRFs represent a basic, simple and well-understood framework for sequence labeling, making it a suitable framework for adapting to perform joint inference.
Methodology
Figure 2: Graphical representations of the two types of CRFs used in this work.
CRFs is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Yang, Bishan and Cardie, Claire
Experiments
We trained CRFs for opinion entity identification using the following features: indicators for words, POS tags, and lexicon features (the subjectivity strength of the word in the Subjectivity Lexicon).
Model
We formulate the task of opinion entity identification as a sequence labeling problem and employ conditional random fields ( CRFs ) (Lafferty et al., 2001) to learn the probability of a sequence assignment y for a given sentence x.
Model
We define a potential function that gives the probability of assigning a span i the entity label z, and the probability is estimated based on the learned parameters from the CRFs.
CRFs is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Manshadi, Mehdi and Li, Xiao
Introduction
The most widely used approaches to these problems have been sequential models including hidden Markov models (HMMs), maximum entropy Markov models (MEMMs) (McCallum, 2000), and conditional random fields (CRFs) (Lafferty et al., 2001).
Introduction
Because of this limitation, Viola and Narasimhan (2007) use a discriminative context-free (phrase structure) grammar for extracting information from semistructured data and report higher performance than CRFs.
Introduction
Contextual information often plays a big role in resolving tagging ambiguities and is one of the key benefits of discriminative models such as CRFs .
CRFs is mentioned in 3 sentences in this paper.
Topics mentioned in this paper: