Abstract | To automatically generate a novel and non-redundant community answer summary, we segment the complex original multi-sentence question into several sub-questions and then propose a general Conditional Random Field (CRF) based answer summarization method with group L1 regularization.
Introduction | We tackle the answer summarization task as a sequential labeling process under the general Conditional Random Fields (CRF) framework: every answer sentence in the question thread is labeled as a summary or non-summary sentence, and the sentences labeled as summary are concatenated to form the final summarized answer.
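As a minimal illustration of this labeling view, the sketch below tags answer sentences with sklearn-crfsuite; the features (position, length, a URL cue) and the variables `threads`, `labels`, and `new_thread` are illustrative assumptions, not the paper's actual feature set or data.

```python
# Sketch: answer summarization as sentence-level sequence labeling.
import sklearn_crfsuite

def sentence_features(thread, i):
    """Map the i-th answer sentence of a thread to a CRF feature dict."""
    sent = thread[i]
    return {
        "position": i,                  # position within the thread
        "sent_len": len(sent.split()),  # sentence length in tokens
        "has_url": "http" in sent,      # simple content cue (assumed feature)
    }

def thread_to_instance(thread):
    return [sentence_features(thread, i) for i in range(len(thread))]

# threads: list of sentence lists; labels: gold tags per sentence,
# e.g. [["SUMMARY", "NON-SUMMARY", ...], ...] (assumed inputs).
X_train = [thread_to_instance(t) for t in threads]
y_train = labels

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c2=0.1, max_iterations=100)
crf.fit(X_train, y_train)

# The summary is the concatenation of sentences tagged SUMMARY.
pred = crf.predict([thread_to_instance(new_thread)])[0]
summary = " ".join(s for s, tag in zip(new_thread, pred) if tag == "SUMMARY")
```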
Introduction | First, we present a general CRF-based framework.
Introduction | Second, we propose a group L1-regularization approach in the CRF model that automatically learns an optimal feature combination, unleashing the potential of the features and enhancing answer summarization performance.
The Summarization Framework | Then under CRF (Lafferty et al., 2001), the conditional probability of $\mathbf{y}$ given $\mathbf{x}$ obeys the following distribution: $p(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z(\mathbf{x})} \exp\Big( \sum_{e \in E,\,k} \lambda_k f_k(e, \mathbf{y}|_e, \mathbf{x}) + \sum_{v \in V,\,k} \mu_k g_k(v, \mathbf{y}|_v, \mathbf{x}) \Big)$, where $Z(\mathbf{x})$ is the normalization factor.
The Summarization Framework | Therefore, to explore the optimal combination of these features, we propose a group L1 regularization term in the general CRF model (Section 3.3) for feature learning. |
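The group-L1 (group-lasso) penalty behind this idea sums the L2 norms of template-level weight groups, so entire feature templates can be driven to zero. A minimal numpy sketch, assuming a `groups` mapping from template names to weight indices and a given negative log-likelihood function `nll(w)`:

```python
import numpy as np

def group_l1_penalty(w, groups, alpha=1.0):
    """alpha * sum_g ||w_g||_2 — zeroed groups drop whole feature templates."""
    return alpha * sum(np.linalg.norm(w[idx]) for idx in groups.values())

def penalized_objective(w, nll, groups, alpha=1.0):
    # nll(w): negative conditional log-likelihood of the CRF (assumed given)
    return nll(w) + group_l1_penalty(w, groups, alpha)
```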
The Summarization Framework | These sentence-level features can be easily utilized in the CRF framework. |
Abstract | We use linear-chain conditional random fields (CRF) for sentence type tagging, and a 2D CRF to label the dependency relation between sentences. |
Introduction | We use linear-chain conditional random fields (CRF) to take advantage of many long-distance and nonlocal features.
Introduction | First, each sentence is considered as a source, and we run a linear-chain CRF to label whether each of the other sentences is its target.
Introduction | Because multiple runs of separate linear-chain CRFs ignore the dependency between source sentences, the second approach we propose is to use a 2D CRF that models all pair relationships jointly. |
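A sketch of the first strategy, assuming a trained `crf` with a `predict` method, a `featurize(source, candidate)` function, and a `TARGET` tag; the 2D CRF variant would score all of these rows jointly instead of running them independently:

```python
def label_pairs(sentences, crf, featurize):
    """Run one linear-chain CRF per source sentence over the remaining ones."""
    links = set()
    for i, src in enumerate(sentences):
        others = [(j, s) for j, s in enumerate(sentences) if j != i]
        seq = [featurize(src, s) for _, s in others]
        tags = crf.predict([seq])[0]   # one linear-chain run per source
        links.update((i, j) for (j, _), tag in zip(others, tags)
                     if tag == "TARGET")
    return links
```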
Related Work | In (Ding et al., 2008), a two-pass approach was used to find relevant solutions for a given question, and a skip-chain CRF was adopted to model long-range dependencies.
Thread Structure Tagging | The linear-chain CRF is a special case of general CRFs.
Thread Structure Tagging | In a linear-chain CRF, cliques involve only two adjacent variables in the sequence.
Thread Structure Tagging | Figure 3 shows the graphical structure of a linear-chain CRF.
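Because the cliques pair only adjacent labels, exact decoding reduces to Viterbi over node (emission) and edge (transition) scores. A minimal numpy sketch of that decoding, with `emit` and `trans` as assumed score matrices:

```python
import numpy as np

def viterbi(emit, trans):
    """emit: T x K node scores; trans: K x K transition scores."""
    T, K = emit.shape
    score = np.empty((T, K))
    back = np.zeros((T, K), dtype=int)
    score[0] = emit[0]
    for t in range(1, T):
        # K x K clique scores: previous label (rows) -> current label (cols)
        cand = score[t - 1][:, None] + trans + emit[t]
        back[t] = cand.argmax(axis=0)
        score[t] = cand.max(axis=0)
    path = [int(score[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]   # best label sequence
```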
A Class-based Model of Agreement | We treat segmentation as a character-level sequence modeling problem and train a linear-chain conditional random field (CRF) model (Lafferty et al., 2001).
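One common encoding for this setup (assumed here; the paper's exact tag set may differ) marks each character as beginning or continuing a segment, so that segmentation becomes per-character tagging:

```python
def segments_to_tags(segments):
    """Encode a segmented word as B/I tags, one per character."""
    tags = []
    for seg in segments:
        tags += ["B"] + ["I"] * (len(seg) - 1)
    return tags

def tags_to_segments(chars, tags):
    """Decode a B/I tag sequence back into segments."""
    segs = []
    for ch, tag in zip(chars, tags):
        if tag == "B" or not segs:
            segs.append(ch)
        else:
            segs[-1] += ch
    return segs

# Round trip on an illustrative three-segment word.
assert tags_to_segments("wsyktbhA", segments_to_tags(["w", "syktb", "hA"])) \
       == ["w", "syktb", "hA"]
```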
A Class-based Model of Agreement | Class-based Agreement Model notation: t ∈ T, the set of morpho-syntactic classes; s ∈ S, the set of all word segments; θ_seg, learned weights for the CRF-based segmenter; θ_tag, learned weights for the CRF-based tagger; φ_o, φ_t, the CRF potential functions (emission and transition).
A Class-based Model of Agreement | For this task we also train a standard CRF model on full sentences with gold classes and segmentation. |
Conclusion and Outlook | The model can be implemented with a standard CRF package, trained on existing treebanks for many languages, and integrated easily with many MT feature APIs. |
Algorithm | Minimizing (2) under the log-loss results in a probabilistic model commonly known as a conditional random field (CRF) (Lafferty et al., 2001).
Discussion | Large-margin learning, using the Passive-Aggressive and Pegasos algorithms, has benefits over CRF learning for our task: it yields sparser models, is faster, and gives better lexical access results.
Experiments | We use the term “CRF” since the learning algorithm corresponds to CRF learning, although the task is multiclass classification rather than a sequence or structure prediction task.
Experiments | CRF learning with the same features performs about 6% worse than the corresponding PA and Pegasos models. |
Experiments | The single-threaded running time for PNDP+ and Pegasos/DP+ is about 40 minutes per epoch, measured on a dual-core AMD 2.4GHz CPU with 8GB of memory; for CRF, it takes about 100 minutes for each epoch, which is almost entirely because the weight vector θ is less sparse with CRF learning.
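The sparsity difference comes from the shape of the updates: the hinge subgradient touches only the gold class and the highest-scoring wrong class, while the log-loss gradient spreads mass over every class. A dense-array sketch for the multiclass case, with an illustrative learning rate:

```python
import numpy as np

def hinge_update(w, x, y, lr=0.1):
    """Large-margin style update: at most two weight rows change."""
    scores = w @ x
    wrong = [j for j in range(len(scores)) if j != y]
    y_hat = max(wrong, key=lambda j: scores[j])
    if scores[y] - scores[y_hat] < 1.0:   # margin violated
        w[y] += lr * x
        w[y_hat] -= lr * x
    return w

def logloss_update(w, x, y, lr=0.1):
    """CRF-style log-loss gradient step: every row receives mass."""
    p = np.exp(w @ x)
    p /= p.sum()                          # softmax over all classes
    w -= lr * np.outer(p, x)              # dense update across classes
    w[y] += lr * x
    return w
```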
System Architecture | Based on our CRF word segmentation system, we can compute a probability for each segment. |
System Architecture | Note that, although our model is a Markov CRF model, we can still use word features to learn word information in the training data. |
System Architecture | In traditional implementations of CRF systems (e.g., the HCRF package), the edge features usually contain only the information of y_{i-1} and y_i, without the information of the observation x.
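A conceptual sketch of the contrast, with illustrative feature names: a plain edge potential sees only the label bigram, whereas the richer design also conjoins the current word with the bigram:

```python
def plain_edge_score(weights, y_prev, y_cur):
    """Traditional edge potential: label bigram only."""
    return weights.get(("trans", y_prev, y_cur), 0.0)

def observed_edge_score(weights, y_prev, y_cur, words, i):
    """Edge potential that also conditions on the observation at i."""
    s = weights.get(("trans", y_prev, y_cur), 0.0)
    # extra conjunction with the current word: (y_{i-1}, y_i, x_i)
    s += weights.get(("trans+w", y_prev, y_cur, words[i]), 0.0)
    return s
```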
Evaluation | We first tested a standalone MWE recognizer based on CRF.
Evaluation | The CRF recognizer relies on the software Wapiti (Lavergne et al., 2010) to train and apply the model, and on the software Unitex (Paumier, 2011) to apply lexical resources.
Evaluation | Table 3: MWE identification with CRF: base denotes the features corresponding to token properties and word n-grams.
MWE-dedicated Features | To deal with unknown words and special tokens, we incorporate standard tagging features in the CRF: lowercased forms of the words, word prefixes of length 1 to 4, word suffixes of length 1 to 4, whether the word is capitalized, whether the token contains a digit, and whether it is a hyphen.
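A sketch of these standard tagging features as crfsuite-style strings; the template names are illustrative:

```python
def token_features(word):
    """Standard tagging features for one token, as feature strings."""
    feats = ["lower=" + word.lower()]
    for k in range(1, 5):                 # prefixes/suffixes of length 1..4
        feats.append("pre%d=%s" % (k, word[:k]))
        feats.append("suf%d=%s" % (k, word[-k:]))
    feats.append("cap=%s" % word[:1].isupper())
    feats.append("digit=%s" % any(c.isdigit() for c in word))
    feats.append("hyphen=%s" % (word == "-"))
    return feats
```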
Two strategies, two discriminative models | For such a task, we used linear-chain Conditional Random Fields (CRF), which are discriminative probabilistic models.
Introduction | Our work is similar to Jakob and Gurevych (2010), which proposed a Conditional Random Field (CRF) for cross-domain topic word extraction.
Introduction | We denote by iSVM and iCRF the in-domain SVM and CRF classifiers in our experiments, against which we compare our proposed methods.
Introduction | Cross-Domain CRF (Cross-CRF): we implement the cross-domain CRF algorithm proposed by Jakob and Gurevych (2010).
Introduction | Ritter et al. (2011) develop a system that exploits a CRF model to segment named entities.
Related Work | A linear CRF model |
Related Work | Finin et al. (2010) use Amazon's Mechanical Turk service and CrowdFlower to annotate named entities in tweets and train a CRF model to evaluate the effectiveness of human labeling.
Related Work | Ritter et al. (2011) rebuild the NLP pipeline for tweets beginning with POS tagging, through chunking, to NER; the system first exploits a CRF model to segment named entities and then uses a distantly supervised approach based on LabeledLDA to classify them.
Background: Pairwise Coreference | For higher accuracy, a graphical model such as a conditional random field (CRF) is constructed from the compatibility functions to jointly reason about the pairwise decisions (McCallum and Wellner, 2004).
Background: Pairwise Coreference | We now describe the pairwise CRF for coreference as a factor graph. |
Background: Pairwise Coreference | Given the pairwise CRF, the problem of coreference is then solved by searching for the setting of the coreference decision variables that has the highest probability according to Equation 1, subject to the transitivity constraints.
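A sketch of this factor-graph view, assuming a log-compatibility function `compat(mi, mj)`; the score of a full assignment is the sum of the pairwise factors, maximized subject to transitivity (one orientation of the constraint shown for brevity):

```python
from itertools import combinations

def log_score(mentions, decisions, compat):
    """Unnormalized log p(y|x): sum of pairwise factor scores."""
    total = 0.0
    for (i, mi), (j, mj) in combinations(enumerate(mentions), 2):
        y_ij = decisions[(i, j)]   # 1 = coreferent, 0 = not
        total += compat(mi, mj) if y_ij else -compat(mi, mj)
    return total

def transitive(decisions, n):
    """Check transitivity: (i,j) and (j,k) coreferent implies (i,k)."""
    return all(not (decisions[(i, j)] and decisions[(j, k)])
               or decisions[(i, k)]
               for i in range(n) for j in range(i + 1, n)
               for k in range(j + 1, n))
```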
Conclusion | Indeed, inference in the hierarchy is orders of magnitude faster than in a pairwise CRF, allowing us to infer accurate coreference on large datasets.
Existing algorithms 3.1 Yarowsky | Note that this is a linear model similar to a conditional random field (CRF) (Lafferty et al., 2001) for unstructured multiclass problems.
Existing algorithms 3.1 Yarowsky | It uses a CRF (Lafferty et al., 2001) as the underlying supervised learner. |
Existing algorithms 3.1 Yarowsky | It differs significantly from Yarowsky in two other ways: first, instead of only training a CRF, it also uses a step of graph propagation between distributions over the n-grams in the data.
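A high-level sketch of that loop; every helper here (`train_crf`, `build_ngram_graph`, `propagate`, `relabel`) and the `predict_marginals` interface are assumed stand-ins for the actual components:

```python
def self_train(labeled, unlabeled, rounds=5):
    """Yarowsky-style loop with an extra graph-propagation step."""
    graph = build_ngram_graph(labeled + unlabeled)     # n-gram similarity graph
    for _ in range(rounds):
        crf = train_crf(labeled)                       # supervised learner
        dists = crf.predict_marginals(unlabeled)       # label distributions
        dists = propagate(graph, dists)                # graph propagation step
        labeled = labeled + relabel(unlabeled, dists)  # add confident labels
    return crf
```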