Index of papers in Proc. ACL that mention
  • feature set
Eidelman, Vladimir and Marton, Yuval and Resnik, Philip
Abstract
We evaluate our optimizer on Chinese-English and Arabic-English translation tasks, each with small and large feature sets, and show that our learner is able to achieve significant improvements of 1.2-2 BLEU and 1.7-4.3 TER on average over state-of-the-art optimizers with the large feature set.
Experiments
To evaluate the advantage of explicitly accounting for the spread of the data, we conducted several experiments on two Chinese-English translation test sets, using two different feature sets in each.
Experiments
We selected the bound step size D, based on performance on a held-out dev set, to be 0.01 for the basic feature set and 0.1 for the sparse feature set.
Experiments
4.2 Feature Sets
Introduction
Chinese-English translation experiments show that our algorithm, RM, significantly outperforms strong state-of-the-art optimizers, in both a basic feature setting and high-dimensional (sparse) feature space (§4).
Learning in SMT
The instability of MERT in larger feature sets (Foster and Kuhn, 2009; Hopkins and May, 2011) has motivated many alternative tuning methods for SMT.
feature set is mentioned in 25 sentences in this paper.
Topics mentioned in this paper:
Lavergne, Thomas and Cappé, Olivier and Yvon, François
Conditional Random Fields
Based on a study of three NLP benchmarks, the authors of (Tsuruoka et al., 2009) claim this approach to be much faster than the orthant-wise approach and yet to yield very comparable performance, while selecting slightly larger feature sets.
Conditional Random Fields
The n-gram feature sets (n = {1, 3, 5, 7}) include all features testing embedded windows of k letters, for all 0 ≤ k ≤ n; the n-gram- setting is similar, but only includes the window of length n; in the n-gram+ setting, we add features for odd-size windows; in the n-gram++ setting, we add all sequences of letters up to size n occurring in the current window.
Conditional Random Fields
For instance, the active bigram features at position t = 2 in the sequence x='lemma' are as follows: the 3-gram feature set contains fywx, fyw/fl and fy/Mem; only the latter appears in the 3-gram- setting.
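The window scheme in these excerpts can be sketched as a small feature extractor (an illustrative reconstruction, not the authors' code; the function name, 0-based indexing, and the centering of odd-size windows are our assumptions):

```python
def letter_windows(x, t, n):
    """Return the letter-window features active at position t of string x,
    for odd window sizes 1, 3, ..., n centered on x[t]. The n-gram- setting
    described above would keep only the size-n window."""
    feats = []
    for k in range(1, n + 1, 2):          # odd-size windows only
        h = k // 2
        if t - h >= 0 and t + h < len(x):
            feats.append((k, x[t - h:t + h + 1]))
    return feats

# For x = 'lemma' and (0-indexed) position 2, the active windows are
# the letter 'm' and the trigram 'emm'.
```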
Introduction
An important property of CRFs is their ability to handle large and redundant feature sets and to integrate structural dependency between output labels.
Introduction
Limiting the feature set or the number of output labels is however frustrating for many NLP tasks, where the type and number of potentially relevant features are very large.
Introduction
Second, the experimental demonstration that using large output label sets is doable and that very large feature sets actually help improve prediction accuracy.
feature set is mentioned in 20 sentences in this paper.
Topics mentioned in this paper:
Reiter, Nils and Frank, Anette
Introduction
In section 4 we motivate the choice of feature sets for the automatic identification of generic NPs in context.
Introduction
4.2 Feature set and feature classes
Introduction
The feature set includes NP-local and global features.
feature set is mentioned in 15 sentences in this paper.
Topics mentioned in this paper:
Zhao, Qiuye and Marcus, Mitch
Abstract
By adopting this ILP formulation, segmentation F-measure is increased from 0.968 to 0.974, as compared to Viterbi decoding with the same feature set.
Abstract
We adopt the basic feature set used in (Ratnaparkhi, 1996) and (Collins, 2002).
Abstract
As introduced in Section 2.2, we adopt a very compact feature set used in (Ratnaparkhi, 1996).
feature set is mentioned in 14 sentences in this paper.
Topics mentioned in this paper:
Björkelund, Anders and Kuhn, Jonas
Experimental Setup
We experiment with two feature sets for each language: the optimized local feature sets (denoted local), and the optimized local feature sets extended with nonlocal features (denoted nonlocal).
Features
The feature sets are customized for each language.
Features
The exact definitions and feature sets that we use are available as part of the download package of our system.
Features
Nonlocal features were selected with the same greedy forward strategy as the local features, starting from the optimized local feature sets.
Introducing Nonlocal Features
In other words, it is unlikely that we can devise a feature set that is informative enough to allow the weight vector to converge towards a solution that lets the learning algorithm see the entire documents during training, at least in the situation when no external knowledge sources are used.
Introduction
We show that for the task of coreference resolution the straightforward combination of beam search and early update (Collins and Roark, 2004) falls short of more limited feature sets that allow for exact search.
Results
the English development set as a function of number of training iterations with two different beam sizes, 20 and 100, over the local and nonlocal feature sets.
Results
The left half uses the local feature set, and the right the extended nonlocal feature set.
Results
Local vs. Nonlocal feature sets.
feature set is mentioned in 11 sentences in this paper.
Topics mentioned in this paper:
Li, Jianguo and Brew, Chris
Experiment Setup 4.1 Corpus
We evaluate six different feature sets for their effectiveness in AVC: SCF, DR, CO, ACO, SCF+CO, and JOANIS07.
Experiment Setup 4.1 Corpus
The other four feature sets include both syntactic and lexical information.
Experiment Setup 4.1 Corpus
JOANIS07: We use the feature set proposed in Joanis et al.
Introduction
We develop feature sets that combine syntactic and lexical information, which are in principle useful for any Levin-style verb classification.
Introduction
We test the general applicability and scalability of each feature set to the distinctions among 48 verb classes involving 1,300 verbs, which is, to our knowledge, the largest investigation on English verb classification by far.
Introduction
To preview our results, a feature set that combines both syntactic information and lexical information works much better than either of them used alone.
Machine Learning Method
We construct a semantic space with each feature set.
Machine Learning Method
Except for JOANIS07 which only contains 224 features, all the other feature sets lead to a very high-dimensional space.
Related Work
The deeper linguistic analysis allows their feature set to cover a variety of indicators of verb semantics, beyond that of frame information.
feature set is mentioned in 37 sentences in this paper.
Topics mentioned in this paper:
Koo, Terry and Carreras, Xavier and Collins, Michael
Experiments
In order to evaluate the effectiveness of the cluster-based feature sets, we conducted dependency parsing experiments in English and Czech.
Experiments
In our English experiments, we tested eight different parsing configurations, representing all possible choices between baseline or cluster-based feature sets, first-order (Eisner, 2000) or second-order (Carreras, 2007) factorizations, and labeled or unlabeled parsing.
Experiments
Second, note that the parsers using cluster-based feature sets consistently outperform the models using the baseline features, regardless of model order or label usage.
Feature design
The feature sets we used are similar to other feature sets in the literature (McDonald et al., 2005a; Carreras, 2007), so we will not attempt to give an exhaustive description of the features in this section.
Feature design
In our experiments, we employed two different feature sets: a baseline feature set which draws upon “normal” information sources such as word forms and parts of speech, and a cluster-based feature set that also uses information derived from the Brown cluster hierarchy.
Feature design
Our first-order baseline feature set is similar to the feature set of McDonald et al.
feature set is mentioned in 14 sentences in this paper.
Topics mentioned in this paper:
Green, Spence and Wang, Sida and Cer, Daniel and Manning, Christopher D.
Abstract
We present a fast and scalable online method for tuning statistical machine translation models with large feature sets .
Adaptive Online Algorithms
When we have a large feature set and therefore want to tune on a large data set, batch methods are infeasible.
Adaptive Online MT
For example, simple indicator features like lexicalized reordering classes are potentially useful yet bloat the feature set and, in the worst case, can negatively impact
Experiments
To the dense features we add three high dimensional “sparse” feature sets .
Experiments
The primary baseline is the dense feature set tuned with MERT (Och, 2003).
Experiments
with the PT feature set.
feature set is mentioned in 14 sentences in this paper.
Topics mentioned in this paper:
Abend, Omri and Cohen, Shay B. and Steedman, Mark
Experimental Setup
Instead, we use several baselines to demonstrate the usefulness of integrating multiple LCs, as well as the relative usefulness of our feature sets.
Experimental Setup
The other evaluated systems are formed by taking various subsets of our feature set.
Experimental Setup
We experiment with 4 feature sets.
Introduction
(2012) and compare our methods with analogous ones that select a fixed LC, using state-of-the-art feature sets .
Our Proposal: A Latent LC Approach
Section 3.1 describes our general approach, Section 3.2 presents our model and Section 3.3 details the feature set.
Our Proposal: A Latent LC Approach
We choose this model for its generality, conceptual simplicity, and because it allows us to easily incorporate various feature sets and sets of latent variables.
Our Proposal: A Latent LC Approach
3.3 Feature Set
feature set is mentioned in 25 sentences in this paper.
Topics mentioned in this paper:
Chen, Yanping and Zheng, Qinghua and Zhang, Wei
Discussion
In this section, we analyze the influence of the employed feature sets and constraint conditions on performance.
Discussion
Because features may interact mutually in an indirect way, even with the same feature set, different constraint conditions can have significant influences on the final performance.
Discussion
In Section 3, we introduced five candidate feature sets.
Feature Construction
3.1 Candidate Feature Set
Feature Construction
To sum up, among the five candidate feature sets, the position feature is used as a singleton feature.
Feature Construction
In the following experiments, focusing on Chinese relation extraction, we will analyze the performance of candidate feature sets and study the influence of the constraint conditions.
feature set is mentioned in 10 sentences in this paper.
Topics mentioned in this paper:
Lippincott, Thomas and Korhonen, Anna and Ó Séaghdha, Diarmuid
Methodology
In this section we describe the basic components of our study: feature sets, graphical model, inference, and evaluation.
Methodology
3.1 Input and feature sets
Methodology
We tested several feature sets either based on, or approximating, the concept of grammatical relation described in section 2.
Results
We evaluated SCF lexicons based on the eight feature sets described in section 3.1, as well as the VALEX SCF lexicon described in section 2.
feature set is mentioned in 10 sentences in this paper.
Topics mentioned in this paper:
Habash, Nizar and Roth, Ryan
Results
5.1 Effect of Feature Set Choice
Results
Table 3 illustrates the result of taking a baseline feature set (containing word as the only feature) and adding a single feature from the Simple set to it.
Results
Feature Set            F-score   %Imp
word                   43.85     —
word+nw                43.86     0.0
word+na                44.78     2.1
word+lem               45.85     4.6
word+pos               45.91     4.7
word+nw+pos+lem+na     46.34     5.7
feature set is mentioned in 9 sentences in this paper.
Topics mentioned in this paper:
Varga, István and Sano, Motoki and Torisawa, Kentaro and Hashimoto, Chikara and Ohtake, Kiyonori and Kawai, Takao and Oh, Jong-Hoon and De Saeger, Stijn
Experiments
In both experiments we observed that the performance drops when excitation polarities and trouble expressions are removed from the feature set.
Experiments
PROPOSED-*: The proposed method without the feature set denoted by “*”.
Problem Report and Aid Message Recognizers
The feature set given to the SVMs is summarized in the top part of Table 2.
Problem Report and Aid Message Recognizers
Note that we used a common feature set for both the problem report recognizer and aid message recognizer and that it is categorized into several types: features concerning trouble expressions (TR), excitation polarity (EX), their combination (TREX1) and word sentiment polarity (WSP), features expressing morphological and syntactic structures of nuclei and their context surrounding problem/aid nuclei (MSA), features concerning semantic word classes (SWC) appearing in nuclei and their context, request phrases, such as "Please help us", appearing in tweets (REQ), and geographical locations in tweets recognized by our location recognizer (GL).
Problem Report and Aid Message Recognizers
We also attempted to represent nucleus template IDs, noun IDs and their combinations directly in our feature set to capture typical templates fre-
Problem-Aid Match Recognizer
Here also we attempted to capture typical or frequent matches of nuclei using template and noun IDs and their combinations, but we did not observe any improvement so we omit them from the feature set.
Problem-Aid Match Recognizer
The bottom part of Table 2 summarizes the additional feature set, some of which are described below in more detail.
feature set is mentioned in 9 sentences in this paper.
Topics mentioned in this paper:
Ferschke, Oliver and Gurevych, Iryna and Rittberger, Marc
Evaluation and Discussion
The SVMs achieve a similar cross-validated performance on all feature sets containing ngrams, showing only minor improvements for individual flaws when adding non-lexical features.
Evaluation and Discussion
Table 6 shows the performance of the SVMs with RBF kernel on each dataset using the NGRAM feature set.
Evaluation and Discussion
Classifiers using the NONGRAM feature set achieved average F1-scores below 0.50 on all datasets.
Experiments
We selected a subset of these features for our experiments and grouped them into four feature sets in order to determine how well different combinations of features perform in the task.
Experiments
Table 4: Feature sets used in the experiments
feature set is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Nguyen, Thien Huu and Grishman, Ralph
Experiments
We evaluate word cluster and embedding (denoted by ED) features by adding them individually as well as simultaneously into the baseline feature set.
Experiments
This might be explained by the difference between our baseline feature set and the feature set underlying their kernel-based system.
Feature Set
5.1 Baseline Feature Set
Feature Set
(2011) utilize the full feature set from (Zhou et al., 2005) plus some additional features and achieve the state-of-the-art feature-based RE system.
Feature Set
Unfortunately, this feature set includes the human-annotated (gold-standard) information on entity and mention types which is often missing or noisy in reality (Plank and Moschitti, 2013).
Introduction
Recent research in this area, whether feature-based (Kambhatla, 2004; Boschee et al., 2005; Zhou et al., 2005; Grishman et al., 2005; Jiang and Zhai, 2007a; Chan and Roth, 2010; Sun et al., 2011) or kernel-based (Zelenko et al., 2003; Bunescu and Mooney, 2005a; Bunescu and Mooney, 2005b; Zhang et al., 2006; Qian et al., 2008; Nguyen et al., 2009), attempts to improve the RE performance by enriching the feature sets from multiple sentence analyses and knowledge resources.
feature set is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Metallinou, Angeliki and Bohus, Dan and Williams, Jason
Generative state tracking
In DIS-CDYN1, we use the original feature set, ignoring the problem described above (so that the general features contribute no information), resulting in M + K weights.
Generative state tracking
The analysis of various feature sets indicates that the ASR/SLU error correlation (confusion) features yield the largest improvement — c.f.
Generative state tracking
feature set be compared to b in Table 3.
Introduction
importance of different feature sets for this task, and measure the amount of data required to reliably train our model.
feature set is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Cao, Yuan and Khudanpur, Sanjeev
Abstract
This regularizes the model complexity and makes the tensor model highly effective in situations where a large feature set is defined but very limited resources are available for training.
Conclusion and Future Work
This can be regarded as a form of model regularization. Therefore, compared with the traditional vector-space models, learning in the tensor space is very effective when a large feature set is defined, but only a small amount of training data is available.
Introduction
This also makes training the model parameters a challenging problem, since the amount of labeled training data is usually small compared to the size of feature sets: the feature weights cannot be estimated reliably.
Introduction
Such models require learning individual feature weights directly, so that the number of parameters to be estimated is identical to the size of the feature set.
Tensor Model Construction
ways of mapping, which is an intractable number of possibilities even for modest sized feature sets , making it impractical to carry out a brute force search.
Tensor Space Representation
In general, if V features are defined for a learning problem, and we (i) organize the feature set as a tensor Φ ∈ R^{n_1×n_2×···×n_D} and (ii) use H component rank-1 tensors to approximate the corresponding target weight tensor.
Tensor Space Representation
Specifically, a vector space model assumes each feature weight to be a “free” parameter, and estimating them reliably could therefore be hard when training data are not sufficient or the feature set is huge.
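The construction quoted above is, in effect, a CP-style low-rank approximation of the weight tensor. As a hedged sketch (the symbols W, a_h^{(d)} and H are chosen here for illustration and are not taken verbatim from the paper):

```latex
W \;\approx\; \sum_{h=1}^{H} a_h^{(1)} \circ a_h^{(2)} \circ \cdots \circ a_h^{(D)},
\qquad a_h^{(d)} \in \mathbb{R}^{n_d},
```

so the number of free parameters falls from \prod_d n_d (one weight per feature) to H \sum_d n_d, which is why such a model can still be trained when the feature set is large but labeled data are scarce.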
feature set is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Gormley, Matthew R. and Mitchell, Margaret and Van Durme, Benjamin and Dredze, Mark
Experiments
Our primary feature set IGC consists of 127 template unigrams that emphasize coarse properties (i.e., properties 7, 9, and 11 in Table 1).
Experiments
We compare against the language-specific feature sets detailed in the literature on high-resource top-performing SRL systems: From Björkelund et al.
Experiments
(2009), these are feature sets for German, English, Spanish and Chinese, obtained by weeks of forward selection (B_{de,en,es,zh}); and from Zhao et al.
feature set is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Yancheva, Maria and Rudzicz, Frank
Discussion and future work
Figure 3: Effect of feature set choice on cross-validation accuracy.
Discussion and future work
2012; Almela et al., 2012; Fornaciari and Poesio, 2012), our results suggest that the set of syntactic features presented here performs significantly better than the LIWC feature set on our data, and across seven out of the eight experiments based on age groups and verbosity of transcriptions.
Related Work
Descriptions of the data (section 3) and feature sets (section 4) precede experimental results (section 5) and the concluding discussion (section 6).
feature set is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Simianer, Patrick and Riezler, Stefan and Dyer, Chris
Abstract
With a few exceptions, discriminative training in statistical machine translation (SMT) has been content with tuning weights for large feature sets on small development data.
Discussion
In future work, we would like to investigate more sophisticated features, better learners, and in general improve the components of our system that have been neglected in the current investigation of relative improvements by scaling the size of data and feature sets.
Experiments
The results on the news-commentary (nc) data show that training on the development set does not benefit from adding large feature sets — BLEU result differences between tuning 12 default features
Experiments
Here tuning large feature sets on the respective dev sets yields significant improvements of around 2 BLEU points over tuning the 12 default features on the dev sets.
Introduction
Our resulting models are learned on large data sets, but they are small and outperform models that tune feature sets of various sizes on small development sets.
Related Work
All approaches have been shown to scale to large feature sets and all include some kind of regularization method.
feature set is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Surdeanu, Mihai and Ciaramita, Massimiliano and Zaragoza, Hugo
Approach
To answer the second research objective we will analyze the contribution of the proposed feature set to this function.
Approach
For completeness we also include in the feature set the value of the tf-idf similarity measure.
Experiments
Feature Set MRR P@1
Experiments
The algorithm incrementally adds to the feature set the feature that provides the highest MRR improvement in the development partition.
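The incremental scheme described here is standard greedy forward selection. A minimal sketch (illustrative only; `evaluate` stands in for scoring a feature subset, e.g. computing MRR on the development partition, and all names are our assumptions, not the authors' code):

```python
def greedy_forward_selection(candidates, evaluate):
    """Repeatedly add the candidate feature whose inclusion yields the
    highest score; stop when no remaining candidate improves on the
    current best."""
    selected = []
    best = evaluate(selected)
    while True:
        remaining = [f for f in candidates if f not in selected]
        if not remaining:
            break
        score, feat = max((evaluate(selected + [f]), f) for f in remaining)
        if score <= best:          # no candidate improves the metric
            break
        selected.append(feat)
        best = score
    return selected
```

With a toy scoring function that rewards two useful features and penalizes the rest, the loop picks them in order of marginal gain and then stops.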
Related Work
This approach allowed us to perform a systematic feature analysis on a large-scale real-world corpus and a comprehensive feature set .
Related Work
Our model uses a larger feature set that includes correlation and transformation-based features and five different content representations.
feature set is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Tang, Duyu and Wei, Furu and Yang, Nan and Zhou, Ming and Liu, Ting and Qin, Bing
Abstract
Experiments on applying SSWE to a benchmark Twitter sentiment classification dataset in SemEval 2013 show that (1) the SSWE feature performs comparably with handcrafted features in the top-performed system; (2) the performance is further improved by concatenating SSWE with existing feature set.
Introduction
After concatenating the SSWE feature with existing feature set, we push the state-of-the-art to 86.58% in macro-F1.
Related Work
NRC-ngram refers to the feature set of NRC leaving out ngram features.
Related Work
After concatenating SSWEU with the feature set of NRC, the performance is further improved to 86.58%.
Related Work
The concatenated features SSWEu +NRC-ngram (86.48%) outperform the original feature set of NRC (84.73%).
feature set is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Oh, Jong-Hoon and Uchimoto, Kiyotaka and Torisawa, Kentaro
Acquisition of Hyponymy Relations from Wikipedia
(2008) but LF1-LF5 and SF1-SF9 are the same as their feature set.
Acquisition of Hyponymy Relations from Wikipedia
Let us provide an overview of the feature sets used in Sumida et al.
Acquisition of Hyponymy Relations from Wikipedia
These are the feature sets used in Sumida et al.
Motivation
Since the learning settings (feature sets, feature values, training data, corpora, and so on) are usually different in two languages, the reliable part in one language may be overlapped by an unreliable part in another language.
feature set is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Polifroni, Joseph and Walker, Marilyn
Experiment Two
We derive two types of feature sets from the responses: features derived from each user model and features derived from attributes of the query/response pair itself.
Experiment Two
The five feature sets for the user model are:
Experiment Two
• allUtility: 12 features consisting of the high, low, and average utility scores from the previous three feature sets.
feature set is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Prabhakaran, Vinodkumar and Rambow, Owen
Conclusion
Using this feature set , we obtain an accuracy of 73.0% on a blind test.
Introduction
This best-performing system uses our new feature set .
Predicting Direction of Power
We use another feature set LEX to capture word ngrams, POS (part of speech) ngrams and mixed ngrams.
Predicting Direction of Power
We also performed an ablation study to understand the importance of different slices of our feature sets.
Structural Analysis
THRPR: This feature set includes two metadata-based feature sets — positional and verbosity.
feature set is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Zhou, Guangyou and Zhao, Jun and Liu, Kang and Cai, Li
Experiments
Second, note that the parsers incorporating the N-gram feature sets consistently outperform the models using the baseline features in all test data sets, regardless of model order or label usage.
Web-Derived Selectional Preference Features
In this paper, we employ two different feature sets: a baseline feature set which draws upon "normal" information sources, such as word forms and part-of-speech (POS), without including the web-derived selectional preference features, and a feature set that conjoins the baseline features and the web-derived selectional preference features.
Web-Derived Selectional Preference Features
These feature sets are similar to other feature sets in the literature (McDonald et al., 2005; Carreras, 2007), so we will not attempt to give an exhaustive description.
Web-Derived Selectional Preference Features
For example, the baseline feature set includes indicators for word-to-word and tag-to-tag interactions between the head and modifier of a dependency.
feature set is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Bramsen, Philip and Escobar-Molano, Martha and Patel, Ami and Alonso, Rafael
Abstract
Previous work in traditional text classification and its variants — such as sentiment analysis — has achieved successful results by using the bag-of-words representation; that is, by treating text as a collection of words with no interdependencies, training a classifier on a large feature set of word unigrams which appear in the corpus.
Abstract
To illustrate, consider the following feature set, a bigram and a trigram (each term in the n-gram either has the form word or Atag):
Abstract
While the feature set was too small to produce notable results, we identified which features actually were indicative of lect.
feature set is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Kozareva, Zornitsa
Conclusion
We have conducted exhaustive evaluation with multiple machine learning classifiers and different feature sets spanning from lexical information to psychological categories developed by (Tausczik and Pennebaker, 2010).
Task A: Polarity Classification
We studied the influence of unigrams, bigrams and a combination of the two, and saw that the best performing feature set consists of the combination of unigrams and bigrams.
Task A: Polarity Classification
For each information source (metaphor, context, source, target and their combinations), we built a separate n-gram feature set and model, which was evaluated on 10-fold cross validation.
Task A: Polarity Classification
We have used different feature sets and information sources to solve the task.
Task B: Valence Prediction
We have studied different feature sets and information sources to solve the task.
feature set is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Liu, Xiaohua and Zhou, Ming and Zhou, Xiangyang and Fu, Zhongyang and Wei, Furu
Experiments
Table 4: Overall F1 (%) of NER and Accuracy (%) of NEN with different feature sets.
Experiments
Table 4 shows the overall performance of our method with various feature set combinations, where F_o, F_l and F_g denote the orthographic features, the lexical features, and the gazetteer-related features, respectively.
Our Method
(2) {f_i^(1)} and {f_i^(2)} are two feature sets.
Our Method
4.3.1 Feature Set One: {f_i^(1)}
Our Method
4.3.2 Feature Set Two: {f_i^(2)}
feature set is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Pitler, Emily and Louis, Annie and Nenkova, Ani
Results and discussion
For all four other questions, the best feature set is Continuity, which is a combination of summarization specific features, coreference features and cosine similarity of adjacent sentences.
Results and discussion
Feature set Gram.
feature set is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Blunsom, Phil and Cohn, Trevor and Osborne, Miles
Challenges for Discriminative SMT
This problem of over-fitting is exacerbated in discriminative models with large, expressive feature sets.
Challenges for Discriminative SMT
Learning with a large feature set requires many training examples and typically many iterations of a solver during training.
Evaluation
To do this we use our own implementation of Hiero (Chiang, 2007), with the same grammar but with the traditional generative feature set trained in a linear model with minimum BLEU training.
Evaluation
The feature set includes: a trigram language model (lm) trained
Evaluation
The relative scores confirm that our model, with its minimalist feature set, achieves comparable performance to the standard feature set without the language model.
feature set is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Espinosa, Dominic and White, Michael and Mehay, Dennis
Conclusion
Finally, further efforts to engineer a grammar suitable for realization from the CCGbank should provide richer feature sets, which, as our feature ablation study suggests, are useful for boosting hypertagging performance, hence for finding better and more complete realizations.
Results and Discussion
The whole feature set was found in feature ablation testing on the development set to outperform all other feature subsets significantly (p < 2.2 × 10^-16).
Results and Discussion
The full feature set outperforms all others significantly (p < 2.2 × 10^-16).
Results and Discussion
The results for the full feature set on Sections 00 and 23 are outlined in Table 2.
feature set is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Tsvetkov, Yulia and Boytsov, Leonid and Gershman, Anatole and Nyberg, Eric and Dyer, Chris
Experiments
Experimental results are given in Table 2, where we also provide the number of features in each feature set.
Experiments
Figure 1: ROC curves for classifiers trained using different feature sets (English SVO and AN test sets).
Experiments
According to ROC plots in Figure 1, all three feature sets are effective, both for SVO and for AN tasks.
Related Work
Current work builds on this study, and incorporates new syntactic relations as metaphor candidates, adds several new feature sets and different, more reliable datasets for evaluating results.
feature set is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Bartlett, Susan and Kondrak, Grzegorz and Cherry, Colin
Syllabification Experiments
In this section, we will discuss the results of our best emission feature set (five-gram features with a context window of eleven letters) on held-out unseen test sets.
Syllabification with Structured SVMs
With SVM-HMM, the crux of the task is to create a tag scheme and feature set that produce good results.
Syllabification with Structured SVMs
After experimenting with the development set, we decided to include in our feature set a window of eleven characters around the focus character, five on either side.
Syllabification with Structured SVMs
As is apparent from Figure 2, we see a substantial improvement by adding bigrams to our feature set.
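The emission feature scheme in these excerpts (a window of characters around the focus character, plus n-gram substrings) can be sketched as follows. This is an illustrative reimplementation with assumed names; the '#' boundary padding is our choice, not taken from the paper:

```python
def emission_features(word, i, win=5):
    """Features for focus character word[i]: unigrams from a window of
    `win` characters on either side (boundary-padded with '#'), plus all
    character bigrams inside that window."""
    padded = "#" * win + word + "#" * win
    ctx = padded[i:i + 2 * win + 1]                  # (2*win+1)-char window
    unigrams = [f"u{j}:{c}" for j, c in enumerate(ctx, -win)]
    bigrams = [f"b{j}:{ctx[j:j + 2]}" for j in range(len(ctx) - 1)]
    return unigrams + bigrams
```

With win=5 this yields the eleven-character window described above; richer variants would add trigrams through five-grams in the same fashion.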
feature set is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Elsner, Micha and Charniak, Eugene
Future Work
We are also interested to see how well this feature set performs on speech data, as in (Aoki et al., 2003).
Related Work
They motivate a richer feature set , which, however, does not yet appear to be implemented.
Related Work
(2005) adds word repetition to their feature set .
Related Work
Our feature set incorporates information which has proven useful in meeting segmentation (Galley et al., 2003) and the task of detecting addressees of a specific utterance in a meeting (Jovanovic et al., 2006).
feature set is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Saha, Sujan Kumar and Mitra, Pabitra and Sarkar, Sudeshna
Maximum Entropy Based Model for Hindi NER
In Table 2 we have shown the accuracy values for a few feature sets .
Maximum Entropy Based Model for Hindi NER
Again when wi-2 and wi+2 are removed from the feature set (i.e.
Maximum Entropy Based Model for Hindi NER
When suffix, prefix and digit information are added to the feature set , the f-value is increased up to 74.26.
feature set is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Oh, Jong-Hoon and Torisawa, Kentaro and Hashimoto, Chikara and Sano, Motoki and De Saeger, Stijn and Ohtake, Kiyonori
Causal Relations for Why-QA
We used the three types of feature sets in Table 3 for training the CRFs, where j is in the range i − 4 ≤ j ≤ i + 4 for the current position i in a causal relation candidate.
Causal Relations for Why-QA
More detailed information concerning the configurations of all the nouns in all the candidates of an appropriate causal relation (including their cause parts) and the question is encoded into our feature sets ef1–ef4 in Table 4, and the final judgment is done by our re-ranker.
Experiments
We evaluated the performance when we removed one of the three types of features (ALL-“MORPH”, ALL-“SYNTACTIC” and ALL-“C-MARKER”) and compared the results in these settings with the one when all the feature sets were used (ALL).
Experiments
We confirmed that all the feature sets improved the performance, and we got the best performance when using all of them.
feature set is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Wang, Aobo and Kan, Min-Yen
Experiment
To compare our joint inference versus other learning models, we also employed a decision tree (DT) learner, equipped with the same feature set as our FCRF.
Experiment
Both models take the whole feature set described in Section 2.3.
Experiment
3.4.3 Feature set evaluation
feature set is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Veale, Tony and Hao, Yanfen and Li, Guofu
Empirical Evaluation: Simile-derived Representations
Suspecting that a noisy feature set had contributed to the apparent drop in performance, these authors then proceed to apply a variety of noise filters to reduce the set of feature values to 51,345, which in turn leads to an improved cluster purity measure of 62.7%.
Empirical Evaluation: Simile-derived Representations
In experiment 2, we see a similar ratio of feature quantities before filtering; after some initial filtering, Almuhareb and Poesio reduce their feature set to just under 10 times the size of the simile-derived feature set .
Empirical Evaluation: Simile-derived Representations
First, the feature representations do not need to be hand-filtered and noise-free to be effective; we see from the above results that the raw values extracted from the simile pattern prove slightly more effective than filtered feature sets used by Almuhareb and Poesio.
Related Work
As noted by the latter authors, this results in a much smaller yet more diagnostic feature set for each concept.
feature set is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Scheible, Christian and Schütze, Hinrich
Distant Supervision
However, we did not find a cumulative effect (line 8) of the two feature sets .
Features
We refer to these feature sets as CoreLex (CX) and VerbNet (VN) features and to their combination as semantic features (SEM).
Features
feature set is referred to as named entities (NE).
Features
We refer to this feature set as sequential features (SQ).
feature set is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Druck, Gregory and Mann, Gideon and McCallum, Andrew
Experimental Comparison with Unsupervised Learning
With this feature set , the CRF model is less expressive than DMV.
Experimental Comparison with Unsupervised Learning
The CRF cannot consider valency even with the full feature set , but this is balanced by the ability to use distance.
Experimental Comparison with Unsupervised Learning
First we note that GE training using the full feature set substantially outperforms the restricted feature set , despite the fact that the same set of constraints is used for both experiments.
feature set is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Pado, Sebastian and Galley, Michel and Jurafsky, Dan and Manning, Christopher D.
Conclusion and Outlook
Conceptualizing MT evaluation as an entailment problem motivates the use of a rich feature set that covers, unlike almost all earlier metrics, a wide range of linguistic levels, including lexical, syntactic, and compositional phenomena.
Expt. 2: Predicting Pairwise Preferences
Feature set | Consistency (%) | System-level correlation (ρ)
Introduction
(2005)), and thus predict the quality of MT hypotheses with a rich RTE feature set .
Regression-based MT Quality Prediction
(2007) train binary classifiers on a feature set formed by a number of MT metrics.
feature set is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Hatori, Jun and Matsuzaki, Takuya and Miyao, Yusuke and Tsujii, Jun'ichi
Introduction
Second, although the feature set is fundamentally a combination of those used in previous works (Zhang and Clark, 2010; Huang and Sagae, 2010), to integrate them in a single incremental framework is not straightforward.
Model
The feature set of our model is fundamentally a combination of the features used in the state-of-the-art joint segmentation and POS tagging model (Zhang and Clark, 2010) and dependency parser (Huang and Sagae, 2010), both of which are used as baseline models in our experiment.
Model
All of the models described above except Dep’ are based on the same feature sets for segmentation and
Related Works
Zhang and Clark (2008) proposed an incremental joint segmentation and POS tagging model, with an effective feature set for Chinese.
feature set is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Falk, Ingrid and Gardent, Claire and Lamirel, Jean-Charles
Clustering Methods, Evaluation Metrics and Experimental Setup
Table 1: Sample output for a cluster produced with the grid-scf-sem feature set and the IGNGF clustering method.
Features and Data
Table 4(a) includes the evaluation results for all the feature sets when using IGNGF clustering.
Features and Data
In terms of features, the best results are obtained using the grid-scf-sem feature set with an F-measure of 0.70.
Features and Data
In contrast, the classification obtained using the scf-synt-sem feature set has a higher CMP for the clustering with optimal mPUR (0.57); but a lower F-measure (0.61), a larger number of classes (16)
feature set is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Yannakoudakis, Helen and Briscoe, Ted and Medlock, Ben
Approach
3.2 Feature set
Approach
Our full feature set is as follows:
Conclusions and future work
The addition of an incoherence metric to the feature set of an AA system has been shown to improve performance significantly (Miltsakaki and Kukich, 2000; Miltsakaki and Kukich, 2004).
Validity tests
Although the above modifications do not exhaust the potential challenges a deployed AA system might face, they represent a threat to the validity of our system since we are using a highly related feature set .
feature set is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Monroe, Will and Green, Spence and Manning, Christopher D.
Arabic Word Segmentation Model
This feature set also allows the model to take into account other interactions between the beginning and end of a word, particularly those involving the definite article ال al-.
Arabic Word Segmentation Model
A notable property of this feature set is that it remains highly dialect-agnostic, even though our additional features were chosen in response to errors made on text in Egyptian dialect.
Error Analysis
• errors that can be fixed with a fuller analysis of just the problematic token, and therefore represent a deficiency in the feature set ; and
Error Analysis
In 36 of the 100 sampled errors, we conjecture that the presence of the error indicates a shortcoming of the feature set , resulting in segmentations that make sense locally but are not plausible given the full token.
feature set is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Iida, Ryu and Inui, Kentaro and Matsumoto, Yuji
Machine learning-based cache model
Therefore, the intra-sentential and inter-sentential zero-anaphora resolution models are separately trained by exploiting different feature sets as shown in Table 2.
Machine learning-based cache model
Table 1: Feature set used in the cache models
Machine learning-based cache model
The feature set used in the cache model is shown in Table 1. The ‘CASE_MARKER’ feature roughly captures the salience of the local transition dealt with in Centering Theory, and is also intended to capture the global foci of a text coupled with the BEGINNING feature.
feature set is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Huang, Ruihong and Riloff, Ellen
Related Work
But, importantly, our classifiers all use the same feature set so they do not represent independent views of the data.
Related Work
The feature set for these classifiers is exactly the same as described in Section 3.2, except that we add a new lexical feature that represents the head noun of the target NP (i.e., the NP that needs to be tagged).
Related Work
3But technically this is not co-training because our feature sets are all the same.
feature set is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Croce, Danilo and Giannone, Cristina and Annesi, Paolo and Basili, Roberto
Empirical Analysis
Results on the Boundary Detection BD task are obtained by training an SVM model on the same feature set presented in (Johansson and Nugues, 2008b) and are slightly below the state-of-the-art BD accuracy reported in (Coppola et al., 2009).
Empirical Analysis
Given the relatively simple feature set adopted here, this result is very significant as for its resulting efficiency.
Introduction
Notice how this is also a general problem of statistical learning processes, as large fine grain feature sets are more exposed to the risks of overfitting.
Related Work
While these approaches increase the expressive power of the models to capture more general linguistic properties, they rely on complex feature sets , are more demanding about the amount of training information and increase the overall exposure to overfitting effects.
feature set is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Zhao, Hai and Song, Yan and Kit, Chunyu and Zhou, Guodong
Dependency Parsing: Baseline
With notations defined in Table 1, a feature set as shown in Table 2 is adopted.
Dependency Parsing: Baseline
We used a large scale feature selection approach as in (Zhao et al., 2009) to obtain the feature set in Table 2.
Evaluation Results
The results with different feature sets are in Table 4.
Evaluation Results
Table 4: The results with different feature sets (columns: features, with p, without p)
feature set is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Zhang, Yi and Wang, Rui
Dependency Parsing with HPSG
Therefore, we extend this feature set by adding four more feature categories, which are similar to the original ones, but the dependency relation was replaced by the dependency backbone of the HPSG outputs.
Dependency Parsing with HPSG
The extended feature set is shown in Table 1.
Dependency Parsing with HPSG
The extended feature set is shown in Table 2 (the new features are listed separately).
feature set is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Turchi, Marco and Anastasopoulos, Antonios and C. de Souza, José G. and Negri, Matteo
Evaluation framework
4.2 Performance indicator and feature set
Evaluation framework
As our focus is on the algorithmic aspect, in all experiments we use the same feature set , which consists of the seventeen features proposed in (Specia et al., 2009).
Evaluation framework
This feature set , fully described in (Callison-Burch et al., 2012), takes into account the complexity of the source sentence (e.g. number of tokens, number of translations per source word) and the fluency of the target translation (e.g. language model probabilities).
feature set is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Hall, David and Durrett, Greg and Klein, Dan
Annotations
Table 2 shows the performance of our feature set in grammars with several different levels of structural annotation.3 Klein and Manning (2003) find large gains (6% absolute improvement, 20% relative improvement) going from v = 0, h = 0 to v = 1, h = 1; however, we do not find the same level of benefit.
Features
Table 1 shows the results of incrementally building up our feature set on the Penn Treebank development set.
Introduction
Our parser can be easily adapted to this task by replacing the X-bar grammar over treebank symbols with a grammar over the sentiment values to encode the output variables and then adding n-gram indicators to our feature set to capture the bulk of the lexical effects.
feature set is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Wang, William Yang and Hua, Zhenhao
Copula Models for Text Regression
This nice property essentially allows us to fuse distinctive lexical, syntactic, and semantic feature sets naturally into a single compact model.
Experiments
Feature sets:
Experiments
To do this, we sample an equal number of features from each feature set , and concatenate
feature set is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Wang, Zhiguo and Xue, Nianwen
Abstract
Third, to enhance the power of parsing models, we enlarge the feature set with nonlocal features and semi-supervised word cluster features.
Experiment
We built three new parsing systems based on the StateAlign system: Nonlocal system extends the feature set of StateAlign system with nonlocal features, Cluster system extends the feature set with semi-supervised word cluster features, and Nonlocal & Cluster system extends the feature set with both groups of features.
Related Work
Finally, we enhanced our parsing model by enlarging the feature set with nonlocal features and semi-supervised word cluster features.
feature set is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Radziszewski, Adam
CRF and features
The work describes a feature set proposed for this task, which includes word forms in a local window, values of grammatical class, gender, number and case, tests for agreement on number, gender and case, as well as simple tests for letter case.
CRF and features
We took this feature set as a starting point.
CRF and features
The final feature set includes the following
feature set is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Mukherjee, Arjun and Liu, Bing
Empirical Evaluation
To compare classification performance, we use two feature sets : (i) standard word + POS 1-4 grams and (ii) AD-expressions from $5.
Empirical Evaluation
Predicting agreeing arguing nature is harder than that of disagreeing across all feature settings .
Empirical Evaluation
Using the discovered AD-expressions (Table 6, last row) as features renders a statistically significant (see Table 6 caption) improvement over other baseline feature settings .
feature set is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
He, Hua and Barbosa, Denilson and Kondrak, Grzegorz
Features
The principal feature sets are listed in Table 2, together with an indication whether they are novel or have been used in previous work.
Speaker Identification
Table 2: Principal feature sets .
Speaker Identification
subsequently we add three more feature sets that represent the following neighboring utterances: n − 2, n − 1 and n + 1. Informally, the features of the utterances n − 1 and n + 1 encode the first observation, while the features representing the utterance n − 2 encode the second observation.
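Such neighboring-utterance feature sets are commonly realized by copying each neighbor's features under a position prefix; a minimal sketch follows, in which the prefix scheme, feature names and function name are assumptions, not taken from the paper.

```python
def add_neighbor_features(all_feats, n, offsets=(-2, -1, 1)):
    """Given per-utterance feature dicts, extend the features of
    utterance n with prefixed copies of the features of the
    neighboring utterances n-2, n-1 and n+1 (when they exist)."""
    combined = dict(all_feats[n])
    for off in offsets:
        j = n + off
        if 0 <= j < len(all_feats):
            for name, value in all_feats[j].items():
                # e.g. "nbr-1:speaker_alias" for a feature of utterance n-1
                combined[f"nbr{off:+d}:{name}"] = value
    return combined

# Hypothetical per-utterance features for three consecutive utterances:
utts = [{"speaker_alias": 1}, {"vocative": 1}, {"quote_len": 7}]
print(add_neighbor_features(utts, 1))
```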
feature set is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Darwish, Kareem
Related Work
(2007) used a maximum entropy classifier trained on a feature set that includes the use of gazetteers and a stop-word list, appearance of a NE in the training set, leading and trailing word bigrams, and the tag of the previous word.
Related Work
(2008), they examined the same feature set on the Automatic Content Extraction (ACE) datasets using CRF
Related Work
Abdul-Hamid and Darwish (2010) used a simplified feature set that relied primarily on character level features, namely leading and trailing letters in a word.
feature set is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Tang, Hao and Keshet, Joseph and Livescu, Karen
Experiments
Models labeled X/Y use learning algorithm X and feature set Y.
Experiments
The feature set DP+ contains TF-IDF, DP alignment, dictionary, and length features.
Experiments
The results on the test fold are shown in Figure 1, which compares the learning algorithms, and Figure 2, which compares feature sets .
feature set is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Mehdad, Yashar and Negri, Matteo and Federico, Marcello
Beyond lexical CLTE
builds on two additional feature sets , derived from i) semantic phrase tables, and ii) dependency relations.
Experiments and results
(a) In both settings all the feature sets used outperform the approaches taken as terms of comparison.
Experiments and results
As shown in Table 1, the combined feature set (PT+SPT+DR) significantly5 outperforms the lexical model (64.5% vs 62.6%), while SPT and DR features separately added to PT (PT+SPT, and PT+DR) lead to marginal improvements over the results achieved by the PT model alone (about 1%).
feature set is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Feng, Vanessa Wei and Hirst, Graeme
Experiments
us to compare the model-fitting capacity of different feature sets from another perspective, especially when the training data is not sufficiently well fitted by the model.
Method
We refine Hernault et al.’s original feature set by incorporating our own features as well as some adapted from Lin et al.
Method
(2009) also incorporated contextual features in their feature set .
feature set is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Chen, Wenliang and Zhang, Min and Li, Haizhou
Decoding
With the rich feature set in Table 1, the running time of Intersect is longer than the time of Rescoring.
Experiments
Table 2 shows the feature settings of the systems, where MST1/2 refers to the basic first-/second-order parser and MSTB1/2 refers to the enhanced first-/second-order parser.
Experiments
MSTB1 and MSTB2 used the same feature setting , but used different order models.
feature set is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Ott, Myle and Choi, Yejin and Cardie, Claire and Hancock, Jeffrey T.
Automated Approaches to Deceptive Opinion Spam Detection
Specifically, we consider the following three n-gram feature sets, with the corresponding features lowercased and unstemmed: UNIGRAMS, BIGRAMS+, TRIGRAMS+, where the superscript + indicates that the feature set subsumes the preceding feature set .
Automated Approaches to Deceptive Opinion Spam Detection
We consider all three n-gram feature sets , namely UNIGRAMS, BIGRAMS+, and TRIGRAMS+, with corresponding language models smoothed using the interpolated Kneser-Ney method (Chen and Goodman, 1996).
Automated Approaches to Deceptive Opinion Spam Detection
We use SVMlight (Joachims, 1999) to train our linear SVM models on all three approaches and feature sets described above, namely POS, LIWC, UNIGRAMS, BIGRAMS+, and TRIGRAMS+.
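The cumulative n-gram scheme described above (BIGRAMS+ subsumes UNIGRAMS, TRIGRAMS+ subsumes BIGRAMS+) can be sketched as follows; the tokenization and feature naming here are illustrative assumptions, not the paper's implementation.

```python
from collections import Counter

def ngram_features(tokens, max_n):
    """Counts of all n-grams for n = 1..max_n, so max_n=2 yields the
    BIGRAMS+ set (bigrams plus unigrams), max_n=3 yields TRIGRAMS+.
    Features are lowercased and left unstemmed, as described."""
    tokens = [t.lower() for t in tokens]
    feats = Counter()
    for n in range(1, max_n + 1):
        for k in range(len(tokens) - n + 1):
            feats[" ".join(tokens[k:k + n])] += 1
    return feats

f = ngram_features("The room was very clean".split(), 2)
# The resulting set contains both unigrams ("clean") and bigrams ("very clean").
```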
feature set is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Blanco, Eduardo and Moldovan, Dan
Learning Algorithm
The held-out portion is used to tune the feature set and results are reported for the test split only, i.e., using unseen instances.
Learning Algorithm
We improve BASIC with an extended feature set which targets especially A1 and the verb (Table 5).
Negation in Natural Language
The main contributions are: (1) interpretation of negation using focus detection; (2) focus of negation annotation over all PropBank negated sentences1; (3) feature set to detect the focus of negation; and (4) model to semantically represent negation and reveal its underlying positive meaning.
feature set is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Mintz, Mike and Bills, Steven and Snow, Rion and Jurafsky, Daniel
Discussion
The held-out results in Figure 2 suggest that the combination of syntactic and lexical features provides better performance than either feature set on its own.
Evaluation
At most recall levels, the combination of syntactic and lexical features offers a substantial improvement in precision over either of these feature sets on its own.
Evaluation
No feature set strongly outperforms any of the others across all relations.
feature set is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Huang, Jian and Taylor, Sarah M. and Smith, Jonathan L. and Fotiadis, Konstantinos A. and Giles, C. Lee
Conclusions
Future research directions include developing rich feature sets and using corpus level or external information.
Experiments
Since different feature sets , NLP tools, etc are used in different benchmarked systems, we are also interested in comparing the proposed algorithm with different soft relational clustering variants.
Experiments
With the same feature sets and distance function, KARC-S outperforms FRC in F score by about 5%.
feature set is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Garera, Nikesh and Yarowsky, David
Corpus Details
However, stopwords were retained in the feature set as various sociolinguistic studies have shown that use of some of the stopwords, for instance, pronouns and determiners, are correlated with age and gender.
Corpus Details
Also, only the ngrams with frequency greater than 5 were retained in the feature set following Boulis and Ostendorf (2005).
Related Work
Another relevant line of work has been on the blog domain, using a bag of words feature set to discriminate age and gender (Schler et al., 2006; Burger and Henderson, 2006; Nowson and Oberlander, 2006).
feature set is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Galley, Michel and Manning, Christopher D.
Dependency parsing for machine translation
The three feature sets that were used in our experiments are shown in Table 2.
Dependency parsing for machine translation
It is quite similar to the McDonald (2005a) feature set , except that it does not include the set of all POS tags that appear between each candidate head-modifier pair (i, j).
Dependency parsing for machine translation
The primary difference between our feature sets and the ones of McDonald et al.
feature set is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Huang, Liang
Experiments
Our feature set is summarized in Table 2, which closely follows Charniak and Johnson (2005), except that we excluded the nonlocal features Edges, NGram, and CoPar, and simplified Rule and NGramTree features, since they were too complicated to compute.4 We also added four unlexicalized local features from Collins (2000) to cope with data-sparsity.
Experiments
tures in the updated version.5 However, our initial experiments show that, even with this much simpler feature set , our 50-best reranker performed equally well as theirs (both with an F-score of 91.4, see Tables 3 and 4).
Experiments
This result confirms that our feature set design is appropriate, and the averaged perceptron learner is a reasonable candidate for reranking.
feature set is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Haghighi, Aria and Liang, Percy and Berg-Kirkpatrick, Taylor and Klein, Dan
Experimental Setup
Table 1: Performance of EDITDIST and our model with various feature sets on EN-ES-W. See section 5.
Experimental Setup
We will use MCCA (for matching CCA) to denote our model using the optimal feature set (see section 5.3).
Introduction
As an example of the performance of the system, in English-Spanish induction with our best feature set , using corpora derived from topically similar but nonparallel sources, the system obtains 89.0% precision at 33% recall.
feature set is mentioned in 3 sentences in this paper.
Topics mentioned in this paper: