Abstract | We evaluate our optimizer on Chinese-English and Arabic-English translation tasks, each with small and large feature sets, and show that our learner is able to achieve significant improvements of 1.2-2 BLEU and 1.7-4.3 TER on average over state-of-the-art optimizers with the large feature set. |
Experiments | To evaluate the advantage of explicitly accounting for the spread of the data, we conducted several experiments on two Chinese-English translation test sets, using two different feature sets in each. |
Experiments | We selected the bound step size D, based on performance on a held-out dev set, to be 0.01 for the basic feature set and 0.1 for the sparse feature set.
Experiments | 4.2 Feature Sets |
Introduction | Chinese-English translation experiments show that our algorithm, RM, significantly outperforms strong state-of-the-art optimizers, in both a basic feature setting and a high-dimensional (sparse) feature space (§4).
Learning in SMT | The instability of MERT in larger feature sets (Foster and Kuhn, 2009; Hopkins and May, 2011) has motivated many alternative tuning methods for SMT.
Conditional Random Fields | Based on a study of three NLP benchmarks, Tsuruoka et al. (2009) claim this approach to be much faster than the orthant-wise approach and yet to yield very comparable performance, while selecting slightly larger feature sets.
Conditional Random Fields | The n-grm feature sets (n ∈ {1, 3, 5, 7}) include all features testing embedded windows of k letters, for all 0 ≤ k ≤ n; the n-grm- setting is similar, but only includes the window of length n; in the n-grm+ setting, we add features for odd-size windows; in the n-grm++ setting, we add all sequences of letters up to size n occurring in the current window.
Conditional Random Fields | For instance, the active bigram features at position t = 2 in the sequence x = 'lemma' are as follows: the 3-grm feature set contains fywx, fyw/fl and fy/Mem; only the latter appears in the 3-grm- setting.
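The embedded letter-window scheme described in the excerpts above can be sketched as follows. This is a hypothetical illustration, not the cited paper's implementation; the function name and the `k-win=` feature-string format are invented for the example.

```python
def window_features(word, pos, n):
    """Hypothetical sketch: centered letter-window features at position
    `pos`, one per odd window size k with 1 <= k <= n (n-grm style)."""
    feats = []
    for k in range(1, n + 1, 2):         # odd window sizes 1, 3, ..., n
        half = k // 2
        lo, hi = pos - half, pos + half + 1
        if lo >= 0 and hi <= len(word):  # skip windows falling off the word
            feats.append(f"{k}-win={word[lo:hi]}")
    return feats
```

Under this sketch, the n-grm- variant would keep only the k = n entry, while n-grm keeps all embedded windows.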
Introduction | An important property of CRFs is their ability to handle large and redundant feature sets and to integrate structural dependency between output labels. |
Introduction | Limiting the feature set or the number of output labels is however frustrating for many NLP tasks, where the type and number of potentially relevant features are very large.
Introduction | Second, the experimental demonstration that using large output label sets is doable and that very large feature sets actually help improve prediction accuracy. |
Introduction | In section 4 we motivate the choice of feature sets for the automatic identification of generic NPs in context. |
Introduction | 4.2 Feature set and feature classes |
Introduction | The feature set includes NP-local and global features. |
Abstract | By adopting this ILP formulation, segmentation F-measure is increased from 0.968 to 0.974, as compared to Viterbi decoding with the same feature set.
Abstract | We adopt the basic feature set used in (Ratnaparkhi, 1996) and (Collins, 2002). |
Abstract | As introduced in Section 2.2, we adopt a very compact feature set used in (Ratnaparkhi, 1996).
Experimental Setup | We experiment with two feature sets for each language: the optimized local feature sets (denoted local), and the optimized local feature sets extended with nonlocal features (denoted nonlocal). |
Features | The feature sets are customized for each language. |
Features | The exact definitions and feature sets that we use are available as part of the download package of our system. |
Features | Nonlocal features were selected with the same greedy forward strategy as the local features, starting from the optimized local feature sets.
Introducing Nonlocal Features | In other words, it is unlikely that we can devise a feature set that is informative enough to allow the weight vector to converge towards a solution that lets the learning algorithm see the entire documents during training, at least in the situation when no external knowledge sources are used. |
Introduction | We show that for the task of coreference resolution the straightforward combination of beam search and early update (Collins and Roark, 2004) falls short of more limited feature sets that allow for exact search. |
Results | the English development set as a function of number of training iterations with two different beam sizes, 20 and 100, over the local and nonlocal feature sets.
Results | The left half uses the local feature set, and the right the extended nonlocal feature set.
Results | Local vs. Nonlocal feature sets.
Experiment Setup 4.1 Corpus | We evaluate six different feature sets for their effectiveness in AVC: SCF, DR, CO, ACO, SCF+CO, and JOANIS07.
Experiment Setup 4.1 Corpus | The other four feature sets include both syntactic and lexical information. |
Experiment Setup 4.1 Corpus | JOANIS07: We use the feature set proposed in Joanis et al.
Introduction | We develop feature sets that combine syntactic and lexical information, which are in principle useful for any Levin-style verb classification. |
Introduction | We test the general applicability and scalability of each feature set to the distinctions among 48 verb classes involving 1,300 verbs, which is, to our knowledge, the largest investigation on English verb classification by far. |
Introduction | To preview our results, a feature set that combines both syntactic information and lexical information works much better than either of them used alone. |
Machine Learning Method | We construct a semantic space with each feature set.
Machine Learning Method | Except for JOANIS07, which only contains 224 features, all the other feature sets lead to a very high-dimensional space.
Related Work | The deeper linguistic analysis allows their feature set to cover a variety of indicators of verb semantics, beyond that of frame information. |
Experiments | In order to evaluate the effectiveness of the cluster-based feature sets, we conducted dependency parsing experiments in English and Czech.
Experiments | In our English experiments, we tested eight different parsing configurations, representing all possible choices between baseline or cluster-based feature sets, first-order (Eisner, 2000) or second-order (Carreras, 2007) factorizations, and labeled or unlabeled parsing.
Experiments | Second, note that the parsers using cluster-based feature sets consistently outperform the models using the baseline features, regardless of model order or label usage. |
Feature design | The feature sets we used are similar to other feature sets in the literature (McDonald et al., 2005a; Carreras, 2007), so we will not attempt to give an exhaustive description of the features in this section.
Feature design | In our experiments, we employed two different feature sets: a baseline feature set which draws upon “normal” information sources such as word forms and parts of speech, and a cluster-based feature set that also uses information derived from the Brown cluster hierarchy. |
Feature design | Our first-order baseline feature set is similar to the feature set of McDonald et al. |
Abstract | We present a fast and scalable online method for tuning statistical machine translation models with large feature sets . |
Adaptive Online Algorithms | When we have a large feature set and therefore want to tune on a large data set, batch methods are infeasible. |
Adaptive Online MT | For example, simple indicator features like lexicalized reordering classes are potentially useful yet bloat the feature set and, in the worst case, can negatively impact
Experiments | To the dense features we add three high dimensional “sparse” feature sets.
Experiments | The primary baseline is the dense feature set tuned with MERT (Och, 2003). |
Experiments | with the PT feature set.
Experimental Setup | Instead, we use several baselines to demonstrate the usefulness of integrating multiple LCs, as well as the relative usefulness of our feature sets.
Experimental Setup | The other evaluated systems are formed by taking various subsets of our feature set.
Experimental Setup | We experiment with 4 feature sets.
Introduction | (2012) and compare our methods with analogous ones that select a fixed LC, using state-of-the-art feature sets . |
Our Proposal: A Latent LC Approach | Section 3.1 describes our general approach, Section 3.2 presents our model and Section 3.3 details the feature set.
Our Proposal: A Latent LC Approach | We choose this model for its generality, conceptual simplicity, and because it allows us to easily incorporate various feature sets and sets of latent variables.
Our Proposal: A Latent LC Approach | 3.3 Feature Set |
Discussion | In this section, we analyze the influence of the employed feature sets and constraint conditions on performance.
Discussion | Because features may interact mutually in an indirect way, even with the same feature set, different constraint conditions can have significant influences on the final performance.
Discussion | In Section 3, we introduced five candidate feature sets.
Feature Construction | 3.1 Candidate Feature Set |
Feature Construction | To sum up, among the five candidate feature sets, the position feature is used as a singleton feature.
Feature Construction | In the following experiments, focusing on Chinese relation extraction, we will analyze the performance of candidate feature sets and study the influence of the constraint conditions. |
Methodology | In this section we describe the basic components of our study: feature sets, graphical model, inference, and evaluation.
Methodology | 3.1 Input and feature sets |
Methodology | We tested several feature sets either based on, or approximating, the concept of grammatical relation described in section 2. |
Results | We evaluated SCF lexicons based on the eight feature sets described in section 3.1, as well as the VALEX SCF lexicon described in section 2.
Results | 5.1 Effect of Feature Set Choice |
Results | Table 3 illustrates the result of taking a baseline feature set (containing word as the only feature) and adding a single feature from the Simple set to it. |
Results | Feature Set / F-score / %Imp: word 43.85 / —; word+nw 43.86 / 0.0; word+na 44.78 / 2.1; word+lem 45.85 / 4.6; word+pos 45.91 / 4.7; word+nw+pos+lem+na 46.34 / 5.7
Experiments | In both experiments we observed that the performance drops when excitation polarities and trouble expressions are removed from the feature set.
Experiments | PROPOSED-*: The proposed method without the feature set denoted by “*”. |
Problem Report and Aid Message Recognizers | The feature set given to the SVMs is summarized in the top part of Table 2.
Problem Report and Aid Message Recognizers | Note that we used a common feature set for both the problem report recognizer and aid message recognizer and that it is categorized into several types: features concerning trouble expressions (TR), excitation polarity (EX), their combination (TREX1) and word sentiment polarity (WSP), features expressing morphological and syntactic structures of nuclei and their context surrounding problem/aid nuclei (MSA), features concerning semantic word classes (SWC) appearing in nuclei and their context, request phrases, such as “Please help us”, appearing in tweets (REQ), and geographical locations in tweets recognized by our location recognizer (GL).
Problem Report and Aid Message Recognizers | We also attempted to represent nucleus template IDs, noun IDs and their combinations directly in our feature set to capture typical templates fre- |
Problem-Aid Match Recognizer | Here also we attempted to capture typical or frequent matches of nuclei using template and noun IDs and their combinations, but we did not observe any improvement so we omit them from the feature set.
Problem-Aid Match Recognizer | The bottom part of Table 2 summarizes the additional feature set, some features of which are described below in more detail.
Evaluation and Discussion | The SVMs achieve a similar cross-validated performance on all feature sets containing ngrams, showing only minor improvements for individual flaws when adding non-lexical features. |
Evaluation and Discussion | Table 6 shows the performance of the SVMs with RBF kernel on each dataset using the NGRAM feature set.
Evaluation and Discussion | Classifiers using the NONGRAM feature set achieved average F1-scores below 0.50 on all datasets.
Experiments | We selected a subset of these features for our experiments and grouped them into four feature sets in order to determine how well different combinations of features perform in the task. |
Experiments | Table 4: Feature sets used in the experiments |
Experiments | We evaluate word cluster and embedding (denoted by ED) features by adding them individually as well as simultaneously into the baseline feature set . |
Experiments | This might be explained by the difference between our baseline feature set and the feature set underlying their kernel-based system. |
Feature Set | 5.1 Baseline Feature Set |
Feature Set | (2011) utilize the full feature set from (Zhou et al., 2005) plus some additional features and build the state-of-the-art feature-based RE system.
Feature Set | Unfortunately, this feature set includes the human-annotated (gold-standard) information on entity and mention types which is often missing or noisy in reality (Plank and Moschitti, 2013). |
Introduction | Recent research in this area, whether feature-based (Kambhatla, 2004; Boschee et al., 2005; Zhou et al., 2005; Grishman et al., 2005; Jiang and Zhai, 2007a; Chan and Roth, 2010; Sun et al., 2011) or kernel-based (Zelenko et al., 2003; Bunescu and Mooney, 2005a; Bunescu and Mooney, 2005b; Zhang et al., 2006; Qian et al., 2008; Nguyen et al., 2009), attempts to improve the RE performance by enriching the feature sets from multiple sentence analyses and knowledge resources. |
Generative state tracking | In DIS-CDYN1, we use the original feature set, ignoring the problem described above (so that the general features contribute no information), resulting in M + K weights.
Generative state tracking | The analysis of various feature sets indicates that the ASR/SLU error correlation (confusion) features yield the largest improvement — cf.
Generative state tracking | feature set be compared to b in Table 3. |
Introduction | importance of different feature sets for this task, and measure the amount of data required to reliably train our model.
Abstract | This regularizes the model complexity and makes the tensor model highly effective in situations where a large feature set is defined but very limited resources are available for training. |
Conclusion and Future Work | This can be regarded as a form of model regularization. Therefore, compared with the traditional vector-space models, learning in the tensor space is very effective when a large feature set is defined, but only a small amount of training data is available.
Introduction | This also makes training the model parameters a challenging problem, since the amount of labeled training data is usually small compared to the size of feature sets: the feature weights cannot be estimated reliably.
Introduction | Such models require learning individual feature weights directly, so that the number of parameters to be estimated is identical to the size of the feature set.
Tensor Model Construction | ways of mapping, which is an intractable number of possibilities even for modest-sized feature sets, making it impractical to carry out a brute-force search.
Tensor Space Representation | In general, if V features are defined for a learning problem, and we (i) organize the feature set as a tensor Φ ∈ ℝ^(n1×n2×···×nD) and (ii) use H component rank-1 tensors to approximate the corresponding target weight tensor.
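The parameter saving behind this tensor representation can be sketched in a few lines. This is a hedged illustration, not the paper's implementation: the dimensions, seed, and factor names are invented, and the reconstruction `W[i][j][k] ≈ Σ_h u[h][i]·v[h][j]·w[h][k]` is the standard rank-1 (CP-style) sum the excerpt alludes to.

```python
from random import Random

# Sketch: view V = n1*n2*n3 feature weights as a 3-way tensor and
# approximate it by H rank-1 components, cutting the parameter count
# from n1*n2*n3 down to H * (n1 + n2 + n3).
dims = (4, 5, 5)   # n1, n2, n3  ->  V = 100 features
H = 2              # number of rank-1 components

rnd = Random(0)
u = [[rnd.random() for _ in range(dims[0])] for _ in range(H)]
v = [[rnd.random() for _ in range(dims[1])] for _ in range(H)]
w = [[rnd.random() for _ in range(dims[2])] for _ in range(H)]

def weight(i, j, k):
    """Weight-tensor entry reconstructed from the rank-1 factors."""
    return sum(u[h][i] * v[h][j] * w[h][k] for h in range(H))

n_params_tensor = H * sum(dims)                # 2 * (4+5+5) = 28
n_params_vector = dims[0] * dims[1] * dims[2]  # 100
```

With these invented dimensions, the factored form needs 28 parameters where a flat vector-space model needs 100, which is the regularization effect the surrounding excerpts describe.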
Tensor Space Representation | Specifically, a vector space model assumes each feature weight to be a “free” parameter, and estimating them reliably could therefore be hard when training data are not sufficient or the feature set is huge. |
Experiments | Our primary feature set IGC consists of 127 template unigrams that emphasize coarse properties (i.e., properties 7, 9, and 11 in Table 1). |
Experiments | We compare against the language-specific feature sets detailed in the literature on high-resource top-performing SRL systems: From Björkelund et al.
Experiments | (2009), these are feature sets for German, English, Spanish and Chinese, obtained by weeks of forward selection (Bdeflmemh); and from Zhao et al. |
Discussion and future work | [Figure 3 residue: accuracy axis ticks from 0.90 to 1.00, with a legend distinguishing the LIWC and Syntactic feature sets.]
Discussion and future work | Figure 3: Effect of feature set choice on cross-validation accuracy. |
Discussion and future work | 2012; Almela et al., 2012; Fornaciari and Poesio, 2012), our results suggest that the set of syntactic features presented here performs significantly better than the LIWC feature set on our data, and across seven out of the eight experiments based on age groups and verbosity of transcriptions.
Related Work | Descriptions of the data (section 3) and feature sets (section 4) precede experimental results (section 5) and the concluding discussion (section 6). |
Abstract | With a few exceptions, discriminative training in statistical machine translation (SMT) has been content with tuning weights for large feature sets on small development data. |
Discussion | In future work, we would like to investigate more sophisticated features, better learners, and in general improve the components of our system that have been neglected in the current investigation of relative improvements by scaling the size of data and feature sets.
Experiments | The results on the news-commentary (nc) data show that training on the development set does not benefit from adding large feature sets — BLEU result differences between tuning 12 default features |
Experiments | Here tuning large feature sets on the respective dev sets yields significant improvements of around 2 BLEU points over tuning the 12 default features on the dev sets. |
Introduction | Our resulting models are learned on large data sets, but they are small and outperform models that tune feature sets of various sizes on small development sets. |
Related Work | All approaches have been shown to scale to large feature sets and all include some kind of regularization method. |
Approach | To answer the second research objective we will analyze the contribution of the proposed feature set to this function. |
Approach | For completeness we also include in the feature set the value of the tf-idf similarity measure.
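A tf-idf similarity value like the one mentioned above can be computed as a single real-valued feature roughly as follows. This is a minimal sketch under assumed conventions (raw term frequency, idf = log(N/df), cosine similarity); the function name and weighting variant are not taken from the cited work.

```python
import math
from collections import Counter

def tfidf_cosine(doc_a, doc_b, corpus):
    """Sketch: cosine similarity of two token lists under tf-idf
    weighting (raw tf, idf = log(N / df)), usable as one extra
    real-valued feature alongside the rest of the feature set."""
    n = len(corpus)
    df = Counter()
    for doc in corpus:
        df.update(set(doc))              # document frequency per word
    def vec(doc):
        tf = Counter(doc)
        return {w: tf[w] * math.log(n / df[w]) for w in tf if w in df}
    va, vb = vec(doc_a), vec(doc_b)
    dot = sum(va[w] * vb.get(w, 0.0) for w in va)
    na = math.sqrt(sum(x * x for x in va.values()))
    nb = math.sqrt(sum(x * x for x in vb.values()))
    return dot / (na * nb) if na and nb else 0.0
```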
Experiments | Feature Set / MRR / P@1
Experiments | The algorithm incrementally adds to the feature set the feature that provides the highest MRR improvement in the development partition. |
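The incremental procedure described above is standard greedy forward feature selection; a hedged sketch follows. The function name is invented, and `evaluate` stands in for whatever dev-set score is used (e.g. MRR on the development partition).

```python
def greedy_forward_select(candidates, evaluate):
    """Sketch of greedy forward feature selection: repeatedly add
    whichever candidate most improves a held-out score; `evaluate`
    is an abstract callable mapping a feature set to a score."""
    selected = []
    best = evaluate(frozenset())
    remaining = set(candidates)
    while remaining:
        gains = {f: evaluate(frozenset(selected + [f])) for f in remaining}
        f_best = max(gains, key=gains.get)
        if gains[f_best] <= best:        # no candidate improves the score
            break
        selected.append(f_best)
        best = gains[f_best]
        remaining.remove(f_best)
    return selected, best
```

Stopping as soon as no candidate improves the score is one common convention; other variants run for a fixed budget or allow small tolerated drops.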
Related Work | This approach allowed us to perform a systematic feature analysis on a large-scale real-world corpus and a comprehensive feature set . |
Related Work | Our model uses a larger feature set that includes correlation and transformation-based features and five different content representations. |
Abstract | Experiments on applying SSWE to a benchmark Twitter sentiment classification dataset in SemEval 2013 show that (1) the SSWE feature performs comparably with handcrafted features in the top-performing system; (2) the performance is further improved by concatenating SSWE with the existing feature set.
Introduction | After concatenating the SSWE feature with the existing feature set, we push the state-of-the-art to 86.58% in macro-F1.
Related Work | NRC-ngram refers to the feature set of NRC leaving out ngram features. |
Related Work | After concatenating SSWEu with the feature set of NRC, the performance is further improved to 86.58%.
Related Work | The concatenated features SSWEu +NRC-ngram (86.48%) outperform the original feature set of NRC (84.73%). |
Acquisition of Hyponymy Relations from Wikipedia | (2008) but LF1–LF5 and SF1–SF9 are the same as their feature set.
Acquisition of Hyponymy Relations from Wikipedia | Let us provide an overview of the feature sets used in Sumida et al. |
Acquisition of Hyponymy Relations from Wikipedia | These are the feature sets used in Sumida et al. |
Motivation | Since the learning settings (feature sets, feature values, training data, corpora, and so on) are usually different in two languages, the reliable part in one language may be overlapped by an unreliable part in another language.
Experiment Two | We derive two types of feature sets from the responses: features derived from each user model and features derived from attributes of the query/ response pair itself. |
Experiment Two | The five feature sets for the user model are: |
Experiment Two | allUtility: 12 features consisting of the high, low, and average utility scores from the previous three feature sets.
Conclusion | Using this feature set, we obtain an accuracy of 73.0% on a blind test.
Introduction | This best-performing system uses our new feature set . |
Predicting Direction of Power | We use another feature set LEX to capture word ngrams, POS (part of speech) ngrams and mixed ngrams. |
Predicting Direction of Power | We also performed an ablation study to understand the importance of different slices of our feature sets.
Structural Analysis | THRPR: This feature set includes two metadata-based feature sets: positional and verbosity.
Experiments | Second, note that the parsers incorporating the N-gram feature sets consistently outperform the models using the baseline features in all test data sets, regardless of model order or label usage. |
Web-Derived Selectional Preference Features | In this paper, we employ two different feature sets: a baseline feature set which draws upon “normal” information sources, such as word forms and parts of speech (POS), without including the web-derived selectional preference features, and a feature set that conjoins the baseline features and the web-derived selectional preference features.
Web-Derived Selectional Preference Features | These feature sets are similar to other feature sets in the literature (McDonald et al., 2005; Carreras, 2007), so we will not attempt to give an exhaustive description.
Web-Derived Selectional Preference Features | For example, the baseline feature set includes indicators for word-to-word and tag-to-tag interactions between the head and modifier of a dependency. |
Abstract | Previous work in traditional text classification and its variants — such as sentiment analysis — has achieved successful results by using the bag-of-words representation; that is, by treating text as a collection of words with no interdependencies, training a classifier on a large feature set of word unigrams which appear in the corpus. |
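The bag-of-words representation described in this excerpt can be shown concretely. A minimal sketch, with an invented function name, vocabulary, and whitespace tokenization; real systems would use a learned vocabulary and a proper tokenizer.

```python
from collections import Counter

def bow_features(text, vocab):
    """Bag-of-words sketch: a text becomes a vector of unigram counts
    over a fixed vocabulary, discarding all word order and dependency
    information."""
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocab]
```

For example, `bow_features("Good movie good plot", ["good", "bad", "movie"])` yields `[2, 0, 1]`: a classifier trained on such vectors sees only which words occur and how often.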
Abstract | To illustrate, consider the following feature set, a bigram and a trigram (each term in the n-gram either has the form word or Atag):
Abstract | While the feature set was too small to produce notable results, we identified which features actually were indicative of lect. |
Conclusion | We have conducted exhaustive evaluation with multiple machine learning classifiers and different feature sets spanning from lexical information to psychological categories developed by Tausczik and Pennebaker (2010).
Task A: Polarity Classification | We studied the influence of unigrams, bigrams and a combination of the two, and saw that the best performing feature set consists of the combination of unigrams and bigrams. |
Task A: Polarity Classification | For each information source (metaphor, context, source, target and their combinations), we built a separate n-gram feature set and model, which was evaluated on 10-fold cross validation. |
Task A: Polarity Classification | We have used different feature sets and information sources to solve the task. |
Task B: Valence Prediction | We have studied different feature sets and information sources to solve the task. |
Experiments | Table 4: Overall F1 (%) of NER and Accuracy (%) of NEN with different feature sets.
Experiments | Table 4 shows the overall performance of our method with various feature set combinations, where Fo, Fl and Fg denote the orthographic features, the lexical features, and the gazetteer-related features, respectively.
Our Method | (2) The model uses two feature sets, referred to below as feature set one and feature set two.
Our Method | 4.3.1 Feature Set One
Our Method | 4.3.2 Feature Set Two
Results and discussion | For all four other questions, the best feature set is Continuity, which is a combination of summarization specific features, coreference features and cosine similarity of adjacent sentences. |
Results and discussion | Feature set Gram. |
Challenges for Discriminative SMT | This problem of over-fitting is exacerbated in discriminative models with large, expressive feature sets.
Challenges for Discriminative SMT | Learning with a large feature set requires many training examples and typically many iterations of a solver during training. |
Evaluation | To do this we use our own implementation of Hiero (Chiang, 2007), with the same grammar but with the traditional generative feature set trained in a linear model with minimum BLEU training. |
Evaluation | The feature set includes: a trigram language model (lm) trained |
Evaluation | The relative scores confirm that our model, with its minimalist feature set, achieves comparable performance to the standard feature set without the language model. |
Conclusion | Finally, further efforts to engineer a grammar suitable for realization from the CCGbank should provide richer feature sets, which, as our feature ablation study suggests, are useful for boosting hypertagging performance, hence for finding better and more complete realizations.
Results and Discussion | The whole feature set was found in feature ablation testing on the development set to outperform all other feature subsets significantly (p < 2.2 × 10^-16).
Results and Discussion | The full feature set outperforms all others significantly (p < 2.2 × 10^-16).
Results and Discussion | The results for the full feature set on Sections 00 and 23 are outlined in Table 2.
Experiments | Experimental results are given in Table 2, where we also provide the number of features in each feature set . |
Experiments | Figure 1: ROC curves for classifiers trained using different feature sets (English SVO and AN test sets). |
Experiments | According to ROC plots in Figure 1, all three feature sets are effective, both for SVO and for AN tasks. |
Related Work | Current work builds on this study, and incorporates new syntactic relations as metaphor candidates, adds several new feature sets and different, more reliable datasets for evaluating results. |
Syllabification Experiments | In this section, we will discuss the results of our best emission feature set (five-gram features with a context window of eleven letters) on held-out unseen test sets. |
Syllabification with Structured SVMs | With SVM-HMM, the crux of the task is to create a tag scheme and feature set that produce good results. |
Syllabification with Structured SVMs | After experimenting with the development set, we decided to include in our feature set a window of eleven characters around the focus character, five on either side. |
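An eleven-character window with five characters on either side of the focus can be sketched as below. The function name and the `#` boundary-padding symbol are assumptions for illustration, not details from the cited system.

```python
def char_window(word, i, size=11, pad="#"):
    """Sketch of an eleven-character window around focus position i
    (five characters on either side), padded at word boundaries so
    every position yields a window of exactly `size` characters."""
    half = size // 2
    padded = pad * half + word + pad * half
    return padded[i : i + size]
```

Each window position would then be expanded into emission features (unigrams through five-grams, per the surrounding excerpts) for the structured SVM.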
Syllabification with Structured SVMs | As is apparent from Figure 2, we see a substantial improvement by adding bigrams to our feature set.
Future Work | We are also interested to see how well this feature set performs on speech data, as in (Aoki et al., 2003). |
Related Work | They motivate a richer feature set, which, however, does not yet appear to be implemented.
Related Work | (2005) adds word repetition to their feature set.
Related Work | Our feature set incorporates information which has proven useful in meeting segmentation (Galley et al., 2003) and the task of detecting addressees of a specific utterance in a meeting (Jovanovic et al., 2006).
Maximum Entropy Based Model for Hindi NER | In Table 2 we have shown the accuracy values for a few feature sets.
Maximum Entropy Based Model for Hindi NER | Again, when w_{i-2} and w_{i+2} are removed from the feature set (i.e.
Maximum Entropy Based Model for Hindi NER | When suffix, prefix and digit information are added to the feature set, the f-value is increased up to 74.26.
Causal Relations for Why-QA | We used the three types of feature sets in Table 3 for training the CRFs, where j is in the range i − 4 ≤ j ≤ i + 4 for current position i in a causal relation candidate.
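A context range of i − 4 ≤ j ≤ i + 4 corresponds to feature templates over a nine-position window, which can be sketched as follows. The function name and the `tok[offset]=value` feature format are invented for illustration; the actual templates in the cited work also cover other attribute types.

```python
def context_templates(tokens, i, width=4):
    """Sketch of position-offset feature templates: one feature per
    token at offset j - i, for every j with i - width <= j <= i + width
    that falls inside the sequence."""
    return [
        f"tok[{j - i}]={tokens[j]}"
        for j in range(max(0, i - width), min(len(tokens), i + width + 1))
    ]
```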
Causal Relations for Why-QA | More detailed information concerning the configurations of all the nouns in all the candidates of an appropriate causal relation (including their cause parts) and the question is encoded into our features ef1–ef4 in Table 4, and the final judgment is done by our re-ranker.
Experiments | We evaluated the performance when we removed one of the three types of features (ALL-“MORPH”, ALL-“SYNTACTIC” and ALL-“C-MARKER”) and compared the results in these settings with the one when all the feature sets were used (ALL). |
Experiments | We confirmed that all the feature sets improved the performance, and we got the best performance when using all of them. |
Experiment | To compare our joint inference versus other learning models, we also employed a decision tree (DT) learner, equipped with the same feature set as our FCRF. |
Experiment | Both models take the whole feature set described in Section 2.3. |
Experiment | 3.4.3 Feature set evaluation |
Empirical Evaluation: Simile-derived Representations | Suspecting that a noisy feature set had contributed to the apparent drop in performance, these authors then proceed to apply a variety of noise filters to reduce the set of feature values to 51,345, which in turn leads to an improved cluster purity measure of 62.7%. |
Empirical Evaluation: Simile-derived Representations | In experiment 2, we see a similar ratio of feature quantities before filtering; after some initial filtering, Almuhareb and Poesio reduce their feature set to just under 10 times the size of the simile-derived feature set.
Empirical Evaluation: Simile-derived Representations | First, the feature representations do not need to be hand-filtered and noise-free to be effective; we see from the above results that the raw values extracted from the simile pattern prove slightly more effective than filtered feature sets used by Almuhareb and Poesio. |
Related Work | As noted by the latter authors, this results in a much smaller yet more diagnostic feature set for each concept. |
Distant Supervision | However, we did not find a cumulative effect (line 8) of the two feature sets . |
Features | We refer to these feature sets as CoreLex (CX) and VerbNet (VN) features and to their combination as semantic features (SEM). |
Features | feature set is referred to as named entities (NE). |
Features | We refer to this feature set as sequential features (SQ). |
Experimental Comparison with Unsupervised Learning | With this feature set, the CRF model is less expressive than DMV.
Experimental Comparison with Unsupervised Learning | The CRF cannot consider valency even with the full feature set, but this is balanced by the ability to use distance.
Experimental Comparison with Unsupervised Learning | First we note that GE training using the full feature set substantially outperforms the restricted feature set, despite the fact that the same set of constraints is used for both experiments.
Conclusion and Outlook | Conceptualizing MT evaluation as an entailment problem motivates the use of a rich feature set that covers, unlike almost all earlier metrics, a wide range of linguistic levels, including lexical, syntactic, and compositional phenomena.
Expt. 2: Predicting Pairwise Preferences | Table header: Feature set | Consistency (%) | System-level correlation (ρ)
Introduction | (2005)), and thus predict the quality of MT hypotheses with a rich RTE feature set.
Regression-based MT Quality Prediction | (2007) train binary classifiers on a feature set formed by a number of MT metrics. |
Introduction | Second, although the feature set is fundamentally a combination of those used in previous works (Zhang and Clark, 2010; Huang and Sagae, 2010), to integrate them in a single incremental framework is not straightforward. |
Model | The feature set of our model is fundamentally a combination of the features used in the state-of-the-art joint segmentation and POS tagging model (Zhang and Clark, 2010) and dependency parser (Huang and Sagae, 2010), both of which are used as baseline models in our experiment. |
Model | All of the models described above except Dep’ are based on the same feature sets for segmentation and |
Related Works | Zhang and Clark (2008) proposed an incremental joint segmentation and POS tagging model, with an effective feature set for Chinese. |
Clustering Methods, Evaluation Metrics and Experimental Setup | Table 1: Sample output for a cluster produced with the grid-scf-sem feature set and the IGNGF clustering method. |
Features and Data | Table 4(a) includes the evaluation results for all the feature sets when using IGNGF clustering. |
Features and Data | In terms of features, the best results are obtained using the grid-scf-sem feature set with an F-measure of 0.70. |
Features and Data | In contrast, the classification obtained using the scf-synt-sem feature set has a higher CMP for the clustering with optimal mPUR (0.57), but a lower F-measure (0.61) and a larger number of classes (16)
Approach | 3.2 Feature set |
Approach | Our full feature set is as follows: |
Conclusions and future work | The addition of an incoherence metric to the feature set of an AA system has been shown to improve performance significantly (Miltsakaki and Kukich, 2000; Miltsakaki and Kukich, 2004). |
Validity tests | Although the above modifications do not exhaust the potential challenges a deployed AA system might face, they represent a threat to the validity of our system since we are using a highly related feature set.
Arabic Word Segmentation Model | This feature set also allows the model to take into account other interactions between the beginning and end of a word, particularly those involving the definite article ال (al-).
Arabic Word Segmentation Model | A notable property of this feature set is that it remains highly dialect-agnostic, even though our additional features were chosen in response to errors made on text in Egyptian dialect. |
Error Analysis | • errors that can be fixed with a fuller analysis of just the problematic token, and therefore represent a deficiency in the feature set; and
Error Analysis | In 36 of the 100 sampled errors, we conjecture that the presence of the error indicates a shortcoming of the feature set, resulting in segmentations that make sense locally but are not plausible given the full token.
Machine learning-based cache model | Therefore, the intra-sentential and inter-sentential zero-anaphora resolution models are separately trained by exploiting different feature sets as shown in Table 2. |
Machine learning-based cache model | Table 1: Feature set used in the cache models |
Machine learning-based cache model | The feature set used in the cache model is shown in Table 1. The ‘CASE_MARKER’ feature roughly captures the salience of the local transition dealt with in Centering Theory, and is also intended to capture the global foci of a text coupled with the BEGINNING feature.
Related Work | But, importantly, our classifiers all use the same feature set so they do not represent independent views of the data. |
Related Work | The feature set for these classifiers is exactly the same as described in Section 3.2, except that we add a new lexical feature that represents the head noun of the target NP (i.e., the NP that needs to be tagged). |
Related Work | ³But technically this is not co-training because our feature sets are all the same.
Empirical Analysis | Results on the Boundary Detection (BD) task are obtained by training an SVM model on the same feature set presented in (Johansson and Nugues, 2008b) and are slightly below the state-of-the-art BD accuracy reported in (Coppola et al., 2009).
Empirical Analysis | Given the relatively simple feature set adopted here, this result is very significant, particularly in terms of the resulting efficiency.
Introduction | Notice how this is also a general problem of statistical learning processes, as large fine-grained feature sets are more exposed to the risk of overfitting.
Related Work | While these approaches increase the expressive power of the models to capture more general linguistic properties, they rely on complex feature sets, are more demanding about the amount of training information and increase the overall exposure to overfitting effects.
Dependency Parsing: Baseline | With the notation defined in Table 1, the feature set shown in Table 2 is adopted.
Dependency Parsing: Baseline | We used a large scale feature selection approach as in (Zhao et al., 2009) to obtain the feature set in Table 2. |
Evaluation Results | The results with different feature sets are in Table 4. |
Evaluation Results | Table 4: The results with different feature sets. Columns: features, with p, without p.
Dependency Parsing with HPSG | Therefore, we extend this feature set by adding four more feature categories, which are similar to the original ones but with the dependency relation replaced by the dependency backbone of the HPSG outputs.
Dependency Parsing with HPSG | The extended feature set is shown in Table 1. |
Dependency Parsing with HPSG | The extended feature set is shown in Table 2 (the new features are listed separately). |
Evaluation framework | 4.2 Performance indicator and feature set |
Evaluation framework | As our focus is on the algorithmic aspect, in all experiments we use the same feature set, which consists of the seventeen features proposed in (Specia et al., 2009).
Evaluation framework | This feature set, fully described in (Callison-Burch et al., 2012), takes into account the complexity of the source sentence (e.g. number of tokens, number of translations per source word) and the fluency of the target translation (e.g. language model probabilities).
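Evaluation framework | The two feature families described above can be sketched as follows; only three of the seventeen features are illustrated, and the helper name and input placeholders (the translation-count lookup and the precomputed LM log-probability) are illustrative assumptions, not the paper's implementation:

```python
def qe_features(source_tokens, target_tokens, translations_per_word, target_lm_logprob):
    """Sketch of two QE feature families: source-complexity features
    (token count, average translations per source word) and a
    target-fluency feature (length-normalized LM log-probability).
    `translations_per_word` and `target_lm_logprob` stand in for the
    lexicon and language-model lookups a real system would provide."""
    n_src = len(source_tokens)
    # Source complexity: how ambiguous is each source word on average?
    avg_trans = sum(translations_per_word.get(w, 1) for w in source_tokens) / n_src
    # Target fluency: LM log-probability normalized by target length.
    norm_lm = target_lm_logprob / max(len(target_tokens), 1)
    return {
        "src_num_tokens": n_src,
        "src_avg_translations": avg_trans,
        "tgt_lm_logprob_per_token": norm_lm,
    }

feats = qe_features(["le", "chat"], ["the", "cat"], {"le": 3, "chat": 2}, -8.0)
```

A regression model for quality prediction would then be trained on vectors of such features.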
Annotations | Table 2 shows the performance of our feature set in grammars with several different levels of structural annotation.³ Klein and Manning (2003) find large gains (6% absolute improvement, 20% relative improvement) going from v = 0, h = 0 to v = 1, h = 1; however, we do not find the same level of benefit.
Features | Table 1 shows the results of incrementally building up our feature set on the Penn Treebank development set. |
Introduction | Our parser can be easily adapted to this task by replacing the X-bar grammar over treebank symbols with a grammar over the sentiment values to encode the output variables and then adding n-gram indicators to our feature set to capture the bulk of the lexical effects. |
Copula Models for Text Regression | This nice property essentially allows us to fuse distinctive lexical, syntactic, and semantic feature sets naturally into a single compact model. |
Experiments | Feature sets: |
Experiments | To do this, we sample an equal number of features from each feature set, and concatenate
Abstract | Third, to enhance the power of parsing models, we enlarge the feature set with nonlocal features and semi-supervised word cluster features. |
Experiment | We built three new parsing systems based on the StateAlign system: the Nonlocal system extends the feature set of the StateAlign system with nonlocal features, the Cluster system extends the feature set with semi-supervised word cluster features, and the Nonlocal & Cluster system extends the feature set with both groups of features.
Related Work | Finally, we enhanced our parsing model by enlarging the feature set with nonlocal features and semi-supervised word cluster features. |
CRF and features | The work describes a feature set proposed for this task, which includes word forms in a local window, values of grammatical class, gender, number and case, tests for agreement on number, gender and case, as well as simple tests for letter case. |
CRF and features | We took this feature set as a starting point. |
CRF and features | The final feature set includes the following |
Empirical Evaluation | To compare classification performance, we use two feature sets: (i) standard word + POS 1-4 grams and (ii) AD-expressions from §5.
Empirical Evaluation | Predicting an agreeing arguing nature is harder than predicting a disagreeing one, across all feature settings.
Empirical Evaluation | Using the discovered AD-expressions (Table 6, last row) as features renders a statistically significant (see Table 6 caption) improvement over other baseline feature settings.
Features | The principal feature sets are listed in Table 2, together with an indication whether they are novel or have been used in previous work. |
Speaker Identification | Table 2: Principal feature sets.
Speaker Identification | subsequently we add three more feature sets that represent the following neighboring utterances: n - 2, n - 1 and n + 1. Informally, the features of the utterances n - 1 and n + 1 encode the first observation, while the features representing the utterance n - 2 encode the second observation.
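Speaker Identification | The windowed representation described above (features of utterance n combined with offset-prefixed copies of the features of utterances n - 2, n - 1 and n + 1) can be sketched as follows; the per-utterance features and the naming scheme are illustrative assumptions:

```python
def utterance_features(utterance):
    """Toy per-utterance features (length and first token), standing in
    for whatever features a real speaker-identification model would use."""
    tokens = utterance.split()
    return {"len": len(tokens), "first": tokens[0] if tokens else ""}

def windowed_features(utterances, n):
    """Combine the features of utterance n with those of utterances
    n-2, n-1 and n+1, prefixing each neighbor's features with its
    relative offset so they remain distinct feature dimensions."""
    feats = {f"0:{k}": v for k, v in utterance_features(utterances[n]).items()}
    for offset in (-2, -1, +1):
        i = n + offset
        if 0 <= i < len(utterances):  # skip neighbors outside the dialogue
            for k, v in utterance_features(utterances[i]).items():
                feats[f"{offset:+d}:{k}"] = v
    return feats
```

For example, `windowed_features(["hello there", "how are you", "fine thanks", "good to hear"], 2)` yields features such as `"-1:len"` and `"+1:first"` alongside the current utterance's own `"0:..."` features.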
Related Work | (2007) used a maximum entropy classifier trained on a feature set that includes the use of gazetteers and a stop-word list, appearance of a NE in the training set, leading and trailing word bigrams, and the tag of the previous word. |
Related Work | (2008), they examined the same feature set on the Automatic Content Extraction (ACE) datasets using CRF |
Related Work | Abdul-Hamid and Darwish (2010) used a simplified feature set that relied primarily on character level features, namely leading and trailing letters in a word. |
Experiments | Models labeled X/Y use learning algorithm X and feature set Y. |
Experiments | The feature set DP+ contains TF-IDF, DP alignment, dictionary, and length features.
Experiments | The results on the test fold are shown in Figure 1, which compares the learning algorithms, and Figure 2, which compares feature sets.
Beyond lexical CLTE | builds on two additional feature sets, derived from i) semantic phrase tables, and ii) dependency relations.
Experiments and results | (a) In both settings all the feature sets used outperform the approaches taken as terms of comparison. |
Experiments and results | As shown in Table 1, the combined feature set (PT+SPT+DR) significantly⁵ outperforms the lexical model (64.5% vs. 62.6%), while SPT and DR features separately added to PT (PT+SPT, and PT+DR) lead to marginal improvements over the results achieved by the PT model alone (about 1%).
Experiments | us to compare the model-fitting capacity of different feature sets from another perspective, especially when the training data is not sufficiently well fitted by the model. |
Method | We refine Hernault et al.’s original feature set by incorporating our own features as well as some adapted from Lin et al. |
Method | (2009) also incorporated contextual features in their feature set.
Decoding | With the rich feature set in Table 1, the running time of Intersect is longer than the time of Rescoring.
Experiments | Table 2 shows the feature settings of the systems, where MST1/2 refers to the basic first-/second-order parser and MSTB1/2 refers to the enhanced first-/second-order parser.
Experiments | MSTB1 and MSTB2 used the same feature setting, but used different order models.
Automated Approaches to Deceptive Opinion Spam Detection | Specifically, we consider the following three n-gram feature sets, with the corresponding features lowercased and unstemmed: UNIGRAMS, BIGRAMS+, TRIGRAMS+, where the superscript + indicates that the feature set subsumes the preceding feature set.
Automated Approaches to Deceptive Opinion Spam Detection | We consider all three n-gram feature sets, namely UNIGRAMS, BIGRAMS+, and TRIGRAMS+, with corresponding language models smoothed using the interpolated Kneser-Ney method (Chen and Goodman, 1996).
Automated Approaches to Deceptive Opinion Spam Detection | We use SVMlight (Joachims, 1999) to train our linear SVM models on all three approaches and feature sets described above, namely POS, LIWC, UNIGRAMS, BIGRAMS+, and TRIGRAMS+. |
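Automated Approaches to Deceptive Opinion Spam Detection | The subsumption relation among these n-gram feature sets (BIGRAMS+ containing all unigrams and bigrams, TRIGRAMS+ additionally containing all trigrams) can be sketched as follows; the helper name and example sentence are illustrative, not from the paper:

```python
from collections import Counter

def ngram_features(tokens, max_n):
    """Collect all n-grams of order 1..max_n, so that the feature set
    for max_n subsumes the feature set for every smaller order."""
    feats = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            feats[" ".join(tokens[i:i + n])] += 1
    return feats

tokens = "the room was very clean".lower().split()
unigrams = ngram_features(tokens, 1)       # UNIGRAMS
bigrams_plus = ngram_features(tokens, 2)   # BIGRAMS+ subsumes UNIGRAMS
trigrams_plus = ngram_features(tokens, 3)  # TRIGRAMS+ subsumes BIGRAMS+

assert set(unigrams) <= set(bigrams_plus) <= set(trigrams_plus)
```

These feature dictionaries could then be vectorized and fed to an SVM or used to estimate the smoothed language models mentioned above.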
Learning Algorithm | The held-out portion is used to tune the feature set and results are reported for the test split only, i.e., using unseen instances. |
Learning Algorithm | We improve BASIC with an extended feature set which targets especially A1 and the verb (Table 5). |
Negation in Natural Language | The main contributions are: (1) interpretation of negation using focus detection; (2) focus of negation annotation over all PropBank negated sentences¹; (3) feature set to detect the focus of negation; and (4) model to semantically represent negation and reveal its underlying positive meaning.
Discussion | The held-out results in Figure 2 suggest that the combination of syntactic and lexical features provides better performance than either feature set on its own. |
Evaluation | At most recall levels, the combination of syntactic and lexical features offers a substantial improvement in precision over either of these feature sets on its own. |
Evaluation | No feature set strongly outperforms any of the others across all relations. |
Conclusions | Future research directions include developing rich feature sets and using corpus level or external information. |
Experiments | Since different feature sets, NLP tools, etc. are used in different benchmarked systems, we are also interested in comparing the proposed algorithm with different soft relational clustering variants.
Experiments | With the same feature sets and distance function, KARC-S outperforms FRC in F score by about 5%. |
Corpus Details | However, stopwords were retained in the feature set as various sociolinguistic studies have shown that the use of some stopwords, for instance pronouns and determiners, is correlated with age and gender.
Corpus Details | Also, only the ngrams with frequency greater than 5 were retained in the feature set following Boulis and Ostendorf (2005). |
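Corpus Details | The filtering step described above (dropping n-grams with frequency of 5 or less while deliberately retaining stopwords) might look like the following sketch; the threshold follows the text, but the stopword list and function name are illustrative:

```python
from collections import Counter

# Illustrative stopword list; sociolinguistic cues such as pronouns
# and determiners are kept on purpose, so stopwords are NOT removed.
STOPWORDS = {"i", "you", "the", "a", "it", "my"}

def build_feature_set(ngram_counts, min_freq=5):
    """Keep only n-grams occurring more than `min_freq` times.
    Stopwords pass through this filter like any other token."""
    return {ng for ng, c in ngram_counts.items() if c > min_freq}

counts = Counter({"i": 40, "my": 12, "awesome": 6, "rare_word": 2})
features = build_feature_set(counts)
assert "rare_word" not in features            # below the frequency threshold
assert "i" in features and "my" in features   # stopwords retained
```

Note the strict inequality: an n-gram seen exactly 5 times is discarded, matching "frequency greater than 5".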
Related Work | Another relevant line of work has been on the blog domain, using a bag of words feature set to discriminate age and gender (Schler et al., 2006; Burger and Henderson, 2006; Nowson and Oberlander, 2006). |
Dependency parsing for machine translation | The three feature sets that were used in our experiments are shown in Table 2. |
Dependency parsing for machine translation | It is quite similar to the McDonald (2005a) feature set, except that it does not include the set of all POS tags that appear between each candidate head-modifier pair (i, j).
Dependency parsing for machine translation | The primary difference between our feature sets and the ones of McDonald et al.
Experiments | Our feature set is summarized in Table 2, which closely follows Charniak and Johnson (2005), except that we excluded the nonlocal features Edges, NGram, and CoPar, and simplified the Rule and NGramTree features, since they were too complicated to compute.⁴ We also added four unlexicalized local features from Collins (2000) to cope with data sparsity.
Experiments | tures in the updated version.⁵ However, our initial experiments show that, even with this much simpler feature set, our 50-best reranker performed equally well as theirs (both with an F-score of 91.4, see Tables 3 and 4).
Experiments | This result confirms that our feature set design is appropriate, and the averaged perceptron learner is a reasonable candidate for reranking. |
Experimental Setup | Table 1: Performance of EDITDIST and our model with various feature sets on EN-ES-W. See section 5.
Experimental Setup | We will use MCCA (for matching CCA) to denote our model using the optimal feature set (see section 5.3). |
Introduction | As an example of the performance of the system, in English-Spanish induction with our best feature set , using corpora derived from topically similar but nonparallel sources, the system obtains 89.0% precision at 33% recall. |