Abstract | The created thesaurus is then used to expand feature vectors to train a binary classifier. |
Introduction | a unigram or a bigram of word lemma) in a review using a feature vector. |
Introduction | We model the cross-domain sentiment classification problem as one of feature expansion, where we append additional related features to feature vectors that represent source and target domain reviews in order to reduce the mismatch of features between the two domains. |
Introduction | thesaurus to expand feature vectors in a binary classifier at train and test times by introducing related lexical elements from the thesaurus. |
Sentiment Sensitive Thesaurus | For example, if we know that both excellent and delicious are positive sentiment words, then we can use this knowledge to expand a feature vector that contains the word delicious using the word excellent, thereby reducing the mismatch between features in a test instance and a trained model. |
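The expansion step described above can be sketched as a small Python routine; the thesaurus contents and the `expand` helper below are illustrative assumptions, not the paper's actual data structures.

```python
# Hypothetical sentiment-sensitive thesaurus: each lexical element maps to
# related elements ranked by relatedness (illustrative entries only).
thesaurus = {"delicious": ["excellent", "tasty"]}

def expand(features, thesaurus, k=1):
    """Append up to k related elements for each feature in the vector,
    so a test instance containing 'delicious' also activates 'excellent'."""
    expanded = list(features)
    for f in features:
        expanded.extend(thesaurus.get(f, [])[:k])
    return expanded

print(expand(["delicious", "food"], thesaurus))  # ['delicious', 'food', 'excellent']
```

In the actual method the appended elements are weighted by a relatedness score; here they are simply concatenated.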
Sentiment Sensitive Thesaurus | Let us denote the value of a feature w in the feature vector u representing a lexical element u by f(u, w). The vector u can be seen as a compact representation of the distribution of a lexical element u over the set of features that co-occur with u in the reviews. |
Sentiment Sensitive Thesaurus | From the construction of the feature vector u described in the previous paragraph, it follows that w can be either a sentiment feature or another lexical element that co-occurs with u in some review sentence. |
Abstract | The first compresses the query into a query feature vector, which aggregates all document instances in the same query, and then conducts query weighting based on the query feature vector. |
Evaluation | Specifically, after document feature aggregation, the number of query feature vectors in all adaptation tasks is no more than 150 in source and target domains. |
Introduction | Take Figure 2 as a toy example, where the document instance is represented as a feature vector with four features. |
Introduction | In this work, we present two simple but very effective approaches attempting to resolve the problem from distinct perspectives: (1) we compress each query into a query feature vector by aggregating all of its document instances, and then conduct query weighting on these query feature vectors; (2) we measure the similarity between the source query and each target query one by one, and then combine these fine-grained similarity values to calculate its importance to the target domain. |
Query Weighting | The query can be compressed into a query feature vector, where each feature value is obtained by the aggregate of its corresponding features of all documents in the query. |
Query Weighting | We concatenate two types of aggregates to construct the query feature vector: the mean x̄ = (1/|q|) Σ_{i=1}^{|q|} x_i, where x_i |
Query Weighting | is the feature vector of document i and |q| denotes the number of documents in q. |
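A minimal sketch of this aggregation, assuming the two concatenated aggregates are the per-feature mean and variance over a query's document vectors (the choice of variance as the second aggregate is an assumption based on the surrounding description):

```python
import numpy as np

def query_feature_vector(doc_vectors):
    """Compress a query's document feature vectors into one query feature
    vector by concatenating per-feature mean and variance."""
    X = np.asarray(doc_vectors, dtype=float)
    return np.concatenate([X.mean(axis=0), X.var(axis=0)])

q = [[1.0, 0.0], [3.0, 2.0]]  # two document feature vectors, |q| = 2
v = query_feature_vector(q)   # mean = [2, 1], variance = [1, 1]
```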
Abstract | In this paper, we use tensors to map high-dimensional feature vectors into low dimensional representations. |
Introduction | The exploding dimensionality of rich feature vectors must then be balanced with the difficulty of effectively learning the associated parameters from limited training data. |
Introduction | We depart from this view and leverage high-dimensional feature vectors by mapping them into low dimensional representations. |
Introduction | We begin by representing high-dimensional feature vectors as multi-way cross-products of smaller feature vectors that represent words and their syntactic relations (arcs). |
Problem Formulation | We can alternatively specify arc features in terms of rank-1 tensors by taking the Kronecker product of simpler feature vectors associated with the head (vector φ_h ∈ R^n), and modifier (vector φ_m ∈ R^n), as well as the arc itself (vector φ_{h,m} ∈ R^d). |
Problem Formulation | Here φ_{h,m} is much lower dimensional than the MST arc feature vector φ_{h→m} discussed earlier. |
Problem Formulation | By taking the cross-product of all these component feature vectors, we obtain the full feature representation for arc h → m as a rank-1 tensor |
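The rank-1 tensor can be sketched with NumPy's `einsum`; the dimensions and vector contents below are illustrative.

```python
import numpy as np

n, d = 4, 3                          # illustrative dimensions
phi_h = np.arange(n, dtype=float)    # head word vector
phi_m = np.ones(n)                   # modifier word vector
phi_arc = np.array([1.0, 2.0, 3.0])  # arc vector

# Rank-1 tensor: the outer (Kronecker-style) product of the components,
# so entry (i, j, k) equals phi_h[i] * phi_m[j] * phi_arc[k].
T = np.einsum('i,j,k->ijk', phi_h, phi_m, phi_arc)
```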
Distributed K-Means clustering | Given a set of elements represented as feature vectors and a number, k, of desired clusters, the K-Means algorithm consists of the following steps: |
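The standard steps can be sketched as follows; this is a minimal single-machine version, not the paper's distributed implementation.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Minimal K-Means sketch: initialize centroids, then alternate
    assignment and update steps for a fixed number of iterations."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # 1. assign each feature vector to its nearest centroid
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # 2. recompute each centroid as the mean of its members
        for c in range(k):
            if (labels == c).any():
                centroids[c] = X[labels == c].mean(axis=0)
    return labels, centroids

X = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
labels, _ = kmeans(X, k=2)  # the two tight pairs end up in separate clusters
```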
Distributed K-Means clustering | Before describing our parallel implementation of the K-Means algorithm, we first describe the phrases to be clustered and how their feature vectors are constructed. |
Distributed K-Means clustering | Following previous approaches to distributional clustering of words, we represent the contexts of a phrase as a feature vector. |
Detailed Problem Formulation | We use discrete features, namely natural numbers, in our feature vectors, quantized by a binning process. |
Detailed Problem Formulation | The length of the feature vector may vary across parts of speech. |
Detailed Problem Formulation | Let N_c denote the length of the feature vector for part of speech c, and x^c denote the time-series (x_1, … |
Introduction | We associate a feature vector with each frame (detection) of each such track. |
Introduction | This feature vector can encode image features (including the identity of the particular detector that produced that detection) that correlate with object class; region color, shape, and size features that correlate with object properties; and motion features, such as linear and angular object position, velocity, and acceleration, that correlate with event properties. |
Introduction | involves computing the associated feature vector for that HMM over the detections in the tracks chosen to fill its arguments. |
The Sentence Tracker | Let (q^1, …, q^T) denote the sequence of states q^t that leads to an observed track, B(D^t, j^t, q^t, λ) denote the conditional log probability of observing the feature vector associated with the detection selected by j^t among the detections D^t in frame t, given that the HMM is in state q^t, and A(q^{t−1}, q^t, λ) denote the log transition probability of the HMM. |
The Sentence Tracker | We further need to generalize F so that it computes the joint score of a sequence of detections, one for each track, G so that it computes the joint measure of coherence between a sequence of pairs of detections in two adjacent frames, and B so that it computes the joint conditional log probability of observing the feature vectors associated with the sequence of detections selected by jt. |
The Sentence Tracker | We further need to generalize B so that it computes the joint conditional log probability of observing the feature vectors for the detections in the tracks that are assigned to the arguments of the HMM for each word in the sentence and A so that it computes the joint log transition probability for the HMMs for all words in the sentence. |
Inference | For each of the remaining feature vectors in |
Inference | Mixture ID (m_t) For each feature vector in a segment, given the cluster label c and the hidden state index s_t, the derivation of the conditional posterior probability of its mixture ID is straightforward: |
Inference | where M_{c,s} is the set of mixture IDs of feature vectors that belong to state s of HMM c. The m-th entry of β′ is β + Σ_{t ∈ M_{c,s}} δ(m_t, m), where we use |
Model | Given the cluster label, choose a hidden state for each feature vector x_t in the segment. |
Model | Use the chosen Gaussian mixture to generate the observed feature vector |
Model | 2, where the shaded circle denotes the observed feature vectors, and the squares denote the hyperparameters of the priors used in our model. |
Problem Formulation | 1 illustrates how the speech signal of a single word utterance banana is converted to a sequence of feature vectors x_1 to x_n. |
Problem Formulation | Segment (p_{j,k}) We define a segment to be composed of feature vectors between two boundary frames. |
Problem Formulation | Hidden State (s_t) Since we assume the observed data are generated by HMMs, each feature vector x_t has an associated hidden state index. |
A Statistical Inclusion Measure | Amongst these features, those found in v's feature vector are termed included features. |
A Statistical Inclusion Measure | In preliminary data analysis of pairs of feature vectors, which correspond to a known set of valid and invalid expansions, we identified the following desired properties for a distributional inclusion measure. |
A Statistical Inclusion Measure | In our case the feature vector of the expanded word is analogous to the set of all relevant documents while tested features correspond to retrieved documents. |
Background | First, a feature vector is constructed for each word by collecting context words as features. |
Background | where FV_x is the feature vector of a word x and w_x(f) is the weight of the feature f in that word's vector, set to their pointwise mutual information. |
Background | Extending this rationale to the textual entailment setting, Geffet and Dagan (2005) expected that if the meaning of a word u entails that of v then all its prominent context features (under a certain notion of “prominence”) would be included in the feature vector of v as well. |
Conclusions and Future work | This paper advocates the use of directional similarity measures for lexical expansion, and potentially for other tasks, based on distributional inclusion of feature vectors. |
Evaluation and Results | Feature vectors were created by parsing the Reuters RCV1 corpus and taking the words related to each term through a dependency relation as its features (coupled with the relation name and direction, as in (Lin, 1998)). |
Abstract | In addition, word embedding is employed as the input to the neural network, which encodes each word as a feature vector . |
Introduction | We also integrate word embedding into the model by representing each word as a feature vector (Collobert and Weston, 2008). |
Introduction | h = (h_1(f,e,d), …, h_K(f,e,d))^T is a K-dimensional feature vector defined on the tuple (f,e,d); W = (w_1, w_2, …, w_K)^T is a K-dimensional weight vector of h, i.e., the parameters of the model, and it can be tuned by the toolkit MERT (Och, 2003). |
Introduction | (3) as a function of a feature vector h, i.e. |
Bilingual Lexicon Induction | Then, for each matched pair of word types (i, j) ∈ m, we need to generate the observed feature vectors of the source and target word types, f_S(s_i) ∈ R^{d_S} and f_T(t_j) ∈ R^{d_T}. |
Bilingual Lexicon Induction | The feature vector of each word type is computed from the appropriate monolingual corpus and summarizes the word’s monolingual characteristics; see section 5 for details and figure 2 for an illustration. |
Bilingual Lexicon Induction | Specifically, to generate the feature vectors, we first generate a random concept z_{i,j} ~ N(0, I_d), where I_d is the d × d identity matrix. |
Features | For a concrete example of a word type to feature vector mapping, see figure 2. |
Inference | Since d_S and d_T can be quite large in practice and often greater than |m|, we use Cholesky decomposition to re-represent the feature vectors as |m|-dimensional vectors with the same dot products, which is all that CCA depends on. |
Introduction | In our method, we represent each language as a monolingual lexicon (see figure 2): a list of word types characterized by monolingual feature vectors, such as context counts, orthographic substrings, and so on (section 5). |
Conclusions and Future Work | We use standard feature vectors augmented by shallow syntactic trees enriched with additional conceptual information. |
Conclusions and Future Work | This paper makes several contributions: (i) it shows that effective OM can be carried out with supervised models trained on high quality annotations; (ii) it introduces a novel annotated corpus of YouTube comments, which we make available for the research community; (iii) it defines novel structural models and kernels, which can improve on feature vectors, e.g., up to 30% of relative improvement in type classification, when little data is available, and demonstrates that the structural model scales well to other domains. |
Related work | Most of the previous work on supervised sentiment analysis uses feature vectors to encode documents. |
Representations and models | In the next sections, we define a baseline feature vector model and a novel structural model based on kernel methods. |
Representations and models | We go beyond traditional feature vectors by employing structural models (STRUCT), which encode each comment into a shallow syntactic tree. |
Representations and models | A polynomial kernel of degree 3 is applied to feature vectors (FVEC). |
Distribution Prediction | 3.1 In-domain Feature Vector Construction |
Distribution Prediction | 3.2 Cross-Domain Feature Vector Prediction |
Distribution Prediction | We model distribution prediction as a multivariate regression problem where, given a set {(x_i, y_i)}_{i=1}^{n} consisting of pairs of feature vectors selected from each domain for the pivots in W, we learn a mapping |
Domain Adaptation | First, we lemmatise each word in a source domain labeled review x^(S), and extract both unigrams and bigrams as features to represent x^(S) by a binary-valued feature vector. |
Domain Adaptation | Next, we train a binary classification model, θ, using those feature vectors. |
Domain Adaptation | At test time, we represent a test target review H using a binary-valued feature vector h of unigrams and bigrams of lemmas of the words in H, as we did for source domain labeled train reviews. |
Related Work | The created thesaurus is used to expand feature vectors during train and test stages in a binary classifier. |
Answer Grading System | We use ψ(x_i, x_s) to denote the feature vector associated with a pair of nodes (x_i, x_s), where x_i is a node from the instructor answer A_i, and x_s is a node from the student answer A_s. |
Answer Grading System | For a given answer pair (A_i, A_s), we assemble the eight graph alignment scores into a feature vector |
Answer Grading System | We combine the alignment scores φ_G(A_i, A_s) with the scores φ_B(A_i, A_s) from the lexical semantic similarity measures into a single feature vector φ(A_i, A_s) = [φ_G(A_i, A_s) | φ_B(A_i, A_s)]. |
Results | We report the results of running the systems on three subsets of features φ(A_i, A_s): BOW features φ_B(A_i, A_s) only, alignment features φ_G(A_i, A_s) only, or the full feature vector (labeled “Hybrid”). |
Architecture | If a sentence contains two entities and those entities are an instance of one of our Freebase relations, features are extracted from that sentence and are added to the feature vector for the relation. |
Architecture | In training, the features for identical tuples (relation, entity1, entity2) from different sentences are combined, creating a richer feature vector. |
Architecture | This time, every pair of entities appearing together in a sentence is considered a potential relation instance, and whenever those entities appear together, features are extracted on the sentence and added to a feature vector for that entity pair. |
Implementation | Towards this end, we build a feature vector in the training phase for an ‘unrelated’ relation by randomly selecting entity pairs that do not appear in any Freebase relation and extracting features for them. |
Implementation | Our classifier takes as input an entity pair and a feature vector , and returns a relation name and a confidence score based on the probability of the entity pair belonging to that relation. |
Introduction | For each pair of entities, we aggregate the features from the many different sentences in which that pair appeared into a single feature vector , allowing us to provide our classifier with more information, resulting in more accurate labels. |
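The aggregation over sentences can be sketched as follows; the entity pairs and feature strings are illustrative, not the actual Freebase data.

```python
from collections import Counter, defaultdict

def aggregate(instances):
    """Combine features extracted from many sentences into one feature
    vector (here a Counter of feature counts) per entity pair."""
    combined = defaultdict(Counter)
    for pair, feats in instances:
        combined[pair].update(feats)
    return combined

instances = [  # (entity pair, features extracted from one sentence)
    (("Obama", "Hawaii"), ["X_born_in_Y"]),
    (("Obama", "Hawaii"), ["X_born_in_Y", "X_grew_up_in_Y"]),
]
combined = aggregate(instances)  # one richer feature vector per pair
```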
Error Detection with a Maximum Entropy Model | To formalize this task, we use a feature vector w to represent a word w in question, and a binary variable c to indicate whether this word is correct or not. |
Error Detection with a Maximum Entropy Model | In the feature vector, we look at 2 words before and 2 words after the current word position (w_{−2}, w_{−1}, w, w_1, w_2). |
Error Detection with a Maximum Entropy Model | We collect features {wd, pos, link, dwpp} for each word among these words and combine them into the feature vector w for w. |
Related Work | However, when we create feature vectors for the classifier, the seeds themselves are hidden and only contextual features are used to represent each training instance. |
Related Work | We use an in-house sentence segmenter and NP chunker to identify the base NPs in each sentence and create feature vectors that represent each constituent in the sentence as either an NP or an individual word. |
Related Work | Two training instances would be created, with feature vectors that look like this, where M represents a modifier inside the target NP: |
Bayesian MT Decipherment via Hash Sampling | One possible strategy is to compute similarity scores s(w_{f_i}, w_{e′}) between the current source word feature vector w_{f_i} and the feature vectors w_{e′}, e′ ∈ V_e, of all possible candidates in the target vocabulary. |
Bayesian MT Decipherment via Hash Sampling | This makes the complexity far worse (in practice) since the dimensionality d of the feature vectors is a much higher value. Computing similarity scores alone (naïvely) would incur O(|V_e| · d) time, which is prohibitively huge since we have to do this for every token in the source language corpus. |
Feature-based representation for Source and Target | But unlike documents, here each word w is associated with a feature vector w_1 … w_d (where w_i represents the weight for the feature indexed by i) which is constructed from monolingual corpora. |
Feature-based representation for Source and Target | Unlike the target word feature vectors (which can be precomputed from the monolingual target corpus), the feature vector for every source word fj is dynamically constructed from the target translation sampled in each training iteration. |
Feature-based representation for Source and Target | ), it results in the feature representation becoming more sparse (especially for source feature vectors) which can cause problems in efficiency as well as robustness when computing similarity against other vectors. |
Training Algorithm | (a) Generate a proposal distribution by computing the Hamming distance between the feature vectors for the source word and each target translation candidate. |
Integrated Models | by a k-dimensional feature vector f : X → R^k. |
Integrated Models | In the feature-based integration we simply extend the feature vector for one model, called the base model, with a certain number of features generated by the other model, which we call the guide model in this context. |
Integrated Models | The additional features will be referred to as guide features, and the version of the base model trained with the extended feature vector will be called the guided model. |
Methodology | Each SVO (or AN) instance will be represented by a triple (duple) from which a feature vector will be extracted. The vector will consist of the concatenation of the conceptual features (which we discuss below) for all participating words, and conjunction features for word pairs. For example, to generate the feature vector for the SVO triple (car, drink, gasoline), we compute all the features for the individual words car, drink, gasoline and combine them with the conjunction features for the pairs car drink and drink gasoline. |
Methodology | If word one is represented by features u ∈ R^n and word two by features v ∈ R^m then the conjunction feature vector is the vectorization of the outer product uv^T. |
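In NumPy this is a one-liner; the vectors below are illustrative.

```python
import numpy as np

def conjunction_features(u, v):
    """Conjunction feature vector: vectorization of the outer product u v^T,
    giving one feature per (u_i, v_j) pair, of length n * m."""
    return np.outer(u, v).ravel()

u = np.array([1.0, 2.0])         # features of word one (n = 2)
v = np.array([3.0, 4.0, 5.0])    # features of word two (m = 3)
cf = conjunction_features(u, v)  # [3, 4, 5, 6, 8, 10]
```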
Model and Feature Extraction | Degrees of membership in different supersenses are represented by feature vectors, where each element corresponds to one supersense. |
Model and Feature Extraction | For example, the top-level classes in GermaNet include: adj.feeling (e.g., willing, pleasant, cheerful); adj.substance (e.g., dry, ripe, creamy); adj.spatial (e.g., adjacent, gigantic). For each adjective type in WordNet, they produce a vector with classifier posterior probabilities corresponding to degrees of membership of this word in one of the 13 semantic classes, similar to the feature vectors we build for nouns and verbs. |
Model and Feature Extraction | For languages other than English, feature vectors are projected to English features using translation dictionaries. |
Related Work | Instead, we create new feature vectors Fgen on the basis of the feature vectors Fseed in S. For each class in S, we extract all attribute-value pairs from the feature vectors for this particular class. |
Related Work | For each class, we randomly select features (with replacement) from Fseed and combine them into a new feature vector Fgen, retaining the distribution of the different classes in the data. |
Related Work | As a result, we obtain a more general set of feature vectors Fgen with characteristic features being distributed more evenly over the different feature vectors. |
Experiments & Results | For the classifier-based system, we tested various different feature vector configurations. |
Experiments & Results | The various erY configurations use the same feature vector setup for all classifier experts. |
Experiments & Results | The auto configuration does not uniformly apply the same feature vector setup to all classifier experts but instead seeks to find the optimal setup per classifier expert. |
System | The feature vector for the classifiers represents a local context of neighbouring words, and optionally also global context keywords in a binary-valued bag-of-words configuration. |
System | If not, we check for the presence of a classifier expert for the offered L1 fragment; only then can we proceed by extracting the desired number of L2 local context words to the immediate left and right of this fragment and adding those to the feature vector. |
Learning with Homogenous Data | High-dimensional feature vectors with only a few nonzero dimensions make our model time-consuming. |
Learning with Homogenous Data | Thus it is necessary to reduce the dimension of the feature vectors. |
The Deep Belief Network for QA pairs | In the bottom layer, the binary feature vectors based on the statistics of the word occurrence in the answers are used to compute the “hidden features” in the |
The Deep Belief Network for QA pairs | where σ(x) = 1/(1 + e^{−x}), v denotes the visible feature vector of the answer, q_i is the i-th element of the question vector, and h stands for the hidden feature vector for reconstructing the questions. |
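A sketch of the hidden-feature computation with this sigmoid; the shapes and the exact conditioning are simplified assumptions (the paper's network also feeds in the question vector q).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hidden_features(v, W, b):
    """Hidden-layer activations from a visible binary feature vector v."""
    return sigmoid(W @ v + b)

v = np.array([1.0, 0.0, 1.0])  # visible (answer) feature vector
W = np.zeros((2, 3))           # weight matrix (illustrative)
b = np.zeros(2)                # hidden biases
h = hidden_features(v, W, b)   # all 0.5 with zero weights and biases
```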
The Deep Belief Network for QA pairs | To detect the best answer to a given question, we just have to send the vectors of the question and its candidate answers into the input units of the network and perform a level-by-level calculation to obtain the corresponding feature vectors . |
Transliteration alignment techniques | Withgott and Chen (1993) define a feature vector of phonological descriptors for English sounds. |
Transliteration alignment techniques | We extend the idea by defining a 21-element binary feature vector for each English and Chinese phoneme. |
Transliteration alignment techniques | Each element of the feature vector represents presence or absence of a phonological descriptor that differentiates various kinds of phonemes, e.g. |
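Such a binary phonological vector can be sketched as below; the descriptor names are invented for illustration (the paper's actual 21 descriptors are not listed here).

```python
# Illustrative phonological descriptors (hypothetical names, and fewer
# than the 21 used in the paper).
DESCRIPTORS = ["consonant", "voiced", "nasal", "fricative", "front_vowel"]

def phoneme_vector(properties):
    """Binary feature vector: 1 if the phoneme has the descriptor, else 0."""
    return [1 if d in properties else 0 for d in DESCRIPTORS]

v_m = phoneme_vector({"consonant", "voiced", "nasal"})  # e.g. the phoneme /m/
```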
Background | Quite a few methods have been suggested (Lin and Pantel, 2001; Bhagat et al., 2007; Yates and Etzioni, 2009), which differ in terms of the specifics of the ways in which predicates are represented, the features that are extracted, and the function used to compute feature vector similarity. |
Experimental Evaluation | When computing distributional similarity scores, a template is represented as a feature vector of the CUIs that instantiate its arguments. |
Learning Entailment Graph Edges | Next, we represent each pair of propositional templates with a feature vector of various distributional similarity scores. |
Learning Entailment Graph Edges | A template pair is represented by a feature vector where each coordinate is a different distributional similarity score. |
Learning Entailment Graph Edges | Another variant occurs when using binary templates: a template may be represented by a pair of feature vectors, one for each variable (Lin and Pantel, 2001), or by a single vector, where features represent pairs of instantiations (Szpektor et al., 2004; Yates and Etzioni, 2009). |
Relationship Classification | To do this, we construct feature vectors from each training pair, where each feature is the HITS measure corresponding to a single pattern cluster. |
Relationship Classification | Once we have feature vectors, we can use a variety of classifiers (we used those in Weka) to construct a model and to evaluate it on the test set. |
Relationship Classification | If we are not given any training set, it is still possible to separate between different relationship types by grouping the feature vectors of Section 4.3.2 into clusters. |
Approach | where θ ∈ R^d is the parameter vector and φ(x, w, z) is the feature vector, which will be defined in Section 3.3. |
Approach | To construct the log-linear model, we define a feature vector φ(x, w, z) for each query x, web page w, and extraction predicate z. |
Approach | The final feature vector is the concatenation of structural features φ_s(w, z), which consider the selected nodes in the DOM tree, and denotation features φ_d(x, y), which look at the extracted entities. |
Introduction | It is impractical to enumerate all the mentions in an entity and record their information in a single feature vector , as it would make the feature space too large. |
Introduction | Even worse, the number of mentions in an entity is not fixed, which would result in variant-length feature vectors and make trouble for normal machine learning algorithms. |
Modelling Coreference Resolution | As an entity may contain more than one candidate and the number is not fixed, it is impractical to enumerate all the mentions in an entity and put their properties into a single feature vector . |
Related Work | In the system, a training or testing instance is formed for two mentions in question, with a feature vector describing their properties and relationships. |
Baseline parser | Here Φ(a_i) represents the feature vector for the i-th action a_i in state item α. |
Improved hypotheses comparison | The significant variance in the number of actions N can have an impact on the linear separability of state items, for which the feature vectors are Σ_{i=1}^{N} Φ(a_i). |
Improved hypotheses comparison | A feature vector is extracted for the IDLE action according to the final state context, in the same way as other actions. |
Improved hypotheses comparison | corresponding feature vectors have about the same sizes, and are more linearly separable. |
Learning QA Matching Models | Given a word pair (w_q, w_s), where w_q ∈ V_q and w_s ∈ V_s, feature functions φ_1, …, φ_d map it to a d-dimensional real-valued feature vector. |
Learning QA Matching Models | We consider two aggregate functions for defining the feature vectors of the whole question/answer pair: average and max. |
Learning QA Matching Models | Φ_max,j(q, s) = max_{w_q ∈ V_q, w_s ∈ V_s} φ_j(w_q, w_s) (2) Together, each question/sentence pair is represented by a 2d-dimensional feature vector. |
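The average and max aggregates can be sketched over a matrix of word-pair feature vectors; the numbers are illustrative.

```python
import numpy as np

def aggregate_pair_features(pair_feats):
    """pair_feats: (num_word_pairs, d) matrix of word-pair feature vectors.
    Returns the 2d-dimensional concatenation of the average and the max."""
    F = np.asarray(pair_feats, dtype=float)
    return np.concatenate([F.mean(axis=0), F.max(axis=0)])

F = [[0.0, 1.0], [2.0, 3.0]]      # two word pairs, d = 2
phi = aggregate_pair_features(F)  # [1.0, 2.0, 2.0, 3.0]
```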
Our Approach | Then, we use a uniform automatic method, which primarily consists of word labeling and feature vector generation, to generate the training data set TD from these collected articles. |
Our Approach | (1) After the word labeling, each instance (word/token) is represented as a feature vector . |
Our Approach | First, we turn the text into a word sequence and compute the feature vector for each word based on the feature definition in Section 3.1. |
Preliminaries | x_i can be represented as a feature vector according to its context. |
Online Question Generation | We next describe the various features we extract for every entity and the supervised models that given this feature vector representation assess the correctness of an instantiation. |
Online Question Generation | The feature vector of each named entity was induced as described in Section 4.2.1. |
Online Question Generation | To generate features for a candidate pair, we take the two feature vectors of the two entities and induce families of pair features by comparing between the two vectors. |
Discussion | This may be due to the smaller number of feature vectors used in the experiments (only 80, as compared to the 412 used in the previous setup). |
Discussion | Another possible reason is the fact that the acoustic and visual modalities are significantly weaker than the linguistic modality, most likely due to the fact that the feature vectors are now speaker-independent, which makes it harder to improve over the linguistic modality alone. |
Experiments and Results | In this approach, the features collected from all the multimodal streams are combined into a single feature vector, thus resulting in one vector for each utterance in the dataset which is used to make a decision about the sentiment orientation of the utterance. |
Multimodal Sentiment Analysis | The features are averaged over all the frames in an utterance, to obtain one feature vector for each utterance. |
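The frame-averaging step can be sketched directly:

```python
import numpy as np

def utterance_vector(frame_features):
    """Average per-frame feature vectors into one vector per utterance."""
    return np.asarray(frame_features, dtype=float).mean(axis=0)

frames = [[1.0, 3.0], [3.0, 5.0]]  # two frames of an utterance
u = utterance_vector(frames)       # [2.0, 4.0]
```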
Evaluation | Before the clustering process takes place, Web snippets are represented as word feature vectors. |
Evaluation | In particular, p is the size of the word feature vectors representing both Web snippets and centroids (p = 2..5), K is the number of clusters to be found (K = 2..10) and S(W_ik, W_jl) is the collocation measure integrated in the InfoSimba similarity measure. |
Introduction | On the other hand, the polythetic approach, whose main idea is to represent Web snippets as word feature vectors, has received less attention, the only relevant work being (Osinski and Weiss, 2005). |
Introduction | feature vectors are hard to define in small collections of short text fragments (Timonen, 2013), (2) existing second-order similarity measures such as the cosine are unadapted to capture the semantic similarity between small texts, (3) Latent Semantic Analysis has evidenced inconclusive results (Osinski and Weiss, 2005) and (4) the labeling process is a surprisingly hard extra task (Carpineto et al., 2009). |
Experiments | Algorithms 2 and 3 were infeasible to run on Europarl data beyond one epoch because feature vectors grew too large to be kept in memory. |
Introduction | The simple but effective idea is to randomly divide training data into evenly sized shards, use stochastic learning on each shard in parallel, while performing ℓ1/ℓ2 regularization for joint feature selection on the shards after each epoch, before starting a new epoch with a reduced feature vector averaged across shards. |
Joint Feature Selection in Distributed Stochastic Learning | Let each translation candidate be represented by a feature vector x ∈ R^D where preference pairs for training are prepared by sorting translations according to smoothed sentence-wise BLEU score (Liang et al., 2006a) against the reference. |
Joint Feature Selection in Distributed Stochastic Learning | Parameter mixing by averaging will help to ease the feature sparsity problem; however, keeping feature vectors on the scale of several million features in memory can be prohibitive. |
Inferring a learning curve from mostly monolingual data | The feature vector φ consists of the following features: |
Inferring a learning curve from mostly monolingual data | We construct the design matrix Φ with one column for each feature vector φ_ct corresponding to each combination of training configuration c and test set t. |
Inferring a learning curve from mostly monolingual data | For a new unseen configuration with feature vector φ_u, we determine the parameters θ_u of the corresponding learning curve as: |
Distributional semantic models | From every image in a dataset, relevant areas are identified and a low-level feature vector (called a “descriptor”) is built to represent each area. |
Distributional semantic models | Now, given a new image, the nearest visual word is identified for each descriptor extracted from it, such that the image can be represented as a BoVW feature vector, by counting the instances of each visual word in the image (note that an occurrence of a low-level descriptor vector in an image, after mapping to the nearest cluster, will increment the count of a single dimension of the higher-level BoVW vector).
Distributional semantic models | We extract descriptor features of two types. First, the standard Scale-Invariant Feature Transform (SIFT) feature vectors (Lowe, 1999; Lowe, 2004), good at characterizing parts of objects.
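The BoVW construction described above (nearest-word assignment plus counting) can be sketched as follows; the 2-D descriptors and the 3-word codebook are toy stand-ins for real SIFT descriptors and a learned visual vocabulary.

```python
import numpy as np

def bovw_vector(descriptors, codebook):
    """Map each low-level descriptor to its nearest visual word (codebook
    row) and count occurrences, yielding a bag-of-visual-words vector."""
    # squared Euclidean distance between every descriptor and every word
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    nearest = d2.argmin(axis=1)                    # hard assignment
    return np.bincount(nearest, minlength=len(codebook))

# toy 2-D "descriptors" and a 3-word codebook (illustrative values)
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
descs = np.array([[0.1, 0.0], [0.9, 1.2], [5.1, 4.8], [1.1, 0.9]])
vec = bovw_vector(descs, codebook)   # counts per visual word
```

Each descriptor increments exactly one dimension of the higher-level vector, as the note in the passage above describes.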
Clustering phrase pairs directly using the K-means algorithm | We thus propose to represent each phrase pair instance (including its bilingual one-word contexts) as feature vectors, i.e., points of a vector space.
Clustering phrase pairs directly using the K-means algorithm | We then use these data points to partition the space into clusters, and subsequently assign each phrase pair instance the cluster of its corresponding feature vector as label.
Clustering phrase pairs directly using the K-means algorithm | In the same fashion, we can incorporate multiple tagging schemes (e.g., word clusterings of different granularities) into the same feature vector.
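A minimal sketch of this pipeline: run K-means over the phrase-pair feature vectors and use each point's cluster id as its label. The 2-D vectors below are toys; a real system would use the bilingual context features described above.

```python
import numpy as np

def kmeans_labels(X, k, iters=20, seed=0):
    """Minimal K-means: partition feature vectors into k clusters and
    return the cluster id of each point (used here as its label)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # random init
    for _ in range(iters):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(axis=1)                 # assign to nearest center
        for j in range(k):                          # recompute centers
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# toy phrase-pair feature vectors forming two obvious groups
X = np.array([[0.0, 0.0], [0.1, 0.2], [4.0, 4.1], [3.9, 4.0]])
labels = kmeans_labels(X, k=2)
```

Each phrase pair instance then carries its cluster id as an additional label, exactly as the passage describes.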
Approximate Dynamic Programming | Furthermore, as the size of the feature vector K increases, the space becomes even more difficult to search. |
Reinforcement Learning Formulation | Thus, we represent state/action pairs with a feature vector φ(s, a) ∈ R^K.
Reinforcement Learning Formulation | Learning exactly which words influence decision making is difficult; reinforcement learning algorithms have problems with the large, sparse feature vectors common in natural language processing. |
Reinforcement Learning Formulation | For a given state s = (u, l, c) and action a = (l′, o′), our feature vector φ(s, a) is composed of the following:
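Scoring actions with a sparse feature vector φ(s, a) and a linear weight vector can be sketched with plain dicts; the feature names, state fields, and weights below are purely illustrative, not the paper's feature set.

```python
def dot(phi, weights):
    """Sparse dot product between a feature dict and a weight dict."""
    return sum(v * weights.get(f, 0.0) for f, v in phi.items())

def best_action(state, actions, featurize, weights):
    """Greedy policy: pick the action whose features score highest."""
    return max(actions, key=lambda a: dot(featurize(state, a), weights))

# illustrative phi(s, a): indicator features conjoining words with the action
def featurize(state, action):
    return {f"word={w}&act={action}": 1.0 for w in state["words"]}

weights = {"word=enemy&act=attack": 0.8, "word=enemy&act=wait": -0.2}
state = {"words": ["enemy", "near"]}
choice = best_action(state, ["attack", "wait"], featurize, weights)
```

The dict representation keeps only the nonzero entries, which is what makes the large, sparse NLP feature vectors mentioned above tractable.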
Results and Discussion | Then (Einstein, he), (Hawking, he) and (Novoselov, he) will all be assigned the feature vector <1, No, Proper Noun, Personal Pronoun, Yes>.
Results and Discussion | Using the same representation of pairs, suppose that for the sequence of markables Biden, Obama, President the markable pairs (Biden, President) and (Obama, President) are assigned the feature vectors <8, No, Proper Noun, Proper Noun, Yes> and <1, No, Proper Noun, Proper Noun, Yes>, respectively.
Results and Discussion | with the second feature vector (distance=1) as coreferent than with the first one (distance=8) in the entire automatically labeled training set.
System Architecture | After filtering, we then calculate a feature vector for each generated pair that survived filters (i)—(iv). |
Bilingual Tree Kernels | In order to compute the dot product of the feature vectors in the exponentially high dimensional feature space, we introduce the tree kernel functions as follows: |
Bilingual Tree Kernels | It is infeasible to explicitly compute the kernel function by expressing the sub-trees as feature vectors . |
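The standard workaround is a recursive kernel that sums common-subtree counts over node pairs, evaluating the dot product without ever materializing the exponential feature space. Below is a Collins–Duffy-style sketch on tuple-encoded trees (a generic monolingual illustration, not the paper's bilingual kernel); λ is the usual decay parameter.

```python
def nodes(tree):
    """Yield every non-leaf node of a tuple-encoded tree (label, child, ...)."""
    yield tree
    for child in tree[1:]:
        if isinstance(child, tuple):
            yield from nodes(child)

def production(n):
    """A node's production: its label plus the labels of its children."""
    return (n[0],) + tuple(c[0] if isinstance(c, tuple) else c for c in n[1:])

def delta(n1, n2, lam):
    """Decayed count of common subtrees rooted at the node pair (n1, n2)."""
    if production(n1) != production(n2):
        return 0.0
    if all(not isinstance(c, tuple) for c in n1[1:]):   # preterminal node
        return lam
    val = lam
    for c1, c2 in zip(n1[1:], n2[1:]):
        if isinstance(c1, tuple):
            val *= 1.0 + delta(c1, c2, lam)
    return val

def tree_kernel(t1, t2, lam=0.5):
    """Implicit dot product of subtree feature vectors: sum delta over
    all node pairs, never expanding the subtrees as explicit vectors."""
    return sum(delta(a, b, lam) for a in nodes(t1) for b in nodes(t2))

t1 = ("S", ("NP", "she"), ("VP", "runs"))
t2 = ("S", ("NP", "he"), ("VP", "runs"))
k = tree_kernel(t1, t2, lam=1.0)
```

The recursion runs in time quadratic in the number of nodes, while the explicit subtree feature space it implicitly dots over is exponential.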
Introduction | In addition, explicitly utilizing syntactic tree fragments results in exponentially high dimensional feature vectors , which is hard to compute. |
Substructure Spaces for BTKs | The feature vector of the classifier is computed using a composite kernel: |
Abstract | The results of these experiments were not particularly strong, likely owing to the increased sparsity of the feature vectors . |
Abstract | Binning: Next, we wished to explore longer n-grams of words or POS tags and to reduce the sparsity of the feature vectors . |
Abstract | Self-Training: Besides sparse feature vectors , another factor likely to be hurting our classifier was the limited amount of training data. |
Incorporating Structural Syntactic Information | Thus, it is computationally infeasible to directly use the feature vector (MT).
The Recognition Framework | Suppose the training set S consists of labeled vectors {(x_i, y_i)}, where x_i is the feature vector
The Recognition Framework | where α_i is the learned parameter for a feature vector x_i, and b is another parameter which can be derived from the α_i.
BBC News Database | Secondly, the generation of feature vectors is modeled directly, so there is no need for quantization. |
BBC News Database | where N_I is the number of regions in image I, v_r the feature vector for region r in image I, n_s the number of regions in the image of latent variable s, v_i the feature vector for region i in s's image, k the dimension of the image feature vectors and Σ the feature covariance matrix.
BBC News Database | According to equation (3), a Gaussian kernel is fit to every feature vector v_i corresponding to region i in the image of the latent variable s.
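Fitting a Gaussian kernel to every region feature vector yields a mixture density p(v) = (1/n) Σ_i N(v; v_i, Σ). A sketch with toy 2-D region vectors and an identity covariance (both illustrative):

```python
import numpy as np

def gaussian_kernel_density(v, centers, cov):
    """p(v) = (1/n) * sum_i N(v; v_i, cov): one Gaussian kernel fit to
    each region feature vector v_i, averaged into a mixture density."""
    k = len(v)                                         # feature dimension
    inv = np.linalg.inv(cov)
    norm = 1.0 / np.sqrt(((2 * np.pi) ** k) * np.linalg.det(cov))
    diffs = centers - v
    # Mahalanobis distance of v to every kernel center
    exps = np.exp(-0.5 * np.einsum('ij,jk,ik->i', diffs, inv, diffs))
    return norm * exps.mean()

# toy example: two 2-D region feature vectors, identity covariance
centers = np.array([[0.0, 0.0], [2.0, 0.0]])
cov = np.eye(2)
p = gaussian_kernel_density(np.array([0.0, 0.0]), centers, cov)
```

Because the generation of feature vectors is modeled directly by this density, no vector quantization step is needed, matching the remark above.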
Introduction | Feature Vector |
Introduction | where L_n is its phrase label (i.e., its Treebank tag), and F_n is a feature vector which indicates the characteristics of node n, which is represented as:
Introduction | where δ(t1, t2) is the same indicator function as in CTK; (n_i, n_j) is a pair of aligned nodes between t1 and t2, where n_i and n_j are correspondingly in the same position of trees t1 and t2; E(t1, t2) is the set of all aligned node pairs; sim(n_i, n_j) is the feature vector similarity between nodes n_i and n_j, computed as the dot product between their feature vectors F_{n_i} and F_{n_j}.
Relational Similarity Experiments | Given a verbal analogy example, we build six feature vectors — one for each of the six word pairs. |
Relational Similarity Experiments | For the evaluation, we created a feature vector for each head-modifier pair, and we performed leave-one-out cross-validation: we left one example out for testing and trained on the remaining 599, repeating this procedure 600 times so that each example is used for testing.
Relational Similarity Experiments | We calculated the similarity between the feature vector of the testing example and each of the training examples’ vectors. |
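The leave-one-out protocol with vector similarity can be sketched as a 1-nearest-neighbor loop under cosine similarity; the four toy vectors and binary labels below are illustrative stand-ins for the 600 head-modifier examples.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def loo_nn_accuracy(X, y):
    """Leave-one-out: classify each feature vector by its most similar
    remaining training vector (cosine) and report accuracy."""
    correct = 0
    for i in range(len(X)):
        others = [j for j in range(len(X)) if j != i]
        sims = [cosine(X[i], X[j]) for j in others]
        correct += y[others[int(np.argmax(sims))]] == y[i]
    return correct / len(X)

# toy head-modifier feature vectors with two relation labels
X = np.array([[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]])
y = np.array([0, 0, 1, 1])
acc = loo_nn_accuracy(X, y)
```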
Experiments | negative transfer from irrelevant sources by relying on similarity of feature vectors between source and target domains based on labeled and unlabeled data. |
Problem Statement | We introduce a feature extraction function that maps the triple (A, B, S) to its feature vector x.
Problem Statement | labeled data D_l = {(x_i, y_i)} and plenty of unlabeled data D_u = {x_j}, where n_l and n_u are the numbers of labeled and unlabeled samples respectively, x_i is the feature vector, and y_i is the corresponding label (if available).
Model | encode a tweet-level feature vector rather than an aggregate one. |
Model | (3) The feature vector encodes the following standard general features:
Related Work | In the user attribute extraction literature, researchers have considered neighborhood context to boost inference accuracy (Pennacchiotti and Popescu, 2011; Al Zamal et al., 2012), where information about the degree of connectivity to pre-labeled users is included in the feature vectors.
Background | Distributional similarity algorithms differ in their feature representation: Some use a binary representation: each predicate is represented by one feature vector where each feature is a pair of arguments (Szpektor et al., 2004; Yates and Etzioni, 2009). |
Learning Typed Entailment Graphs | 2) Feature representation Each example pair of predicates (p1, p2) is represented by a feature vector, where each feature is a specific distributional
Learning Typed Entailment Graphs | We want to use P_{uv} to derive the posterior P(G|F), where F = ∪_{u≠v} F_{uv} and F_{uv} is the feature vector for a node pair (u, v).
Our Approach | g_1, ..., g_c are the composition functions, P(g_h | v_l, v_r, e) is the probability of employing g_h given the child vectors v_l, v_r and the external feature vector e, and f is the nonlinearity function.
Our Approach | where β is the hyper-parameter, S is the matrix used to determine which composition function we use, v_l and v_r are the left and right child vectors, and e is the external feature vector.
Our Approach | In this work, e is a one-hot binary feature vector which indicates the dependency type.
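One way to realize P(g_h | v_l, v_r, e) is a softmax gate over the concatenation [v_l; v_r; e]. The sketch below uses two toy composition functions and an assumed shape for S (with S = 0 the gate is uniform); both are illustrative, not the paper's learned parameters.

```python
import numpy as np

def softmax(z):
    z = z - z.max()                      # numerically stable softmax
    e = np.exp(z)
    return e / e.sum()

def compose(vl, vr, e, S, comp_fns):
    """Weight each composition function g_h by P(g_h | vl, vr, e) =
    softmax(S [vl; vr; e]) and return the mixed parent vector."""
    probs = softmax(S @ np.concatenate([vl, vr, e]))
    outs = np.stack([g(vl, vr) for g in comp_fns])
    return probs @ outs, probs

# two toy composition functions and a one-hot dependency-type vector e
comp_fns = [lambda l, r: np.tanh(l + r), lambda l, r: np.tanh(l - r)]
vl, vr = np.array([0.5, -0.2]), np.array([0.1, 0.3])
e = np.array([1.0, 0.0])                 # dependency type, one-hot
S = np.zeros((2, 6))                     # c=2 functions; input dim 2d+|e|=6
parent, probs = compose(vl, vr, e, S, comp_fns)
```

Because e is one-hot over dependency types, a trained S lets the model route different dependency relations to different composition functions.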
Empirical Evaluation | Each row is a category and each column represents a feature vector . |
Empirical Evaluation | While the actual size of vocabulary is huge, we use only a small subset of words in our feature vector for this visualization. |
Intervention Prediction Models | Assuming that p represents the posts of thread t, h represents the latent category assignments, and r represents the intervention decision, a feature vector φ(p, r, h, t) is extracted for each thread and, using the weight vector w, this model defines a decision function similar to the one shown in Equation 1.
Brain Imaging Experiments on Adjective-Noun Comprehension | The regression model examined to what extent the semantic feature vectors (explanatory variables) can account for the variation in neural activity (response variable) across the 12 stimuli.
Brain Imaging Experiments on Adjective-Noun Comprehension | Table 5 also supports our hypothesis that the multiplicative model should outperform the additive model, based on the assumption that adjectives are used to emphasize particular semantic features that will already be represented in the semantic feature vector of the noun.
Brain Imaging Experiments on Adjective-Noun Comprehension | We are currently exploring the infinite latent semantic feature model (ILFM; Griffiths & Ghahramani, 2005), which places a nonparametric Indian Buffet Process prior on the binary feature vector and models neural activation with a linear Gaussian model.
Estimating the Tensor Model | We assume a function ψ that maps outside trees o to feature vectors ψ(o) ∈ R^{d′}.
Estimating the Tensor Model | For example, the feature vector might track the rule directly above the node in question, the word following the node in question, and so on. |
Estimating the Tensor Model | We also assume a function φ that maps inside trees t to feature vectors φ(t) ∈ R^d.
Cross-Language Text Classification | In standard text classification, a document d is represented under the bag-of-words model as a |V|-dimensional feature vector x ∈ X, where V, the vocabulary, denotes an ordered set of words, x_i ∈ x denotes the normalized frequency of word i in d, and X is an inner product space.
Cross-Language Text Classification | D_s denotes the training set and comprises tuples of the form (x, y), which associate a feature vector x ∈ X with a class label y ∈ Y.
Experiments | A document d is described as normalized feature vector x under a unigram bag-of-words document representation. |
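The normalized unigram bag-of-words vector described above can be sketched directly; the three-word ordered vocabulary is illustrative.

```python
from collections import Counter

def bow_vector(doc, vocab):
    """|V|-dimensional feature vector of normalized word frequencies:
    x_i is the frequency of vocabulary word i in the document."""
    counts = Counter(w for w in doc.split() if w in vocab)
    total = sum(counts.values())
    return [counts[w] / total if total else 0.0 for w in vocab]

vocab = ["good", "bad", "movie"]          # ordered vocabulary V
x = bow_vector("good movie good fun", vocab)
```

Out-of-vocabulary words ("fun") are dropped, and the remaining counts are normalized so the entries sum to one.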
Our Method | 5: for each word w ∈ t do 6: Get the feature vector w: w = repr_w(w, t).
Our Method | 10: end if 11: end for 12: Get the feature vector t:
Our Method | 4: Get the feature vector w: w = repr_w(w, t).
Introduction | A regression learning method is used to infer a function that maps a feature vector (which measures the similarity of a translation to the pseudo references) to a score that indicates the quality of the translation. |
Translation Selection | The regression objective is to infer a function that maps a feature vector (which measures the similarity of a translation from one system to the pseudo references) to a score that indicates the quality of the translation. |
Translation Selection | The input sentence is represented as a feature vector X, which is extracted from the input sentence and the comparisons against the pseudo references.
Experiments | We used CRFs-based Japanese dependency parser (Imamura et al., 2007) and named entity recognizer (Suzuki et al., 2006) for sentiment extraction and constructing feature vectors for readability score, respectively. |
Optimizing Sentence Sequence | where, given two adjacent sentences s_i and s_{i+1}, w·φ(s_i, s_{i+1}), which measures the connectivity of the two sentences, is the inner product of w and φ(s_i, s_{i+1}); w is a parameter vector and φ(s_i, s_{i+1}) is a feature vector of the two sentences.
Optimizing Sentence Sequence | We also define the feature vector Φ(S) of the entire sequence S = (s_0, s_1, ...
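Scoring an ordering as the sum of w·φ(s_i, s_{i+1}) over adjacent pairs, and searching orderings exhaustively, can be sketched as follows. The one-dimensional connectivity feature is an illustration, not the paper's feature set, and exhaustive search is only feasible for short sequences.

```python
import itertools

def score_sequence(seq, w, phi):
    """Score a sentence order as the sum of w . phi(s_i, s_{i+1}) over
    adjacent pairs; Phi(S) is the sum of the pairwise feature vectors."""
    return sum(sum(wk * fk for wk, fk in zip(w, phi(a, b)))
               for a, b in zip(seq, seq[1:]))

def best_order(sentences, w, phi):
    """Exhaustive search over orderings (fine for toy inputs)."""
    return max(itertools.permutations(sentences),
               key=lambda seq: score_sequence(seq, w, phi))

# toy connectivity feature: does a sentence's last word start the next?
def phi(a, b):
    return [1.0 if a.split()[-1] == b.split()[0] else 0.0]

sents = ["cats chase mice", "mice eat cheese", "cheese smells"]
order = best_order(sents, w=[1.0], phi=phi)
```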
Evaluation | We represent feature vectors exactly as described in Section 3.3. |
Methodology | 3.3 Feature Vector Representation |
Methodology | We can achieve these aims by ordering the counts in a feature vector , and using a labelled set of training examples to learn a classifier that optimally weights the counts. |
Method | A fundamental assumption underlying our model is that this bipartite graph contains the entity transition information needed for local coherence computation, rendering feature vectors and a learning phase unnecessary.
The Entity Grid Model | To make this representation accessible to machine learning algorithms, Barzilay and Lapata (2008) compute for each document the probability of each transition and generate feature vectors representing the sentences. |
The Entity Grid Model | (2011) use discourse relations to transform the entity grid representation into a discourse role matrix that is used to generate feature vectors for machine learning algorithms similarly to Barzilay and Lapata (2008). |
Gaussian Process Regression | In our regression task the data consists of n pairs D = {(x_i, y_i)}, where x_i ∈ R^F is an F-dimensional feature vector and y_i ∈ R is the response variable.
Gaussian Process Regression | Each instance is a translation and the feature vector encodes its linguistic features; the response variable is a numerical quality judgement: post-editing time or Likert score.
Gaussian Process Regression | GP regression assumes the presence of a latent function, f : R^F → R, which maps from the input space of feature vectors x to a scalar.
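The GP predictive mean under an RBF kernel has the closed form k(X*, X)(K + σ²I)⁻¹y. A sketch on toy 1-D quality-estimation data (hyperparameters and data are illustrative):

```python
import numpy as np

def rbf(A, B, lengthscale=1.0):
    """RBF (squared-exponential) kernel matrix between row sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale ** 2)

def gp_predict(X, y, X_star, noise=1e-2):
    """GP regression predictive mean: k(X*, X) (K + noise*I)^-1 y."""
    K = rbf(X, X) + noise * np.eye(len(X))
    return rbf(X_star, X) @ np.linalg.solve(K, y)

# toy data: 1-D feature, noisy roughly-linear quality response
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 1.1, 1.9, 3.0])
mu = gp_predict(X, y, np.array([[1.5]]))   # predicted quality at x=1.5
```

With small observation noise the posterior mean nearly interpolates the training responses, so the prediction at 1.5 falls between the neighboring targets 1.1 and 1.9.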
Selectional branching | For each parsing state s_ij, a prediction is made by generating a feature vector x_ij ∈ X, feeding it into a classifier C1 that uses a feature map Φ(x, y) and a weight vector w to measure a score for each label y ∈ Y, and choosing the label with the highest score.
Selectional branching | During training, a training instance is generated for each parsing state s_ij by taking a feature vector x_ij and its true label y_ij.
Selectional branching | Then, a subgradient is measured by taking all feature vectors together weighted by Q (line 6). |
Introduction | We tabulate the transitions of entities between different syntactic positions (or their nonoccurrence) in sentences, and convert the frequencies of transitions into a feature vector representation of transition probabilities in the document. |
Introduction | We solve this problem in a supervised machine learning setting, where the input is the feature vector representations of the two versions of the document, and the output is a binary value indicating the document with the original sentence ordering. |
Introduction | Transition length — the maximum length of the transitions used in the feature vector representation of a document. |
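Tabulating entity transitions and normalizing their frequencies into a feature vector can be sketched as follows; the roles S/O/X/- follow the usual entity-grid conventions, and the two-entity grid is a toy example.

```python
from collections import Counter
from itertools import product

def transition_features(grid, roles=("S", "O", "X", "-"), length=2):
    """Convert an entity grid (one role sequence per entity, across
    sentences) into a feature vector of transition probabilities."""
    counts = Counter()
    total = 0
    for column in grid:                   # one entity's role sequence
        for i in range(len(column) - length + 1):
            counts[tuple(column[i:i + length])] += 1
            total += 1
    # one dimension per possible transition of the given length
    return [counts[t] / total if total else 0.0
            for t in product(roles, repeat=length)]

# toy grid: two entities over three sentences
grid = [["S", "O", "-"],                  # entity 1: subj, obj, absent
        ["-", "S", "S"]]
vec = transition_features(grid)
```

The `length` parameter corresponds to the maximum transition length discussed above; longer transitions enlarge the vector as |roles|^length.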