Index of papers in Proc. ACL that mention
  • feature vector
Bollegala, Danushka and Weir, David and Carroll, John
Abstract
The created thesaurus is then used to expand feature vectors to train a binary classifier.
Introduction
a unigram or a bigram of word lemma) in a review using a feature vector.
Introduction
We model the cross-domain sentiment classification problem as one of feature expansion, where we append additional related features to feature vectors that represent source and target domain reviews in order to reduce the mismatch of features between the two domains.
Introduction
thesaurus to expand feature vectors in a binary classifier at train and test times by introducing related lexical elements from the thesaurus.
Sentiment Sensitive Thesaurus
For example, if we know that both excellent and delicious are positive sentiment words, then we can use this knowledge to expand a feature vector that contains the word delicious using the word excellent, thereby reducing the mismatch between features in a test instance and a trained model.
Sentiment Sensitive Thesaurus
Let us denote the value of a feature w in the feature vector u representing a lexical element u by f(u, w). The vector u can be seen as a compact representation of the distribution of a lexical element u over the set of features that co-occur with u in the reviews.
Sentiment Sensitive Thesaurus
From the construction of the feature vector u described in the previous paragraph, it follows that w can be either a sentiment feature or another lexical element that co-occurs with u in some review sentence.
feature vector is mentioned in 15 sentences in this paper.
Topics mentioned in this paper:
Cai, Peng and Gao, Wei and Zhou, Aoying and Wong, Kam-Fai
Abstract
The first compresses the query into a query feature vector, which aggregates all document instances in the same query, and then conducts query weighting based on the query feature vector.
Evaluation
Specifically, after document feature aggregation, the number of query feature vectors in all adaptation tasks is no more than 150 in source and target domains.
Introduction
Take Figure 2 as a toy example, where the document instance is represented as a feature vector with four features.
Introduction
In this work, we present two simple but very effective approaches attempting to resolve the problem from distinct perspectives: (1) we compress each query into a query feature vector by aggregating all of its document instances, and then conduct query weighting on these query feature vectors; (2) we measure the similarity between the source query and each target query one by one, and then combine these fine-grained similarity values to calculate its importance to the target domain.
Query Weighting
The query can be compressed into a query feature vector, where each feature value is obtained by the aggregate of its corresponding features of all documents in the query.
Query Weighting
We concatenate two types of aggregates to construct the query feature vector: the mean μ_i = (1/|q|) Σ_{x ∈ q} f_i(x) …
Query Weighting
is the feature vector of document i and |q| denotes the number of documents in q.
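The aggregation described in these excerpts is simple to sketch. Below is a minimal NumPy illustration; the variable names and the choice of variance as the second aggregate are assumptions for illustration, not taken from the paper:

    import numpy as np

    def query_feature_vector(doc_features):
        """Compress a query's documents (n_docs x n_features) into one vector
        by concatenating per-feature aggregates over all documents in the query."""
        mean = doc_features.mean(axis=0)   # average of each feature over the |q| documents
        var = doc_features.var(axis=0)     # second aggregate (assumed here: variance)
        return np.concatenate([mean, var])

    # toy query with three documents and four features each
    q = np.array([[1.0, 0.0, 2.0, 1.0],
                  [0.0, 1.0, 2.0, 3.0],
                  [1.0, 1.0, 0.0, 2.0]])
    print(query_feature_vector(q))         # 8-dimensional query feature vector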
feature vector is mentioned in 13 sentences in this paper.
Topics mentioned in this paper:
Lei, Tao and Xin, Yu and Zhang, Yuan and Barzilay, Regina and Jaakkola, Tommi
Abstract
In this paper, we use tensors to map high-dimensional feature vectors into low dimensional representations.
Introduction
The exploding dimensionality of rich feature vectors must then be balanced with the difficulty of effectively learning the associated parameters from limited training data.
Introduction
We depart from this view and leverage high-dimensional feature vectors by mapping them into low dimensional representations.
Introduction
We begin by representing high-dimensional feature vectors as multi-way cross-products of smaller feature vectors that represent words and their syntactic relations (arcs).
Problem Formulation
We can alternatively specify arc features in terms of rank-1 tensors by taking the Kronecker product of simpler feature vectors associated with the head (vector φ_h ∈ R^n), and modifier (vector φ_m ∈ R^n), as well as the arc itself (vector φ_{h,m} ∈ R^d).
Problem Formulation
Here φ_{h,m} is much lower dimensional than the MST arc feature vector discussed earlier.
Problem Formulation
By taking the cross-product of all these component feature vectors, we obtain the full feature representation for arc h → m as a rank-1 tensor
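The rank-1 tensor construction quoted above can be reproduced in a few lines; the dimensions below are toy values, not the paper's settings:

    import numpy as np

    n, d = 4, 3                        # toy sizes for word and arc feature vectors
    phi_h = np.random.rand(n)          # head word feature vector
    phi_m = np.random.rand(n)          # modifier word feature vector
    phi_hm = np.random.rand(d)         # arc feature vector

    # rank-1 tensor: the three-way cross-product (shape n x n x d)
    arc_tensor = np.einsum('i,j,k->ijk', phi_h, phi_m, phi_hm)

    # flattened, it equals the Kronecker product of the component vectors
    assert np.allclose(arc_tensor.ravel(),
                       np.kron(np.kron(phi_h, phi_m), phi_hm))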
feature vector is mentioned in 16 sentences in this paper.
Topics mentioned in this paper:
Lin, Dekang and Wu, Xiaoyun
Distributed K-Means clustering
Given a set of elements represented as feature vectors and a number, k, of desired clusters, the K-Means algorithm consists of the following steps:
Distributed K-Means clustering
Before describing our parallel implementation of the K-Means algorithm, we first describe the phrases to be clustered and how their feature vectors are constructed.
Distributed K-Means clustering
Following previous approaches to distributional clustering of words, we represent the contexts of a phrase as a feature vector.
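The K-Means steps referred to in the first excerpt are cut off in this index; for reference, here is a compact single-machine sketch of the standard algorithm over such context feature vectors (the paper's contribution is a distributed version of these same steps):

    import numpy as np

    def kmeans(X, k, iters=20, seed=0):
        """X: one feature vector per row. Returns a cluster id per row."""
        rng = np.random.default_rng(seed)
        centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
        for _ in range(iters):
            # assign every feature vector to its nearest centroid
            dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
            labels = dists.argmin(axis=1)
            # recompute each centroid as the mean of its assigned vectors
            for j in range(k):
                if (labels == j).any():
                    centroids[j] = X[labels == j].mean(axis=0)
        return labels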
feature vector is mentioned in 13 sentences in this paper.
Topics mentioned in this paper:
Yu, Haonan and Siskind, Jeffrey Mark
Detailed Problem Formulation
We use discrete features, namely natural numbers, in our feature vectors, quantized by a binning process.
Detailed Problem Formulation
The length of the feature vector may vary across parts of speech.
Detailed Problem Formulation
Let N_c denote the length of the feature vector for part of speech c, and x denote the time-series (x_1, …
Introduction
We associate a feature vector with each frame (detection) of each such track.
Introduction
This feature vector can encode image features (including the identity of the particular detector that produced that detection) that correlate with object class; region color, shape, and size features that correlate with object properties; and motion features, such as linear and angular object position, velocity, and acceleration, that correlate with event properties.
Introduction
involves computing the associated feature vector for that HMM over the detections in the tracks chosen to fill its arguments.
The Sentence Tracker
,q^T) denote the sequence of states q^t that leads to an observed track, B(D^t, j^t, q^t, λ) denote the conditional log probability of observing the feature vector associated with the detection selected by j^t among the detections D^t in frame t, given that the HMM is in state q^t, and A(q^{t−1}, q^t, λ) denote the log transition probability of the HMM.
The Sentence Tracker
We further need to generalize F so that it computes the joint score of a sequence of detections, one for each track, G so that it computes the joint measure of coherence between a sequence of pairs of detections in two adjacent frames, and B so that it computes the joint conditional log probability of observing the feature vectors associated with the sequence of detections selected by j^t.
The Sentence Tracker
We further need to generalize B so that it computes the joint conditional log probability of observing the feature vectors for the detections in the tracks that are assigned to the arguments of the HMM for each word in the sentence and A so that it computes the joint log transition probability for the HMMs for all words in the sentence.
feature vector is mentioned in 13 sentences in this paper.
Topics mentioned in this paper:
Lee, Chia-ying and Glass, James
Inference
For each of the remaining feature vectors in
Inference
Mixture ID (m_t) For each feature vector in a segment, given the cluster label c and the hidden state index s_t, the derivation of the conditional posterior probability of its mixture ID is straightforward:
Inference
where m_{c,s} is the set of mixture IDs of feature vectors that belong to state s of HMM c. The m-th entry of β′ is β + Σ_{t ∈ m_{c,s}} δ(m_t, m), where we use
Model
Given the cluster label, choose a hidden state for each feature vector x_t in the segment.
Model
Use the chosen Gaussian mixture to generate the observed feature vector
Model
2, where the shaded circle denotes the observed feature vectors, and the squares denote the hyperparameters of the priors used in our model.
Problem Formulation
1 illustrates how the speech signal of a single word utterance banana is converted to a sequence of feature vectors x_1 to x_n.
Problem Formulation
Segment (p_{j,k}) We define a segment to be composed of feature vectors between two boundary frames.
Problem Formulation
Hidden State (s_t) Since we assume the observed data are generated by HMMs, each feature vector, x_t, has an associated hidden state index.
feature vector is mentioned in 12 sentences in this paper.
Topics mentioned in this paper:
Kotlerman, Lili and Dagan, Ido and Szpektor, Idan and Zhitomirsky-Geffet, Maayan
A Statistical Inclusion Measure
Amongst these features, those found in v's feature vector are termed included features.
A Statistical Inclusion Measure
In preliminary data analysis of pairs of feature vectors, which correspond to a known set of valid and invalid expansions, we identified the following desired properties for a distributional inclusion measure.
A Statistical Inclusion Measure
In our case the feature vector of the expanded word is analogous to the set of all relevant documents while tested features correspond to retrieved documents.
Background
First, a feature vector is constructed for each word by collecting context words as features.
Background
where FV_x is the feature vector of a word x and w_x(f) is the weight of the feature f in that word's vector, set to their pointwise mutual information.
Background
Extending this rationale to the textual entailment setting, Geffet and Dagan (2005) expected that if the meaning of a word u entails that of v then all its prominent context features (under a certain notion of “prominence”) would be included in the feature vector of v as well.
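A rough sketch of the two ingredients mentioned in these excerpts, PMI-weighted feature vectors and an inclusion check, is given below; the inclusion function is a simplified coverage ratio for illustration, not the paper's exact measure, and all names are invented:

    import math

    def pmi_vector(word, cooc, word_count, feat_count, total):
        """Build a PMI-weighted feature vector from raw co-occurrence counts.
        cooc[word] maps features to counts; the other arguments are marginals."""
        vec = {}
        for f, c in cooc[word].items():
            pmi = math.log((c * total) / (word_count[word] * feat_count[f]))
            if pmi > 0:
                vec[f] = pmi
        return vec

    def inclusion(u_vec, v_vec):
        """Fraction of u's feature weight that is also present in v's vector."""
        if not u_vec:
            return 0.0
        covered = sum(w for f, w in u_vec.items() if f in v_vec)
        return covered / sum(u_vec.values())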
Conclusions and Future work
This paper advocates the use of directional similarity measures for lexical expansion, and potentially for other tasks, based on distributional inclusion of feature vectors.
Evaluation and Results
Feature vectors were created by parsing the Reuters RCV1 corpus and taking the words related to each term through a dependency relation as its features (coupled with the relation name and direction, as in (Lin, 1998)).
feature vector is mentioned in 9 sentences in this paper.
Topics mentioned in this paper:
liu, lemao and Watanabe, Taro and Sumita, Eiichiro and Zhao, Tiejun
Abstract
In addition, word embedding is employed as the input to the neural network, which encodes each word as a feature vector.
Introduction
We also integrate word embedding into the model by representing each word as a feature vector (Collobert and Weston, 2008).
Introduction
…, h_K(f,e,d))^T is a K-dimensional feature vector defined on the tuple (f,e,d); W = (w_1, w_2, …, w_K)^T is a K-dimensional weight vector of h, i.e., the parameters of the model, and it can be tuned by the toolkit MERT (Och, 2003).
Introduction
(3) as a function of a feature vector h, i.e.
feature vector is mentioned in 9 sentences in this paper.
Topics mentioned in this paper:
Haghighi, Aria and Liang, Percy and Berg-Kirkpatrick, Taylor and Klein, Dan
Bilingual Lexicon Induction
Then, for each matched pair of word types (i, j) ∈ m, we need to generate the observed feature vectors of the source and target word types, f_S(s_i) ∈ R^{d_S} and f_T(t_j) ∈ R^{d_T}.
Bilingual Lexicon Induction
The feature vector of each word type is computed from the appropriate monolingual corpus and summarizes the word’s monolingual characteristics; see section 5 for details and figure 2 for an illustration.
Bilingual Lexicon Induction
Specifically, to generate the feature vectors, we first generate a random concept z_{i,j} ∼ N(0, I_d), where I_d is the d × d identity matrix.
Features
For a concrete example of a word type to feature vector mapping, see figure 2.
Inference
Since d_S and d_T can be quite large in practice and often greater than |m|, we use Cholesky decomposition to re-represent the feature vectors as |m|-dimensional vectors with the same dot products, which is all that CCA depends on.
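The trick in this excerpt, replacing very high-dimensional feature vectors by |m|-dimensional ones with identical pairwise dot products, can be checked with a Cholesky factorization of the Gram matrix; the small ridge term below is an addition for numerical stability, and the sizes are toy values:

    import numpy as np

    rng = np.random.default_rng(0)
    F = rng.standard_normal((5, 1000))             # 5 word types, 1000-dimensional features

    G = F @ F.T                                    # 5 x 5 matrix of all pairwise dot products
    L = np.linalg.cholesky(G + 1e-9 * np.eye(5))   # G = L L^T

    # rows of L are 5-dimensional replacements with the same dot products
    assert np.allclose(L @ L.T, G, atol=1e-6)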
Introduction
In our method, we represent each language as a monolingual lexicon (see figure 2): a list of word types characterized by monolingual feature vectors, such as context counts, orthographic substrings, and so on (section 5).
feature vector is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Severyn, Aliaksei and Moschitti, Alessandro and Uryupina, Olga and Plank, Barbara and Filippova, Katja
Conclusions and Future Work
We use standard feature vectors augmented by shallow syntactic trees enriched with additional conceptual information.
Conclusions and Future Work
This paper makes several contributions: (i) it shows that effective OM can be carried out with supervised models trained on high quality annotations; (ii) it introduces a novel annotated corpus of YouTube comments, which we make available for the research community; (iii) it defines novel structural models and kernels, which can improve on feature vectors, e.g., up to 30% of relative improvement in type classification, when little data is available, and demonstrates that the structural model scales well to other domains.
Related work
Most of the previous work on supervised sentiment analysis use feature vectors to encode documents.
Representations and models
In the next sections, we define a baseline feature vector model and a novel structural model based on kernel methods.
Representations and models
We go beyond traditional feature vectors by employing structural models (STRUCT), which encode each comment into a shallow syntactic tree.
Representations and models
A polynomial kernel of degree 3 is applied to feature vectors (FVEC).
feature vector is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Bollegala, Danushka and Weir, David and Carroll, John
Distribution Prediction
3.1 In-domain Feature Vector Construction
Distribution Prediction
3.2 Cross-Domain Feature Vector Prediction
Distribution Prediction
We model distribution prediction as a multivariate regression problem where, given a set of pairs of feature vectors selected from each domain for the pivots in W, we learn a mapping
Domain Adaptation
First, we lemmatise each word in a source domain labeled review, and extract both unigrams and bigrams as features to represent the review by a binary-valued feature vector.
Domain Adaptation
Next, we train a binary classification model, θ, using those feature vectors.
Domain Adaptation
At test time, we represent a test target review H using a binary-valued feature vector h of unigrams and bigrams of lemmas of the words in H, as we did for the source domain labeled training reviews.
Related Work
The created thesaurus is used to expand feature vectors during train and test stages in a binary classifier.
feature vector is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Mohler, Michael and Bunescu, Razvan and Mihalcea, Rada
Answer Grading System
We use ψ(x_i, x_s) to denote the feature vector associated with a pair of nodes (x_i, x_s), where x_i is a node from the instructor answer A_i, and x_s is a node from the student answer A_s.
Answer Grading System
For a given answer pair (A_i, A_s), we assemble the eight graph alignment scores into a feature vector
Answer Grading System
We combine the alignment scores φ_G(A_i, A_s) with the scores φ_B(A_i, A_s) from the lexical semantic similarity measures into a single feature vector φ(A_i, A_s) = [φ_G(A_i, A_s) | φ_B(A_i, A_s)].
Results
We report the results of running the systems on three subsets of features φ(A_i, A_s): BOW features φ_B(A_i, A_s) only, alignment features φ_G(A_i, A_s) only, or the full feature vector (labeled “Hybrid”).
feature vector is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Mintz, Mike and Bills, Steven and Snow, Rion and Jurafsky, Daniel
Architecture
If a sentence contains two entities and those entities are an instance of one of our Freebase relations, features are extracted from that sentence and are added to the feature vector for the relation.
Architecture
In training, the features for identical tuples (relation, entity1, entity2) from different sentences are combined, creating a richer feature vector.
Architecture
This time, every pair of entities appearing together in a sentence is considered a potential relation instance, and whenever those entities appear together, features are extracted on the sentence and added to a feature vector for that entity pair.
Implementation
Towards this end, we build a feature vector in the training phase for an ‘unrelated’ relation by randomly selecting entity pairs that do not appear in any Freebase relation and extracting features for them.
Implementation
Our classifier takes as input an entity pair and a feature vector, and returns a relation name and a confidence score based on the probability of the entity pair belonging to that relation.
Introduction
For each pair of entities, we aggregate the features from the many different sentences in which that pair appeared into a single feature vector, allowing us to provide our classifier with more information, resulting in more accurate labels.
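The aggregation step these excerpts describe amounts to pooling the features of every sentence that mentions the same entity pair into one vector per pair; a minimal sketch, where the feature strings and function name are invented examples:

    from collections import Counter, defaultdict

    def aggregate_features(mentions):
        """mentions: iterable of ((entity1, entity2), [feature, ...]) items,
        one per sentence. Returns one pooled feature vector (a Counter) per pair."""
        pair_vectors = defaultdict(Counter)
        for pair, feats in mentions:
            pair_vectors[pair].update(feats)   # vector grows richer with each sentence
        return pair_vectors

    mentions = [(("Obama", "Hawaii"), ["X_born_in_Y", "NE1_PERSON"]),
                (("Obama", "Hawaii"), ["X_native_of_Y", "NE1_PERSON"])]
    print(aggregate_features(mentions)[("Obama", "Hawaii")])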
feature vector is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Xiong, Deyi and Zhang, Min and Li, Haizhou
Error Detection with a Maximum Entropy Model
To formalize this task, we use a feature vector w to represent a word w in question, and a binary variable c to indicate whether this word is correct or not.
Error Detection with a Maximum Entropy Model
In the feature vector, we look at 2 words before and 2 words after the current word position (w_{−2}, w_{−1}, w, w_1, w_2).
Error Detection with a Maximum Entropy Model
We collect features {wd, pos, link, dwpp} for each word among these words and combine them into the feature vector w for the word w.
feature vector is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Huang, Ruihong and Riloff, Ellen
Related Work
However, when we create feature vectors for the classifier, the seeds themselves are hidden and only contextual features are used to represent each training instance.
Related Work
We use an in-house sentence segmenter and NP chunker to identify the base NPs in each sentence and create feature vectors that represent each constituent in the sentence as either an NP or an individual word.
Related Work
Two training instances would be created, with feature vectors that look like this, where M represents a modifier inside the target NP:
feature vector is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Ravi, Sujith
Bayesian MT Decipherment via Hash Sampling
One possible strategy is to compute similarity scores s(w_{f_i}, w_{e′}) between the current source word feature vector w_{f_i} and the feature vectors w_{e′}, e′ ∈ V_e, of all possible candidates in the target vocabulary.
Bayesian MT Decipherment via Hash Sampling
This makes the complexity far worse (in practice) since the dimensionality of the feature vectors d is a much higher value. Computing similarity scores alone (naively) would incur O(|V_e| · d) time, which is prohibitively huge since we have to do this for every token in the source language corpus.
Feature-based representation for Source and Target
But unlike documents, here each word w is associated with a feature vector w_1 … w_d (where w_i represents the weight for the feature indexed by i) which is constructed from monolingual corpora.
Feature-based representation for Source and Target
Unlike the target word feature vectors (which can be precomputed from the monolingual target corpus), the feature vector for every source word fj is dynamically constructed from the target translation sampled in each training iteration.
Feature-based representation for Source and Target
it results in the feature representation becoming more sparse (especially for source feature vectors), which can cause problems in efficiency as well as robustness when computing similarity against other vectors.
Training Algorithm
(a) Generate a proposal distribution by computing the hamming distance between the feature vectors for the source word and each target translation candidate.
feature vector is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Nivre, Joakim and McDonald, Ryan
Integrated Models
by a k-dimensional feature vector f : X → R^k.
Integrated Models
In the feature-based integration we simply extend the feature vector for one model, called the base model, with a certain number of features generated by the other model, which we call the guide model in this context.
Integrated Models
The additional features will be referred to as guide features, and the version of the base model trained with the extended feature vector will be called the guided model.
feature vector is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Tsvetkov, Yulia and Boytsov, Leonid and Gershman, Anatole and Nyberg, Eric and Dyer, Chris
Methodology
Each SVO (or AN) instance will be represented by a triple (duple) from which a feature vector will be extracted. The vector will consist of the concatenation of the conceptual features (which we discuss below) for all participating words, and conjunction features for word pairs. For example, to generate the feature vector for the SVO triple (car, drink, gasoline), we compute all the features for the individual words car, drink, gasoline and combine them with the conjunction features for the pairs car drink and drink gasoline.
Methodology
If word one is represented by features u ∈ R^n and word two by features v ∈ R^m, then the conjunction feature vector is the vectorization of the outer product uv^T.
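This definition translates directly into code: the conjunction feature vector for a word pair is the flattened outer product of the two word vectors. A sketch with toy numbers:

    import numpy as np

    u = np.array([0.2, 0.0, 0.8])          # features of word one (n = 3)
    v = np.array([0.5, 0.5])               # features of word two (m = 2)

    conjunction = np.outer(u, v).ravel()   # vectorization of u v^T, length n * m
    print(conjunction)                     # [0.1 0.1 0.  0.  0.4 0.4]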
Model and Feature Extraction
Degrees of membership in different supersenses are represented by feature vectors, where each element corresponds to one supersense.
Model and Feature Extraction
For example, the top-level classes in GermaNet include: adj.feeling (e.g., willing, pleasant, cheerful); adj.substance (e.g., dry, ripe, creamy); adj.spatial (e.g., adjacent, gigantic). For each adjective type in WordNet, they produce a vector with classifier posterior probabilities corresponding to degrees of membership of this word in one of the 13 semantic classes, similar to the feature vectors we build for nouns and verbs.
Model and Feature Extraction
For languages other than English, feature vectors are projected to English features using translation dictionaries.
feature vector is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Rehbein, Ines and Ruppenhofer, Josef
Related Work
Instead, we create new feature vectors Fgen on the basis of the feature vectors Fseed in S. For each class in S, we extract all attribute-value pairs from the feature vectors for this particular class.
Related Work
For each class, we randomly select features (with replacement) from Fseed and combine them into a new feature vector Fgen, retaining the distribution of the different classes in the data.
Related Work
As a result, we obtain a more general set of feature vectors Fgen with characteristic features being distributed more evenly over the different feature vectors.
feature vector is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
van Gompel, Maarten and van den Bosch, Antal
Experiments & Results
For the classifier-based system, we tested various different feature vector configurations.
Experiments & Results
The various erY configurations use the same feature vector setup for all classifier experts.
Experiments & Results
The auto configuration does not uniformly apply the same feature vector setup to all classifier experts but instead seeks to find the optimal setup per classifier expert.
System
The feature vector for the classifiers represents a local context of neighbouring words, and optionally also global context keywords in a binary-valued bag-of-words configuration.
System
If not, we check for the presence of a classifier expert for the offered L1 fragment; only then we can proceed by extracting the desired number of L2 local context words to the immediate left and right of this fragment and adding those to the feature vector.
feature vector is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Wang, Baoxun and Wang, Xiaolong and Sun, Chengjie and Liu, Bingquan and Sun, Lin
Learning with Homogenous Data
High-dimensional feature vectors with only several nonzero dimensions bring large time consumption to our model.
Learning with Homogenous Data
Thus it is necessary to reduce the dimension of the feature vectors .
The Deep Belief Network for QA pairs
In the bottom layer, the binary feature vectors based on the statistics of the word occurrence in the answers are used to compute the “hidden features” in the
The Deep Belief Network for QA pairs
where σ(x) = 1/(1 + e^{−x}), v denotes the visible feature vector of the answer, q_i is the i-th element of the question vector, and h stands for the hidden feature vector for reconstructing the questions.
The Deep Belief Network for QA pairs
To detect the best answer to a given question, we just have to send the vectors of the question and its candidate answers into the input units of the network and perform a level-by-level calculation to obtain the corresponding feature vectors.
feature vector is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Pervouchine, Vladimir and Li, Haizhou and Lin, Bo
Transliteration alignment techniques
Withgott and Chen (1993) define a feature vector of phonological descriptors for English sounds.
Transliteration alignment techniques
We extend the idea by defining a 21-element binary feature vector for each English and Chinese phoneme.
Transliteration alignment techniques
Each element of the feature vector represents presence or absence of a phonological descriptor that differentiates various kinds of phonemes, e.g.
feature vector is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Berant, Jonathan and Dagan, Ido and Goldberger, Jacob
Background
Quite a few methods have been suggested (Lin and Pantel, 2001; Bhagat et al., 2007; Yates and Etzioni, 2009), which differ in terms of the specifics of the ways in which predicates are represented, the features that are extracted, and the function used to compute feature vector similarity.
Experimental Evaluation
When computing distributional similarity scores, a template is represented as a feature vector of the CUIs that instantiate its arguments.
Learning Entailment Graph Edges
Next, we represent each pair of propositional templates with a feature vector of various distributional similarity scores.
Learning Entailment Graph Edges
A template pair is represented by a feature vector where each coordinate is a different distributional similarity score.
Learning Entailment Graph Edges
Another variant occurs when using binary templates: a template may be represented by a pair of feature vectors, one for each variable (Lin and Pantel, 2001), or by a single vector, where features represent pairs of instantiations (Szpektor et al., 2004; Yates and Etzioni, 2009).
feature vector is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Davidov, Dmitry and Rappoport, Ari
Relationship Classification
To do this, we construct feature vectors from each training pair, where each feature is the HITS measure corresponding to a single pattern cluster.
Relationship Classification
Once we have feature vectors, we can use a variety of classifiers (we used those in Weka) to construct a model and to evaluate it on the test set.
Relationship Classification
If we are not given any training set, it is still possible to separate between different relationship types by grouping the feature vectors of Section 4.3.2 into clusters.
feature vector is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Pasupat, Panupong and Liang, Percy
Approach
where θ ∈ R^d is the parameter vector and φ(x, w, z) is the feature vector, which will be defined in Section 3.3.
Approach
To construct the log-linear model, we define a feature vector φ(x, w, z) for each query x, web page w, and extraction predicate z.
Approach
The final feature vector is the concatenation of structural features φ_s(w, z), which consider the selected nodes in the DOM tree, and denotation features φ_d(x, y), which look at the extracted entities.
feature vector is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Yang, Xiaofeng and Su, Jian and Lang, Jun and Tan, Chew Lim and Liu, Ting and Li, Sheng
Introduction
It is impractical to enumerate all the mentions in an entity and record their information in a single feature vector, as it would make the feature space too large.
Introduction
Even worse, the number of mentions in an entity is not fixed, which would result in variable-length feature vectors and make trouble for normal machine learning algorithms.
Modelling Coreference Resolution
As an entity may contain more than one candidate and the number is not fixed, it is impractical to enumerate all the mentions in an entity and put their properties into a single feature vector.
Related Work
In the system, a training or testing instance is formed for two mentions in question, with a feature vector describing their properties and relationships.
feature vector is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Zhu, Muhua and Zhang, Yue and Chen, Wenliang and Zhang, Min and Zhu, Jingbo
Baseline parser
Here Φ(a_i) represents the feature vector for the i-th action a_i in state item α.
Improved hypotheses comparison
The significant variance in the number of actions N can have an impact on the linear separability of state items, for which the feature vectors are Σ_{i=1}^{N} Φ(a_i).
Improved hypotheses comparison
A feature vector is extracted for the IDLE action according to the final state context, in the same way as other actions.
Improved hypotheses comparison
corresponding feature vectors have about the same sizes, and are more linearly separable.
feature vector is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Yih, Wen-tau and Chang, Ming-Wei and Meek, Christopher and Pastusiak, Andrzej
Learning QA Matching Models
Given a word pair (w_q, w_s), where w_q ∈ V_q and w_s ∈ V_s, feature functions φ_1, …, φ_d map it to a d-dimensional real-valued feature vector.
Learning QA Matching Models
We consider two aggregate functions for defining the feature vectors of the whole question/answer pair: average and max.
Learning QA Matching Models
Φ_max,j(q, s) = max_{w_q ∈ V_q, w_s ∈ V_s} φ_j(w_q, w_s)   (2) Together, each question/sentence pair is represented by a 2d-dimensional feature vector.
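Assuming the per-word-pair features are already computed, the two aggregates reduce to an average and a max over the rows of a matrix; a small sketch, where the function and argument names are invented:

    import numpy as np

    def qa_pair_vector(pair_features):
        """pair_features: (num_word_pairs, d) matrix of phi_j values for one
        question/sentence pair. Returns the 2d-dimensional representation."""
        avg = pair_features.mean(axis=0)   # average aggregate over word pairs
        mx = pair_features.max(axis=0)     # max aggregate over word pairs
        return np.concatenate([avg, mx])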
feature vector is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Wang, Zhigang and Li, Zhixing and Li, Juanzi and Tang, Jie and Z. Pan, Jeff
Our Approach
Then, we use a uniform automatic method, which primarily consists of word labeling and feature vector generation, to generate the training data set TD = {(x_i, y_i)} from these collected articles.
Our Approach
(1) After the word labeling, each instance (word/token) is represented as a feature vector .
Our Approach
First, we turn the text into a word sequence and compute the feature vector for each word based on the feature definition in Section 3.1.
Preliminaries
x_i can be represented as a feature vector according to its context.
feature vector is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Rokhlenko, Oleg and Szpektor, Idan
Online Question Generation
We next describe the various features we extract for every entity and the supervised models that given this feature vector representation assess the correctness of an instantiation.
Online Question Generation
The feature vector of each named entity was induced as described in Section 4.2.1.
Online Question Generation
To generate features for a candidate pair, we take the two feature vectors of the two entities and induce families of pair features by comparing between the two vectors.
feature vector is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Perez-Rosas, Veronica and Mihalcea, Rada and Morency, Louis-Philippe
Discussion
This may be due to the smaller number of feature vectors used in the experiments (only 80, as compared to the 412 used in the previous setup).
Discussion
Another possible reason is the fact that the acoustic and visual modalities are significantly weaker than the linguistic modality, most likely due to the fact that the feature vectors are now speaker-independent, which makes it harder to improve over the linguistic modality alone.
Experiments and Results
In this approach, the features collected from all the multimodal streams are combined into a single feature vector, thus resulting in one vector for each utterance in the dataset which is used to make a decision about the sentiment orientation of the utterance.
Multimodal Sentiment Analysis
The features are averaged over all the frames in an utterance, to obtain one feature vector for each utterance.
feature vector is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Moreno, Jose G. and Dias, Gaël and Cleuziou, Guillaume
Evaluation
Before the clustering process takes place, Web snippets are represented as word feature vectors.
Evaluation
In particular, p is the size of the word feature vectors representing both Web snippets and centroids (p = 2.5), K is the number of clusters to be found (K = 2..10) and S(W_ik, W_jl) is the collocation measure integrated in the InfoSimba similarity measure.
Introduction
On the other hand, the polythetic approach which main idea is to represent Web snippets as word feature vectors has received less attention, the only relevant work being (Osinski and Weiss, 2005).
Introduction
feature vectors are hard to define in small collections of short text fragments (Timonen, 2013), (2) existing second-order similarity measures such as the cosine are unadapted to capture the semantic similarity between small texts, (3) Latent Semantic Analysis has evidenced inconclusive results (Osinski and Weiss, 2005) and (4) the labeling process is a surprisingly hard extra task (Carpineto et al., 2009).
feature vector is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Simianer, Patrick and Riezler, Stefan and Dyer, Chris
Experiments
Algorithms 2 and 3 were infeasible to run on Europarl data beyond one epoch because feature vectors grew too large to be kept in memory.
Introduction
The simple but effective idea is to randomly divide training data into evenly sized shards, use stochastic learning on each shard in parallel, while performing ℓ1/ℓ2 regularization for joint feature selection on the shards after each epoch, before starting a new epoch with a reduced feature vector averaged across shards.
Joint Feature Selection in Distributed Stochastic Learning
Let each translation candidate be represented by a feature vector x ∈ R^D where preference pairs for training are prepared by sorting translations according to smoothed sentence-wise BLEU score (Liang et al., 2006a) against the reference.
Joint Feature Selection in Distributed Stochastic Learning
Parameter mixing by averaging will help to ease the feature sparsity problem, however, keeping feature vectors on the scale of several million features in memory can be prohibitive.
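One way to picture the ℓ1/ℓ2 step described above: treat each feature's weights across shards as a group, shrink groups by their ℓ2 norm, and average what survives. The sketch below is a generic group-shrinkage proximal step, not the paper's exact procedure, and the names are invented:

    import numpy as np

    def l1_l2_select(shard_weights, lam):
        """shard_weights: (num_shards, num_features) weights learned in parallel.
        Returns an averaged weight vector with weak feature groups zeroed out."""
        norms = np.linalg.norm(shard_weights, axis=0)               # l2 norm per feature group
        scale = np.clip(1.0 - lam / np.maximum(norms, 1e-12), 0.0, None)
        return (shard_weights * scale).mean(axis=0)                 # reduced feature vector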
feature vector is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Kolachina, Prasanth and Cancedda, Nicola and Dymetman, Marc and Venkatapathy, Sriram
Inferring a learning curve from mostly monolingual data
The feature vector φ consists of the following features:
Inferring a learning curve from mostly monolingual data
We construct the design matrix Φ with one column for each feature vector φ_ct corresponding to each combination of training configuration c and test set t.
Inferring a learning curve from mostly monolingual data
For a new unseen configuration with feature vector φ_u, we determine the parameters θ_u of the corresponding learning curve as:
feature vector is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Bruni, Elia and Boleda, Gemma and Baroni, Marco and Tran, Nam Khanh
Distributional semantic models
From every image in a dataset, relevant areas are identified and a low-level feature vector (called a “descriptor”) is built to represent each area.
Distributional semantic models
Now, given a new image, the nearest visual word is identified for each descriptor extracted from it, such that the image can be represented as a BoVW feature vector, by counting the instances of each visual word in the image (note that an occurrence of a low-level descriptor vector in an image, after mapping to the nearest cluster, will increment the count of a single dimension of the higher-level BoVW vector).
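The quantization and counting described here can be sketched in a few lines, assuming the visual-word centroids have already been learned (e.g., by clustering SIFT descriptors); the function name is an invented placeholder:

    import numpy as np

    def bovw_vector(descriptors, visual_words):
        """descriptors: (n_regions, dim) low-level descriptors from one image.
        visual_words: (k, dim) centroids. Returns a k-dimensional count vector."""
        dists = ((descriptors[:, None, :] - visual_words[None, :, :]) ** 2).sum(axis=2)
        nearest = dists.argmin(axis=1)     # map each descriptor to its nearest visual word
        return np.bincount(nearest, minlength=len(visual_words)).astype(float)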
Distributional semantic models
We extract descriptor features of two types. First, the standard Scale-Invariant Feature Transform (SIFT) feature vectors (Lowe, 1999; Lowe, 2004), good at characterizing parts of objects.
feature vector is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Zollmann, Andreas and Vogel, Stephan
Clustering phrase pairs directly using the K-means algorithm
We thus propose to represent each phrase pair instance (including its bilingual one-word contexts) as feature vectors, i.e., points of a vector space.
Clustering phrase pairs directly using the K-means algorithm
then use these data points to partition the space into clusters, and subsequently assign each phrase pair instance the cluster of its corresponding feature vector as label.
Clustering phrase pairs directly using the K-means algorithm
In the same fashion, we can incorporate multiple tagging schemes (e.g., word clusterings of different granularities) into the same feature vector.
feature vector is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Vogel, Adam and Jurafsky, Daniel
Approximate Dynamic Programming
Furthermore, as the size of the feature vector K increases, the space becomes even more difficult to search.
Reinforcement Learning Formulation
Thus, we represent state/action pairs with a feature vector φ(s, a) ∈ R^K.
Reinforcement Learning Formulation
Learning exactly which words influence decision making is difficult; reinforcement learning algorithms have problems with the large, sparse feature vectors common in natural language processing.
Reinforcement Learning Formulation
For a given state s = (u, l, c) and action a = (l′, c′), our feature vector φ(s, a) is composed of the following:
feature vector is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Kobdani, Hamidreza and Schuetze, Hinrich and Schiehlen, Michael and Kamp, Hans
Results and Discussion
Then (Einstein, he), (Hawking, he) and (Novoselov, he) will all be assigned the feature vector <1, No, Proper Noun, Personal Pronoun, Yes>.
Results and Discussion
Using the same representation of pairs, suppose that for the sequence of markables Biden, Obama, President the markable pairs (Biden, President) and (Obama, President) are assigned the feature vectors <8, No, Proper Noun, Proper Noun, Yes> and <1, No, Proper Noun, Proper Noun, Yes>, respectively.
Results and Discussion
with the second feature vector (distance=1) as coreferent than with the first one (distance=8) in the entire automatically labeled training set.
System Architecture
After filtering, we then calculate a feature vector for each generated pair that survived filters (i)—(iv).
feature vector is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Sun, Jun and Zhang, Min and Tan, Chew Lim
Bilingual Tree Kernels
In order to compute the dot product of the feature vectors in the exponentially high dimensional feature space, we introduce the tree kernel functions as follows:
Bilingual Tree Kernels
It is infeasible to explicitly compute the kernel function by expressing the sub-trees as feature vectors.
Introduction
In addition, explicitly utilizing syntactic tree fragments results in exponentially high dimensional feature vectors, which is hard to compute.
Substructure Spaces for BTKs
The feature vector of the classifier is computed using a composite kernel:
feature vector is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Bramsen, Philip and Escobar-Molano, Martha and Patel, Ami and Alonso, Rafael
Abstract
The results of these experiments were not particularly strong, likely owing to the increased sparsity of the feature vectors.
Abstract
Binning: Next, we wished to explore longer n-grams of words or POS tags and to reduce the sparsity of the feature vectors.
Abstract
Self-Training: Besides sparse feature vectors, another factor likely to be hurting our classifier was the limited amount of training data.
feature vector is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Wang, WenTing and Su, Jian and Tan, Chew Lim
Incorporating Structural Syntactic Information
Thus, it is computationally infeasible to directly use the feature vector φ(T).
The Recognition Framework
Suppose the training set S consists of labeled vectors {(x_i, y_i)}, where x_i is the feature vector
The Recognition Framework
where α_i is the learned parameter for a feature vector x_i, and b is another parameter which can be derived from α_i.
feature vector is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Feng, Yansong and Lapata, Mirella
BBC News Database
Secondly, the generation of feature vectors is modeled directly, so there is no need for quantization.
BBC News Database
where N_I is the number of regions in image I, v_r the feature vector for region r in image I, n_s the number of regions in the image of latent variable s, v_i the feature vector for region i in s's image, k the dimension of the image feature vectors and Σ the feature covariance matrix.
BBC News Database
According to equation (3), a Gaussian kernel is fit to every feature vector v_i corresponding to region i in the image of the latent variable s.
feature vector is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Sun, Le and Han, Xianpei
Introduction
Feature Vector
Introduction
where L_n is its phrase label (i.e., its Treebank tag), and F_n is a feature vector which indicates the characteristics of node n, which is represented as:
Introduction
where δ(t_1, t_2) is the same indicator function as in CTK; (n_i, n_j) is a pair of aligned nodes between t_1 and t_2, where n_i and n_j are correspondingly in the same position of tree t_1 and t_2; E(t_1, t_2) is the set of all aligned node pairs; sim(n_i, n_j) is the feature vector similarity between node n_i and n_j, computed as the dot product between their feature vectors F_{n_i} and F_{n_j}.
feature vector is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Nakov, Preslav and Hearst, Marti A.
Relational Similarity Experiments
Given a verbal analogy example, we build six feature vectors — one for each of the six word pairs.
Relational Similarity Experiments
For the evaluation, we created a feature vector for each head-modifier pair, and we performed a leave-one-out cross-validation: we left one example for testing and we trained on the remaining 599 ones, repeating this procedure 600 times so that each example be used for testing.
Relational Similarity Experiments
We calculated the similarity between the feature vector of the testing example and each of the training examples’ vectors.
feature vector is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Nguyen, Minh Luan and Tsang, Ivor W. and Chai, Kian Ming A. and Chieu, Hai Leong
Experiments
negative transfer from irrelevant sources by relying on similarity of feature vectors between source and target domains based on labeled and unlabeled data.
Problem Statement
We introduce a feature extraction function that maps the triple (A, B, S) to its feature vector x.
Problem Statement
and plenty of unlabeled data D_u, where n_l and n_u are the number of labeled and unlabeled samples respectively, x_i is the feature vector, and y_i is the corresponding label (if available).
feature vector is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Li, Jiwei and Ritter, Alan and Hovy, Eduard
Model
encode a tweet-level feature vector rather than an aggregate one.
Model
(3) The feature vector encodes the following standard general features:
Related Work
In the user attribute extraction literature, researchers have considered neighborhood context to boost inference accuracy (Pennacchiotti and Popescu, 2011; Al Zamal et al., 2012), where information about the degree of their connectivity to their pre-labeled users is included in the feature vectors.
feature vector is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Berant, Jonathan and Dagan, Ido and Goldberger, Jacob
Background
Distributional similarity algorithms differ in their feature representation: Some use a binary representation: each predicate is represented by one feature vector where each feature is a pair of arguments (Szpektor et al., 2004; Yates and Etzioni, 2009).
Learning Typed Entailment Graphs
2) Feature representation: Each example pair of predicates (p_1, p_2) is represented by a feature vector, where each feature is a specific distributional
Learning Typed Entailment Graphs
We want to use P_uv to derive the posterior P(G|F), where F = ∪_{u≠v} F_uv and F_uv is the feature vector for a node pair (u, v).
feature vector is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Dong, Li and Wei, Furu and Tan, Chuanqi and Tang, Duyu and Zhou, Ming and Xu, Ke
Our Approach
, g_C are the composition functions, P(g_h | v_l, v_r, e) is the probability of employing g_h given the child vectors v_l, v_r and external feature vector e, and f is the nonlinearity function.
Our Approach
where β is the hyper-parameter, S ∈ R^{C×(2d+|e|)} is the matrix used to determine which composition function we use, v_l and v_r are the left and right child vectors, and e is the external feature vector.
Our Approach
In this work, e is a one-hot binary feature vector which indicates what the dependency type is.
feature vector is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Chaturvedi, Snigdha and Goldwasser, Dan and Daumé III, Hal
Empirical Evaluation
Each row is a category and each column represents a feature vector .
Empirical Evaluation
While the actual size of vocabulary is huge, we use only a small subset of words in our feature vector for this visualization.
Intervention Prediction Models
Assuming that p represents the posts of thread t, h represents the latent category assignments, and r represents the intervention decision, a feature vector φ(p, r, h, t) is extracted for each thread and, using the weight vector w, this model defines a decision function similar to what is shown in Equation 1.
feature vector is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Chang, Kai-min K. and Cherkassky, Vladimir L. and Mitchell, Tom M. and Just, Marcel Adam
Brain Imaging Experiments on Adj ec-tive-Noun Comprehension
The regression model examined to what extent the semantic feature vectors (explanatory variables) can account for the variation in neural activity (response variable) across the 12 stimuli.
Brain Imaging Experiments on Adj ec-tive-Noun Comprehension
Table 5 also supports our hypothesis that the multiplicative model should outperform the additive model, based on the assumption that adjectives are used to emphasize particular semantic features that will already be represented in the semantic feature vector of the noun.
Brain Imaging Experiments on Adj ec-tive-Noun Comprehension
We are currently exploring the infinite latent semantic feature model (ILFM; Griffiths & Ghahramani, 2005), which assumes a nonparametric Indian Buffet Process prior over the binary feature vector and models neural activation with a linear Gaussian model.
feature vector is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Cohen, Shay B. and Stratos, Karl and Collins, Michael and Foster, Dean P. and Ungar, Lyle
Estimating the Tensor Model
We assume a function ψ that maps outside trees o to feature vectors ψ(o) ∈ R^{d′}.
Estimating the Tensor Model
For example, the feature vector might track the rule directly above the node in question, the word following the node in question, and so on.
Estimating the Tensor Model
We also assume a function φ that maps inside trees t to feature vectors φ(t) ∈ R^d.
feature vector is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Prettenhofer, Peter and Stein, Benno
Cross-Language Text Classification
In standard text classification, a document d is represented under the bag-of-words model as a |V|-dimensional feature vector x ∈ X, where V, the vocabulary, denotes an ordered set of words, x_i ∈ x denotes the normalized frequency of word i in d, and X is an inner product space.
Cross-Language Text Classification
D_S denotes the training set and comprises tuples of the form (x, y), which associate a feature vector x ∈ X with a class label y ∈ Y.
Experiments
A document d is described as normalized feature vector x under a unigram bag-of-words document representation.
feature vector is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
LIU, Xiaohua and ZHANG, Shaodian and WEI, Furu and ZHOU, Ming
Our Method
5: for each word w ∈ t do 6: Get the feature vector of w: repr_w(w, t).
Our Method
10: end if 11: end for 12: Get the feature vector of t.
Our Method
4: Get the feature vector of w: repr_w(w, t).
feature vector is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Wu, Hua and Wang, Haifeng
Introduction
A regression learning method is used to infer a function that maps a feature vector (which measures the similarity of a translation to the pseudo references) to a score that indicates the quality of the translation.
Translation Selection
The regression objective is to infer a function that maps a feature vector (which measures the similarity of a translation from one system to the pseudo references) to a score that indicates the quality of the translation.
Translation Selection
The input sentence is represented as a feature vector X, which is extracted from the input sentence and the comparisons against the pseudo references.
feature vector is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Nishikawa, Hitoshi and Hasegawa, Takaaki and Matsuo, Yoshihiro and Kikui, Genichiro
Experiments
We used CRFs-based Japanese dependency parser (Imamura et al., 2007) and named entity recognizer (Suzuki et al., 2006) for sentiment extraction and constructing feature vectors for readability score, respectively.
Optimizing Sentence Sequence
where, given two adjacent sentences s_i and s_{i+1}, w · φ(s_i, s_{i+1}), which measures the connectivity of the two sentences, is the inner product of w and φ(s_i, s_{i+1}); w is a parameter vector and φ(s_i, s_{i+1}) is a feature vector of the two sentences.
Optimizing Sentence Sequence
We also define a feature vector Φ(S) of the entire sequence S = (s_0, s_1, …
feature vector is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Bergsma, Shane and Lin, Dekang and Goebel, Randy
Evaluation
We represent feature vectors exactly as described in Section 3.3.
Methodology
3.3 Feature Vector Representation
Methodology
We can achieve these aims by ordering the counts in a feature vector, and using a labelled set of training examples to learn a classifier that optimally weights the counts.
feature vector is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Guinaudeau, Camille and Strube, Michael
Method
A fundamental assumption underlying our model is that this bipartite graph contains the entity transition information needed for local coherence computation, rendering feature vectors and learning phase unnecessary.
The Entity Grid Model
To make this representation accessible to machine learning algorithms, Barzilay and Lapata (2008) compute for each document the probability of each transition and generate feature vectors representing the sentences.
The Entity Grid Model
(2011) use discourse relations to transform the entity grid representation into a discourse role matrix that is used to generate feature vectors for machine learning algorithms similarly to Barzilay and Lapata (2008).
feature vector is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Cohn, Trevor and Specia, Lucia
Gaussian Process Regression
In our regression task the data consists of n pairs D = {(x_i, y_i)}, where x_i ∈ R^F is an F-dimensional feature vector and y_i ∈ R is the response variable.
Gaussian Process Regression
Each instance is a translation and the feature vector encodes its linguistic features; the response variable is a numerical quality judgement: post-editing time or Likert score.
Gaussian Process Regression
GP regression assumes the presence of a latent function, f : R^F → R, which maps from the input space of feature vectors x to a scalar.
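The same setup (feature vectors in R^F, a scalar quality response) can be tried with an off-the-shelf GP library; a minimal scikit-learn sketch on synthetic data, where the kernel choice and feature count are assumptions rather than the paper's settings:

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    rng = np.random.default_rng(0)
    X = rng.standard_normal((50, 17))                    # 50 translations, 17 QE features
    y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=50)   # synthetic quality scores

    gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0) + WhiteKernel())
    gp.fit(X, y)
    mean, std = gp.predict(X[:5], return_std=True)       # predictive mean and uncertainty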
feature vector is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Choi, Jinho D. and McCallum, Andrew
Selectional branching
For each parsing state s_ij, a prediction is made by generating a feature vector x_ij ∈ X, feeding it into a classifier C_1 that uses a feature map Φ(x, y) and a weight vector w to measure a score for each label y ∈ Y, and choosing the label with the highest score.
Selectional branching
During training, a training instance is generated for each parsing state s_ij by taking a feature vector x_ij and its true label y_ij.
Selectional branching
Then, a subgradient is measured by taking all feature vectors together weighted by Q (line 6).
feature vector is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Cheung, Jackie Chi Kit and Penn, Gerald
Introduction
We tabulate the transitions of entities between different syntactic positions (or their nonoccurrence) in sentences, and convert the frequencies of transitions into a feature vector representation of transition probabilities in the document.
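A compact sketch of the tabulation described here: count entity transitions between syntactic roles in adjacent sentences and normalize them into a fixed-order probability vector. The role labels follow the usual entity-grid S/O/X/- convention; the function name and toy grid are invented:

    from collections import Counter
    from itertools import product

    ROLES = ["S", "O", "X", "-"]            # subject, object, other, absent

    def transition_features(grid):
        """grid: list of per-sentence role lists, one column per entity.
        Returns length-2 transition probabilities as a fixed-order feature vector."""
        counts = Counter()
        for prev, curr in zip(grid, grid[1:]):
            for r1, r2 in zip(prev, curr):  # one transition per entity per sentence pair
                counts[(r1, r2)] += 1
        total = sum(counts.values()) or 1
        return [counts[t] / total for t in product(ROLES, repeat=2)]

    grid = [["S", "-", "O"],
            ["O", "X", "-"],
            ["-", "X", "S"]]
    print(transition_features(grid))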
Introduction
We solve this problem in a supervised machine learning setting, where the input is the feature vector representations of the two versions of the document, and the output is a binary value indicating the document with the original sentence ordering.
Introduction
Transition length — the maximum length of the transitions used in the feature vector representation of a document.
feature vector is mentioned in 3 sentences in this paper.
Topics mentioned in this paper: