Index of papers in Proc. ACL 2014 that mention
  • feature vector
Lei, Tao and Xin, Yu and Zhang, Yuan and Barzilay, Regina and Jaakkola, Tommi
Abstract
In this paper, we use tensors to map high-dimensional feature vectors into low dimensional representations.
Introduction
The exploding dimensionality of rich feature vectors must then be balanced with the difficulty of effectively learning the associated parameters from limited training data.
Introduction
We depart from this view and leverage high-dimensional feature vectors by mapping them into low dimensional representations.
Introduction
We begin by representing high-dimensional feature vectors as multi-way cross-products of smaller feature vectors that represent words and their syntactic relations (arcs).
Problem Formulation
We can alternatively specify arc features in terms of rank-1 tensors by taking the Kronecker product of simpler feature vectors associated with the head (vector φ_h ∈ R^n), and modifier (vector φ_m ∈ R^n), as well as the arc itself (vector φ_{h,m} ∈ R^d).
Problem Formulation
Here φ_{h,m} is much lower dimensional than the MST arc feature vector φ_{h→m} discussed earlier.
Problem Formulation
By taking the cross-product of all these component feature vectors, we obtain the full feature representation for arc h → m as a rank-1 tensor.
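As a rough illustration of this construction, here is a minimal NumPy sketch of a rank-1 arc tensor built from three component vectors; the dimensions, values, and variable names are ours, not the paper's.

    import numpy as np

    # Toy component feature vectors for one arc h -> m (arbitrary sizes).
    phi_h = np.random.rand(8)    # head word features
    phi_m = np.random.rand(8)    # modifier word features
    phi_hm = np.random.rand(4)   # features of the arc itself

    # Rank-1 tensor: three-way outer (Kronecker-style) product of the components.
    arc_tensor = np.einsum('i,j,k->ijk', phi_h, phi_m, phi_hm)
    print(arc_tensor.shape)      # (8, 8, 4) -- one entry per feature combination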
feature vector is mentioned in 16 sentences in this paper.
Bollegala, Danushka and Weir, David and Carroll, John
Distribution Prediction
3.1 In-domain Feature Vector Construction
Distribution Prediction
3.2 Cross-Domain Feature Vector Prediction
Distribution Prediction
We model distribution prediction as a multivariate regression problem where, given a set {(x_i, y_i)} (i = 1, ..., n) consisting of pairs of feature vectors selected from each domain for the pivots in W, we learn a mapping
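For intuition only, a small scikit-learn sketch of learning such a mapping between paired feature vectors; the paper's actual pivot selection and prediction model are more elaborate, and the data and names below are assumed toy placeholders.

    import numpy as np
    from sklearn.linear_model import Ridge

    # Assumed paired feature vectors for the pivots: row i is the source-domain
    # vector and the matching target-domain vector for pivot i (toy data).
    X_source = np.random.rand(200, 50)
    Y_target = np.random.rand(200, 50)

    # Multivariate ridge regression learns one linear mapping for all dimensions.
    mapping = Ridge(alpha=1.0).fit(X_source, Y_target)

    # Predict the target-domain feature vector of a new source-domain vector.
    y_hat = mapping.predict(X_source[:1])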
Domain Adaptation
First, we lemmatise each word in a source domain labeled review, and extract both unigrams and bigrams as features to represent it by a binary-valued feature vector.
Domain Adaptation
Next, we train a binary classification model, θ, using those feature vectors.
Domain Adaptation
At test time, we represent a test target review H using a binary-valued feature vector h of unigrams and bigrams of lemmas of the words in H, as we did for the source domain labeled training reviews.
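A minimal sketch of such a binary unigram-and-bigram representation, using scikit-learn for brevity; the lemmatisation step and the actual reviews are not shown, and the toy strings below are assumptions.

    from sklearn.feature_extraction.text import CountVectorizer

    # Toy reviews standing in for the lemmatised source-domain training reviews.
    train_reviews = ["great battery life", "poor screen poor battery"]

    # Binary-valued unigram + bigram feature vectors.
    vectorizer = CountVectorizer(ngram_range=(1, 2), binary=True)
    X = vectorizer.fit_transform(train_reviews)      # one 0/1 row per review

    # A test target review is mapped into the same feature space.
    h = vectorizer.transform(["great screen"])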
Related Work
The created thesaurus is used to expand feature vectors during train and test stages in a binary classifier.
feature vector is mentioned in 8 sentences in this paper.
Severyn, Aliaksei and Moschitti, Alessandro and Uryupina, Olga and Plank, Barbara and Filippova, Katja
Conclusions and Future Work
We use standard feature vectors augmented by shallow syntactic trees enriched with additional conceptual information.
Conclusions and Future Work
This paper makes several contributions: (i) it shows that effective OM can be carried out with supervised models trained on high quality annotations; (ii) it introduces a novel annotated corpus of YouTube comments, which we make available for the research community; (iii) it defines novel structural models and kernels, which can improve on feature vectors, e.g., up to 30% of relative improvement in type classification, when little data is available, and demonstrates that the structural model scales well to other domains.
Related work
Most of the previous work on supervised sentiment analysis uses feature vectors to encode documents.
Representations and models
In the next sections, we define a baseline feature vector model and a novel structural model based on kernel methods.
Representations and models
We go beyond traditional feature vectors by employing structural models (STRUCT), which encode each comment into a shallow syntactic tree.
Representations and models
A polynomial kernel of degree 3 is applied to feature vectors (FVEC).
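For reference, a degree-3 polynomial kernel over two feature vectors can be sketched as below; this is the generic formulation, and the constant c and the toy vectors are assumptions, not the paper's exact settings.

    import numpy as np

    def poly3_kernel(x, y, c=1.0):
        # Generic degree-3 polynomial kernel: (x . y + c)**3.
        return (np.dot(x, y) + c) ** 3

    x = np.array([1.0, 0.0, 2.0])
    y = np.array([0.5, 1.0, 0.0])
    print(poly3_kernel(x, y))    # (0.5 + 1.0)**3 = 3.375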
feature vector is mentioned in 8 sentences in this paper.
Tsvetkov, Yulia and Boytsov, Leonid and Gershman, Anatole and Nyberg, Eric and Dyer, Chris
Methodology
Each SVO (or AN) instance will be represented by a triple (duple) from which a feature vector will be extracted. The vector will consist of the concatenation of the conceptual features (which we discuss below) for all participating words, and conjunction features for word pairs. For example, to generate the feature vector for the SVO triple (car, drink, gasoline), we compute all the features for the individual words car, drink, gasoline and combine them with the conjunction features for the pairs car-drink and drink-gasoline.
Methodology
If word one is represented by features u ∈ R^n and word two by features v ∈ R^m, then the conjunction feature vector is the vectorization of the outer product u v^T.
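Concretely, that construction is a one-liner; a NumPy sketch with toy dimensions (n = 5, m = 3 are arbitrary choices, not from the paper):

    import numpy as np

    u = np.random.rand(5)    # features of word one (here n = 5)
    v = np.random.rand(3)    # features of word two (here m = 3)

    # Conjunction feature vector: vectorization of the outer product u v^T.
    conjunction = np.outer(u, v).ravel()    # length n * m = 15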
Model and Feature Extraction
Degrees of membership in different supersenses are represented by feature vectors, where each element corresponds to one supersense.
Model and Feature Extraction
For example, the top-level classes in GermaNet include: adj.feeling (e.g., willing, pleasant, cheerful); adj.substance (e.g., dry, ripe, creamy); adj.spatial (e.g., adjacent, gigantic). For each adjective type in WordNet, they produce a vector of classifier posterior probabilities corresponding to degrees of membership of this word in one of the 13 semantic classes, similar to the feature vectors we build for nouns and verbs.
Model and Feature Extraction
For languages other than English, feature vectors are projected to English features using translation dictionaries.
feature vector is mentioned in 6 sentences in this paper.
van Gompel, Maarten and van den Bosch, Antal
Experiments & Results
For the classifier-based system, we tested various different feature vector configurations.
Experiments & Results
The various lXrY configurations use the same feature vector setup for all classifier experts.
Experiments & Results
The auto configuration does not uniformly apply the same feature vector setup to all classifier experts but instead seeks to find the optimal setup per classifier expert.
System
The feature vector for the classifiers represents a local context of neighbouring words, and optionally also global context keywords in a binary-valued bag-of-words configuration.
System
If not, we check for the presence of a classifier expert for the offered L1 fragment; only then can we proceed by extracting the desired number of L2 local context words to the immediate left and right of this fragment and adding those to the feature vector.
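A rough sketch of assembling such a feature vector; the window size, keyword set, example sentence, and helper name are illustrative assumptions, not the authors' implementation.

    def build_feature_vector(l2_tokens, frag_start, frag_end, window=2, keywords=()):
        # Local context: up to `window` L2 words left and right of the L1 fragment.
        left = l2_tokens[max(0, frag_start - window):frag_start]
        right = l2_tokens[frag_end:frag_end + window]
        # Global context: binary bag-of-words presence of each keyword.
        keyword_bag = [1 if k in l2_tokens else 0 for k in keywords]
        return left + right, keyword_bag

    tokens = "je voudrais une cup of coffee ce matin".split()
    print(build_feature_vector(tokens, 3, 6, window=2, keywords=("matin", "soir")))
    # (['voudrais', 'une', 'ce', 'matin'], [1, 0])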
feature vector is mentioned in 5 sentences in this paper.
Pasupat, Panupong and Liang, Percy
Approach
where θ ∈ R^d is the parameter vector and φ(x, w, z) is the feature vector, which will be defined in Section 3.3.
Approach
To construct the log-linear model, we define a feature vector φ(x, w, z) for each query x, web page w, and extraction predicate z.
Approach
The final feature vector is the concatenation of structural features φ_s(w, z), which consider the selected nodes in the DOM tree, and denotation features φ_d(x, y), which look at the extracted entities.
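Schematically, the concatenation and its log-linear score look like the NumPy sketch below; the actual feature functions are defined in the paper, and the dimensions and random values here are placeholders.

    import numpy as np

    phi_s = np.random.rand(20)    # structural features of the selected DOM nodes
    phi_d = np.random.rand(10)    # denotation features of the extracted entities

    # Final feature vector: concatenation of the two blocks.
    phi = np.concatenate([phi_s, phi_d])

    theta = np.random.rand(phi.size)    # parameter vector
    score = theta @ phi                 # unnormalised log-linear score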
feature vector is mentioned in 4 sentences in this paper.
Chaturvedi, Snigdha and Goldwasser, Dan and Daumé III, Hal
Empirical Evaluation
Each row is a category and each column represents a feature vector.
Empirical Evaluation
While the actual size of vocabulary is huge, we use only a small subset of words in our feature vector for this visualization.
Intervention Prediction Models
Assuming that p represents the posts of thread t, h represents the latent category assignments, and r represents the intervention decision, a feature vector, φ(p, r, h, t), is extracted for each thread and, using the weight vector w, this model defines a decision function similar to what is shown in Equation 1.
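In spirit, such a decision function scores a thread by maximising a linear score over candidate latent assignments; a toy sketch under that assumption (the paper's actual inference, feature extraction, and threshold are not shown).

    import numpy as np

    def decision_function(w, candidate_phis):
        # Best score over candidate latent assignments h of w . phi(p, r, h, t).
        return max(w @ phi for phi in candidate_phis)

    w = np.random.rand(6)                                    # weight vector
    candidate_phis = [np.random.rand(6) for _ in range(4)]   # one phi per latent h
    intervene = decision_function(w, candidate_phis) > 0.0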
feature vector is mentioned in 3 sentences in this paper.
Dong, Li and Wei, Furu and Tan, Chuanqi and Tang, Duyu and Zhou, Ming and Xu, Ke
Our Approach
g_1, ..., g_C are the composition functions, P(g_h | v_l, v_r, e) is the probability of employing g_h given the child vectors v_l, v_r and external feature vector e, and f is the nonlinearity function.
Our Approach
where β is the hyper-parameter, S ∈ R^{C×(2D+|e|)} is the matrix used to determine which composition function we use, v_l, v_r are the left and right child vectors, and e is the external feature vector.
Our Approach
In this work, e is a one-hot binary feature vector which indicates what the dependency type is.
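For example, such a one-hot external feature vector e could be built as follows; the dependency-type inventory here is a toy placeholder, not the paper's tag set.

    import numpy as np

    DEP_TYPES = ["nsubj", "dobj", "amod", "prep"]    # toy dependency-type inventory

    def one_hot(dep_type):
        e = np.zeros(len(DEP_TYPES))
        e[DEP_TYPES.index(dep_type)] = 1.0           # 1 at the type's index, 0 elsewhere
        return e

    print(one_hot("amod"))    # [0. 0. 1. 0.]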
feature vector is mentioned in 3 sentences in this paper.
Li, Jiwei and Ritter, Alan and Hovy, Eduard
Model
encode a tweet-level feature vector rather than an aggregate one.
Model
(3) The tweet-level feature vector encodes the following standard general features:
Related Work
In the user attribute extraction literature, researchers have considered neighborhood context to boost inference accuracy (Pennacchiotti and Popescu, 2011; Al Zamal et al., 2012), where information about the degree of connectivity to pre-labeled users is included in the feature vectors.
feature vector is mentioned in 3 sentences in this paper.
Nguyen, Minh Luan and Tsang, Ivor W. and Chai, Kian Ming A. and Chieu, Hai Leong
Experiments
negative transfer from irrelevant sources by relying on similarity of feature vectors between source and target domains based on labeled and unlabeled data.
Problem Statement
We introduce a feature extraction function that maps the triple (A, B, S) to its feature vector x.
Problem Statement
and plenty of unlabeled data D_u, where n_l and n_u are the number of labeled and unlabeled samples respectively, x_i is the feature vector, and y_i is the corresponding label (if available).
feature vector is mentioned in 3 sentences in this paper.
Sun, Le and Han, Xianpei
Introduction
Feature Vector
Introduction
where L_n is its phrase label (i.e., its Treebank tag), and F_n is a feature vector which indicates the characteristics of node n, which is represented as:
Introduction
where δ(t1, t2) is the same indicator function as in CTK; (n_i, n_j) is a pair of aligned nodes between t1 and t2, where n_i and n_j are correspondingly in the same position of trees t1 and t2; E(t1, t2) is the set of all aligned node pairs; sim(n_i, n_j) is the feature vector similarity between node n_i and n_j, computed as the dot product between their feature vectors F_{n_i} and F_{n_j}.
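The node similarity term reduces to a plain dot product between the two node feature vectors; a sketch with toy vectors (values are arbitrary, not from the paper).

    import numpy as np

    F_ni = np.array([1.0, 0.0, 1.0, 0.5])    # feature vector of node n_i in t1
    F_nj = np.array([1.0, 1.0, 0.0, 0.5])    # feature vector of node n_j in t2

    sim = F_ni @ F_nj    # sim(n_i, n_j) = 1*1 + 0*1 + 1*0 + 0.5*0.5 = 1.25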
feature vector is mentioned in 3 sentences in this paper.