Abstract | In this paper, we use tensors to map high-dimensional feature vectors into low dimensional representations. |
Introduction | The exploding dimensionality of rich feature vectors must then be balanced with the difficulty of effectively learning the associated parameters from limited training data. |
Introduction | We depart from this view and leverage high-dimensional feature vectors by mapping them into low dimensional representations. |
Introduction | We begin by representing high-dimensional feature vectors as multi-way cross-products of smaller feature vectors that represent words and their syntactic relations (arcs). |
Problem Formulation | We can alternatively specify arc features in terms of rank-1 tensors by taking the Kronecker product of simpler feature vectors associated with the head (vector φ_h ∈ ℝⁿ) and modifier (vector φ_m ∈ ℝⁿ), as well as the arc itself (vector φ_{h,m} ∈ ℝᵈ). |
Problem Formulation | Here φ_{h,m} is much lower dimensional than the MST arc feature vector φ_{h→m} discussed earlier. |
Problem Formulation | By taking the cross-product of all these component feature vectors, we obtain the full feature representation for arc h → m as a rank-1 tensor |
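The rank-1 tensor construction above can be sketched with NumPy; the vectors and dimensions below are illustrative, not taken from the paper:

```python
import numpy as np

# Illustrative small dense vectors standing in for the head, modifier,
# and arc feature vectors phi_h, phi_m, phi_{h,m} from the text.
phi_h = np.array([1.0, 0.0, 2.0])   # head word features (n = 3)
phi_m = np.array([0.5, 1.0, 0.0])   # modifier word features (n = 3)
phi_hm = np.array([1.0, 3.0])       # arc features (d = 2)

# The rank-1 tensor for arc h -> m is the three-way outer product:
# T[i, j, k] = phi_h[i] * phi_m[j] * phi_hm[k].
T = np.einsum('i,j,k->ijk', phi_h, phi_m, phi_hm)
assert T.shape == (3, 3, 2)
assert T[0, 1, 1] == phi_h[0] * phi_m[1] * phi_hm[1]
```

Each entry of the tensor is a product of one coordinate from each component vector, which is what makes the representation rank-1.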
Distribution Prediction | 3.1 In-domain Feature Vector Construction |
Distribution Prediction | 3.2 Cross-Domain Feature Vector Prediction |
Distribution Prediction | We model distribution prediction as a multivariate regression problem where, given a set {(x_i^{(S)}, x_i^{(T)})}_{i=1}^{n} consisting of pairs of feature vectors selected from each domain for the pivots in W, we learn a mapping |
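A minimal sketch of such a multivariate regression, assuming a linear map fit by ridge-regularised least squares (the data, dimensions, and penalty below are illustrative, not the paper's setup):

```python
import numpy as np

# Paired source/target feature vectors for the pivots (synthetic stand-ins).
rng = np.random.default_rng(0)
n_pivots, d_src, d_tgt = 50, 10, 8
X_src = rng.normal(size=(n_pivots, d_src))   # source-domain vectors
X_tgt = rng.normal(size=(n_pivots, d_tgt))   # target-domain vectors

# Learn a linear map M minimising ||X_src M - X_tgt||^2 + lam ||M||^2.
lam = 0.1  # assumed ridge penalty
M = np.linalg.solve(X_src.T @ X_src + lam * np.eye(d_src), X_src.T @ X_tgt)

# Predict the target-domain representation of a new source-domain vector.
x_new = rng.normal(size=d_src)
x_pred = x_new @ M
assert x_pred.shape == (d_tgt,)
```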
Domain Adaptation | First, we lemmatise each word in a source domain labeled review x^{(S)}, and extract both unigrams and bigrams as features to represent x^{(S)} by a binary-valued feature vector. |
Domain Adaptation | Next, we train a binary classification model, θ, using those feature vectors. |
Domain Adaptation | At test time, we represent a test target review H using a binary-valued feature vector h of unigrams and bigrams of lemmas of the words in H, as we did for source-domain labeled training reviews. |
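The binary unigram/bigram representation described above can be sketched as follows, assuming the tokens are already lemmatised (the vocabulary and reviews are illustrative):

```python
def extract_ngrams(lemmas):
    """Unigrams and adjacent bigrams of a lemmatised token list."""
    unigrams = set(lemmas)
    bigrams = {(a, b) for a, b in zip(lemmas, lemmas[1:])}
    return unigrams | bigrams

# Vocabulary built from (hypothetical) source-domain training reviews.
vocab = sorted(extract_ngrams(["great", "battery", "life"]) |
               extract_ngrams(["poor", "battery"]), key=str)

def to_binary_vector(lemmas, vocab):
    """Binary-valued feature vector h over the fixed n-gram vocabulary."""
    present = extract_ngrams(lemmas)
    return [1 if f in present else 0 for f in vocab]

h = to_binary_vector(["great", "life"], vocab)
assert len(h) == len(vocab)
assert sum(h) == 2  # unigrams "great" and "life"; bigram ("great", "life") unseen
```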
Related Work | The created thesaurus is used to expand feature vectors during the training and test stages in a binary classifier. |
Conclusions and Future Work | We use standard feature vectors augmented by shallow syntactic trees enriched with additional conceptual information. |
Conclusions and Future Work | This paper makes several contributions: (i) it shows that effective OM can be carried out with supervised models trained on high quality annotations; (ii) it introduces a novel annotated corpus of YouTube comments, which we make available for the research community; (iii) it defines novel structural models and kernels, which can improve on feature vectors, e.g., up to 30% of relative improvement in type classification, when little data is available, and demonstrates that the structural model scales well to other domains. |
Related work | Most of the previous work on supervised sentiment analysis uses feature vectors to encode documents. |
Representations and models | In the next sections, we define a baseline feature vector model and a novel structural model based on kernel methods. |
Representations and models | We go beyond traditional feature vectors by employing structural models (STRUCT), which encode each comment into a shallow syntactic tree. |
Representations and models | A polynomial kernel of degree 3 is applied to feature vectors (FVEC). |
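A degree-3 polynomial kernel over feature vectors is commonly defined as K(u, v) = (u·v + c)³; a minimal sketch, assuming the common offset c = 1 (the paper may use different kernel constants):

```python
import numpy as np

def poly3_kernel(u, v, c=1.0):
    """Degree-3 polynomial kernel: (u . v + c)^3."""
    return (np.dot(u, v) + c) ** 3

u = np.array([1.0, 2.0])
v = np.array([0.0, 1.0])
assert poly3_kernel(u, v) == (2.0 + 1.0) ** 3  # 27.0
```

The cubing implicitly expands the feature space with all products of up to three original features, which is what lets a linear SVM exploit feature conjunctions.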
Methodology | Each SVO (or AN) instance will be represented by a triple (duple) from which a feature vector will be extracted. The vector will consist of the concatenation of the conceptual features (which we discuss below) for all participating words, and conjunction features for word pairs. For example, to generate the feature vector for the SVO triple (car, drink, gasoline), we compute all the features for the individual words car, drink, gasoline and combine them with the conjunction features for the pairs car drink and drink gasoline. |
Methodology | If word one is represented by features u ∈ ℝⁿ and word two by features v ∈ ℝᵐ, then the conjunction feature vector is the vectorization of the outer product uvᵀ. |
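This conjunction-feature construction can be sketched as follows (the vectors are illustrative):

```python
import numpy as np

# Features of word one (n = 3) and word two (m = 2).
u = np.array([1.0, 0.0, 2.0])
v = np.array([3.0, 1.0])

# Conjunction features: vectorize the outer product u v^T into a
# length n * m vector whose entries are all pairwise products u_i * v_j.
conj = np.outer(u, v).ravel()
assert conj.shape == (6,)
assert conj[1] == u[0] * v[1]   # entry (0, 1) of u v^T
```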
Model and Feature Extraction | Degrees of membership in different supersenses are represented by feature vectors, where each element corresponds to one supersense. |
Model and Feature Extraction | For example, the top-level classes in GermaNet include: adj.feeling (e.g., willing, pleasant, cheerful); adj.substance (e.g., dry, ripe, creamy); adj.spatial (e.g., adjacent, gigantic). For each adjective type in WordNet, they produce a vector of classifier posterior probabilities corresponding to degrees of membership of this word in one of the 13 semantic classes, similar to the feature vectors we build for nouns and verbs. |
Model and Feature Extraction | For languages other than English, feature vectors are projected to English features using translation dictionaries. |
Experiments & Results | For the classifier-based system, we tested various feature vector configurations. |
Experiments & Results | The various erY configurations use the same feature vector setup for all classifier experts. |
Experiments & Results | The auto configuration does not uniformly apply the same feature vector setup to all classifier experts but instead seeks to find the optimal setup per classifier expert. |
System | The feature vector for the classifiers represents a local context of neighbouring words, and optionally also global context keywords in a binary-valued bag-of-words configuration. |
System | If not, we check for the presence of a classifier expert for the offered L1 fragment; only then can we proceed by extracting the desired number of L2 local context words to the immediate left and right of this fragment and adding those to the feature vector. |
Approach | where θ ∈ ℝᵈ is the parameter vector and φ(x, w, z) is the feature vector, which will be defined in Section 3.3. |
Approach | To construct the log-linear model, we define a feature vector φ(x, w, z) for each query x, web page w, and extraction predicate z. |
Approach | The final feature vector is the concatenation of structural features φ_s(w, z), which consider the selected nodes in the DOM tree, and denotation features φ_d(x, y), which look at the extracted entities. |
Empirical Evaluation | Each row is a category and each column represents a feature vector . |
Empirical Evaluation | While the actual size of the vocabulary is huge, we use only a small subset of words in our feature vector for this visualization. |
Intervention Prediction Models | Assuming that p represents the posts of thread t, h represents the latent category assignments, and r represents the intervention decision, a feature vector φ(p, r, h, t) is extracted for each thread; using the weight vector w, this model defines a decision function similar to what is shown in Equation 1. |
Our Approach | g_1, …, g_c are the composition functions, P(g_k | v_l, v_r, e) is the probability of employing g_k given the child vectors v_l, v_r and the external feature vector e, and f is the nonlinearity function. |
Our Approach | where β is the hyper-parameter, S is the matrix used to determine which composition function we use, v_l and v_r are the left and right child vectors, and e is the external feature vector. |
Our Approach | In this work, e is a one-hot binary feature vector which indicates what the dependency type is. |
Model | encode a tweet-level feature vector rather than an aggregate one. |
Model | (3) The feature vector φ(y_j^{tw}, X_i) encodes the following standard general features: |
Related Work | In the user attribute extraction literature, researchers have considered neighborhood context to boost inference accuracy (Pennacchiotti and Popescu, 2011; Al Zamal et al., 2012), where information about the degree of their connectivity to pre-labeled users is included in the feature vectors. |
Experiments | negative transfer from irrelevant sources by relying on similarity of feature vectors between source and target domains based on labeled and unlabeled data. |
Problem Statement | We introduce a feature extraction function that maps the triple (A, B, S) to its feature vector x. |
Problem Statement | and plenty of unlabeled data D_u = {x_i}_{i=1}^{n_u}, where n_l and n_u are the number of labeled and unlabeled samples respectively, x_i is the feature vector, and y_i is the corresponding label (if available). |
Introduction | Feature Vector |
Introduction | where L_n is its phrase label (i.e., its Treebank tag), and F_n is a feature vector which indicates the characteristics of node n, which is represented as: |
Introduction | where δ(t_1, t_2) is the same indicator function as in CTK; (n_i, n_j) is a pair of aligned nodes between t_1 and t_2, where n_i and n_j are correspondingly in the same position of trees t_1 and t_2; E(t_1, t_2) is the set of all aligned node pairs; sim(n_i, n_j) is the feature vector similarity between nodes n_i and n_j, computed as the dot product between their feature vectors F_{n_i} and F_{n_j}. |
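The node-similarity computation above reduces to a plain dot product between node feature vectors; a minimal sketch (the vectors are illustrative):

```python
import numpy as np

# Feature vectors F_{n_i} and F_{n_j} of two aligned tree nodes
# (illustrative values; real vectors encode node characteristics).
F_ni = np.array([1.0, 0.5, 0.0])
F_nj = np.array([0.0, 2.0, 1.0])

# sim(n_i, n_j) is the dot product of the two feature vectors.
sim = float(np.dot(F_ni, F_nj))
assert sim == 1.0  # 1*0 + 0.5*2 + 0*1
```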