Detailed Problem Formulation | We use discrete features, namely natural numbers, in our feature vectors, quantized by a binning process. |
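The quantization described above can be sketched as follows. This is a minimal illustration of binning continuous values into natural numbers; the bin edges, helper name, and velocity example are assumptions for illustration, not the paper's actual implementation.

```python
def bin_feature(value, edges):
    """Map a continuous value to the index of the first bin edge it falls under."""
    for i, edge in enumerate(edges):
        if value < edge:
            return i
    return len(edges)  # values beyond the last edge share the top bin

# e.g. quantize an object's speed into 4 discrete levels (edges are made up)
velocity_edges = [0.5, 2.0, 5.0]
print(bin_feature(0.1, velocity_edges))  # -> 0
print(bin_feature(3.3, velocity_edges))  # -> 2
print(bin_feature(9.9, velocity_edges))  # -> 3
```

Any monotone set of edges works; equal-width or quantile-based edges are common choices.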
Detailed Problem Formulation | The length of the feature vector may vary across parts of speech. |
Detailed Problem Formulation | Let Nc denote the length of the feature vector for part of speech c, and let … denote the time-series. |
Introduction | We associate a feature vector with each frame (detection) of each such track. |
Introduction | This feature vector can encode image features (including the identity of the particular detector that produced that detection) that correlate with object class; region color, shape, and size features that correlate with object properties; and motion features, such as linear and angular object position, velocity, and acceleration, that correlate with event properties. |
Introduction | involves computing the associated feature vector for that HMM over the detections in the tracks chosen to fill its arguments. |
The Sentence Tracker | Let (q1, ..., qT) denote the sequence of states qt that leads to an observed track, B(Dt, jt, qt, λ) denote the conditional log probability of observing the feature vector associated with the detection selected by jt among the detections Dt in frame t, given that the HMM is in state qt, and A(qt−1, qt, λ) denote the log transition probability of the HMM. |
The Sentence Tracker | We further need to generalize F so that it computes the joint score of a sequence of detections, one for each track, G so that it computes the joint measure of coherence between a sequence of pairs of detections in two adjacent frames, and B so that it computes the joint conditional log probability of observing the feature vectors associated with the sequence of detections selected by jt. |
The Sentence Tracker | We further need to generalize B so that it computes the joint conditional log probability of observing the feature vectors for the detections in the tracks that are assigned to the arguments of the HMM for each word in the sentence and A so that it computes the joint log transition probability for the HMMs for all words in the sentence. |
Abstract | In addition, word embedding is employed as the input to the neural network, which encodes each word as a feature vector. |
Introduction | We also integrate word embedding into the model by representing each word as a feature vector (Collobert and Weston, 2008). |
Introduction | h = (h1(f,e,d), ..., hK(f,e,d))T is a K-dimensional feature vector defined on the tuple (f,e,d); W = (w1, w2, ..., wK)T is a K-dimensional weight vector of h, i.e., the parameters of the model, and it can be tuned by the toolkit MERT (Och, 2003). |
Introduction | (3) as a function of a feature vector h, i.e. |
Bayesian MT Decipherment via Hash Sampling | One possible strategy is to compute similarity scores s(wfi, we′) between the current source word feature vector wfi and the feature vectors we′ ∈ Ve for all possible candidates in the target vocabulary. |
Bayesian MT Decipherment via Hash Sampling | Computing similarity scores alone (naïvely) would incur O(|Ve| · d) time, which is prohibitively huge since we have to do this for every token in the source language corpus. This makes the complexity far worse in practice, since the dimensionality d of the feature vectors is a much higher value. |
Feature-based representation for Source and Target | But unlike documents, here each word w is associated with a feature vector w1...wd (where wi represents the weight for the feature indexed by i) which is constructed from monolingual corpora. |
Feature-based representation for Source and Target | Unlike the target word feature vectors (which can be precomputed from the monolingual target corpus), the feature vector for every source word fj is dynamically constructed from the target translation sampled in each training iteration. |
Feature-based representation for Source and Target | This results in the feature representation becoming more sparse (especially for source feature vectors), which can cause problems in efficiency as well as robustness when computing similarity against other vectors. |
Training Algorithm | (a) Generate a proposal distribution by computing the hamming distance between the feature vectors for the source word and each target translation candidate. |
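Step (a) above can be sketched as follows. This is a minimal illustration of turning Hamming distances between binary feature vectors into a proposal distribution; the exp(−distance) weighting and all names are assumptions for illustration, not the paper's exact scheme.

```python
from math import exp

def hamming(u, v):
    """Number of positions at which two equal-length vectors differ."""
    return sum(a != b for a, b in zip(u, v))

def proposal(source_vec, candidate_vecs):
    """Smaller Hamming distance -> larger proposal weight, normalized to sum to 1."""
    weights = [exp(-hamming(source_vec, c)) for c in candidate_vecs]
    z = sum(weights)
    return [w / z for w in weights]

src = [1, 0, 1, 1]
cands = [[1, 0, 1, 1], [0, 1, 1, 1], [0, 0, 0, 0]]
probs = proposal(src, cands)  # the exact match gets the highest probability
```

A sampler would then draw a target-translation candidate from `probs` instead of scoring the full vocabulary with an expensive similarity.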
Evaluation | Before the clustering process takes place, Web snippets are represented as word feature vectors. |
Evaluation | In particular, p is the size of the word feature vectors representing both Web snippets and centroids (p = 2.5), K is the number of clusters to be found (K = 2..10), and s(wik, wjl) is the collocation measure integrated in the InfoSimba similarity measure. |
Introduction | On the other hand, the polythetic approach, whose main idea is to represent Web snippets as word feature vectors, has received less attention, the only relevant work being (Osinski and Weiss, 2005). |
Introduction | (1) word feature vectors are hard to define in small collections of short text fragments (Timonen, 2013), (2) existing second-order similarity measures such as the cosine are ill-suited to capturing the semantic similarity between small texts, (3) Latent Semantic Analysis has shown inconclusive results (Osinski and Weiss, 2005), and (4) the labeling process is a surprisingly hard extra task (Carpineto et al., 2009). |
Discussion | This may be due to the smaller number of feature vectors used in the experiments (only 80, as compared to the 412 used in the previous setup). |
Discussion | Another possible reason is the fact that the acoustic and visual modalities are significantly weaker than the linguistic modality, most likely due to the fact that the feature vectors are now speaker-independent, which makes it harder to improve over the linguistic modality alone. |
Experiments and Results | In this approach, the features collected from all the multimodal streams are combined into a single feature vector, thus resulting in one vector for each utterance in the dataset, which is used to make a decision about the sentiment orientation of the utterance. |
Multimodal Sentiment Analysis | The features are averaged over all the frames in an utterance, to obtain one feature vector for each utterance. |
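The frame-averaging step above can be sketched as follows; the frame values are made up for illustration.

```python
def average_frames(frames):
    """Elementwise mean of per-frame feature vectors -> one utterance-level vector."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]

frames = [
    [0.25, 1.0, 3.0],  # frame 1 features
    [0.75, 1.0, 5.0],  # frame 2 features
]
utterance_vec = average_frames(frames)  # -> [0.5, 1.0, 4.0]
```

The resulting single vector per utterance is what the classifier consumes, regardless of how many frames the utterance spans.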
Online Question Generation | We next describe the various features we extract for every entity and the supervised models that, given this feature vector representation, assess the correctness of an instantiation. |
Online Question Generation | The feature vector of each named entity was induced as described in Section 4.2.1. |
Online Question Generation | To generate features for a candidate pair, we take the two feature vectors of the two entities and induce families of pair features by comparing between the two vectors. |
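The pair-feature induction described above can be sketched as follows. The three comparison families used here (equality, absolute difference, product) are assumptions for illustration, not the paper's exact feature set.

```python
def pair_features(v1, v2):
    """Induce pair features by comparing two entities' feature vectors elementwise."""
    feats = {}
    for i, (a, b) in enumerate(zip(v1, v2)):
        feats[f"eq_{i}"] = float(a == b)   # do the entities agree on feature i?
        feats[f"absdiff_{i}"] = abs(a - b) # how far apart are they?
        feats[f"prod_{i}"] = a * b         # interaction term
    return feats

f = pair_features([1.0, 0.0], [1.0, 2.0])
# f["eq_0"] == 1.0, f["absdiff_1"] == 2.0
```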
Our Approach | Then, we use a uniform automatic method, which primarily consists of word labeling and feature vector generation, to generate the training data set TD from these collected articles. |
Our Approach | (1) After the word labeling, each instance (word/token) is represented as a feature vector. |
Our Approach | First, we turn the text into a word sequence and compute the feature vector for each word based on the feature definition in Section 3.1. |
Preliminaries | Each word can be represented as a feature vector according to its context. |
Learning QA Matching Models | Given a word pair (wq, ws), where wq ∈ Vq and ws ∈ Vs, feature functions φ1, ..., φd map it to a d-dimensional real-valued feature vector. |
Learning QA Matching Models | We consider two aggregate functions for defining the feature vectors of the whole question-answer pair: average and max. |
Learning QA Matching Models | Φmax,j(q, s) = max_{wq ∈ Vq, ws ∈ Vs} φj(wq, ws) (2). Together, each question-sentence pair is represented by a 2d-dimensional feature vector. |
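The average and max aggregation over word-pair features can be sketched as follows. The two toy feature functions inside `phi` are assumptions for illustration (d = 2); the aggregation itself mirrors the scheme above, yielding a 2d-dimensional vector per pair.

```python
def phi(wq, ws):
    """d = 2 toy features for a word pair (illustrative, not the paper's features)."""
    return [float(wq == ws), float(len(wq) == len(ws))]

def qa_vector(q_words, s_words):
    """Aggregate per-word-pair features with average and max -> 2d-dim vector."""
    pair_vals = [phi(wq, ws) for wq in q_words for ws in s_words]
    d = len(pair_vals[0])
    avg = [sum(p[j] for p in pair_vals) / len(pair_vals) for j in range(d)]
    mx = [max(p[j] for p in pair_vals) for j in range(d)]
    return avg + mx  # concatenation: first d averages, then d maxima

vec = qa_vector(["who", "won"], ["he", "won"])  # len(vec) == 4, i.e. 2d with d = 2
```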
Baseline parser | Here Φ(ai) represents the feature vector for the i-th action ai in state item α. |
Improved hypotheses comparison | The significant variance in the number of actions N can have an impact on the linear separability of state items, for which the feature vectors are ∑_{i=1}^{N} Φ(ai). |
Improved hypotheses comparison | A feature vector is extracted for the IDLE action according to the final state context, in the same way as other actions. |
Improved hypotheses comparison | The corresponding feature vectors have about the same sizes and are more linearly separable. |
Selectional branching | For each parsing state sij, a prediction is made by generating a feature vector xij ∈ X, feeding it into a classifier C1 that uses a feature map Φ(x, y) and a weight vector w to measure a score for each label y ∈ Y, and choosing the label with the highest score. |
Selectional branching | During training, a training instance is generated for each parsing state sij by taking a feature vector xij and its true label yij. |
Selectional branching | Then, a subgradient is measured by taking all feature vectors together weighted by Q (line 6). |
Gaussian Process Regression | In our regression task, the data consists of n pairs D = {(xi, yi)}, where xi ∈ R^F is an F-dimensional feature vector and yi ∈ R is the response variable. |
Gaussian Process Regression | Each instance is a translation and the feature vector encodes its linguistic features; the response variable is a numerical quality judgement: post editing time or likert score. |
Gaussian Process Regression | GP regression assumes the presence of a latent function, f : R^F → R, which maps from the input space of feature vectors x to a scalar. |
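A minimal sketch of the GP regression prediction described above: the posterior mean at a test point x* is k*ᵀ (K + σ²I)⁻¹ y, computed here in pure Python with an RBF kernel and a naive linear solver. The kernel, hyperparameters, and toy data are assumptions for illustration, not the paper's settings.

```python
from math import exp

def rbf(x, z, lengthscale=1.0):
    """Squared-exponential kernel between two feature vectors."""
    return exp(-sum((a - b) ** 2 for a, b in zip(x, z)) / (2 * lengthscale ** 2))

def solve(A, b):
    """Naive Gaussian elimination (no pivoting; adequate for this tiny PD demo)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for i in range(n):
        for j in range(i + 1, n):
            f = M[j][i] / M[i][i]
            M[j] = [mj - f * mi for mj, mi in zip(M[j], M[i])]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def gp_mean(X, y, x_star, noise=1e-6):
    """Posterior mean: k_*^T (K + noise*I)^{-1} y."""
    K = [[rbf(xi, xj) + (noise if i == j else 0.0) for j, xj in enumerate(X)]
         for i, xi in enumerate(X)]
    alpha = solve(K, y)
    return sum(rbf(x_star, xi) * a for xi, a in zip(X, alpha))

X = [[0.0], [1.0], [2.0]]      # F = 1 feature per instance
y = [0.0, 1.0, 0.0]            # e.g. numerical quality judgements
pred = gp_mean(X, y, [1.0])    # close to 1.0 at a training input
```

In practice one would use a dedicated library rather than this hand-rolled solver; the point is only to make the posterior-mean formula concrete.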
Method | A fundamental assumption underlying our model is that this bipartite graph contains the entity transition information needed for local coherence computation, rendering feature vectors and a learning phase unnecessary. |
The Entity Grid Model | To make this representation accessible to machine learning algorithms, Barzilay and Lapata (2008) compute for each document the probability of each transition and generate feature vectors representing the sentences. |
The Entity Grid Model | (2011) use discourse relations to transform the entity grid representation into a discourse role matrix that is used to generate feature vectors for machine learning algorithms similarly to Barzilay and Lapata (2008). |
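The transition-probability feature vectors described above can be sketched as follows, in the spirit of Barzilay and Lapata (2008). The role inventory, the toy grid, and the restriction to length-2 transitions are assumptions for illustration.

```python
from itertools import product

ROLES = ["S", "O", "X", "-"]  # subject, object, other, absent

def transition_features(grid):
    """grid: rows = sentences, columns = entities; returns the probability of
    each role-bigram transition, in a fixed (sorted) order."""
    counts = {t: 0 for t in product(ROLES, repeat=2)}
    total = 0
    for e in range(len(grid[0])):           # each entity column
        for s in range(len(grid) - 1):      # each adjacent sentence pair
            counts[(grid[s][e], grid[s + 1][e])] += 1
            total += 1
    return [counts[t] / total for t in sorted(counts)]

grid = [
    ["S", "-"],  # sentence 1: entity 1 is subject, entity 2 absent
    ["O", "X"],  # sentence 2: entity 1 is object, entity 2 in another role
]
vec = transition_features(grid)  # 16 bigram-transition probabilities
```

Each document thus becomes a fixed-length vector over the 16 possible role bigrams, which is what makes the representation accessible to standard learners.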