Index of papers in Proc. ACL that mention
  • vector representations
Zhang, Jiajun and Liu, Shujie and Li, Mu and Zhou, Ming and Zong, Chengqing
Abstract
We propose Bilingually-constrained Recursive Auto-encoders (BRAE) to learn semantic phrase embeddings (compact vector representations for phrases), which can distinguish the phrases with different semantic meanings.
Bilingually-constrained Recursive Auto-encoders
3.1.1 Word Vector Representations
Bilingually-constrained Recursive Auto-encoders
In phrase embedding using composition, the word vector representation is the basis and serves as the input to the neural network.
Bilingually-constrained Recursive Auto-encoders
Given a phrase which is an ordered list of m words, each word has an index i into the columns of the embedding matrix L. The index i is used to retrieve the word’s vector representation using a simple multiplication with a binary vector b which is zero in all positions except for the ith index:
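As a rough illustration of this one-hot lookup (a minimal numpy sketch, not the authors' code; the vocabulary size, embedding width and matrix values are invented):

    import numpy as np

    # Hypothetical embedding matrix L: one column per vocabulary word.
    d, vocab_size = 4, 5                      # embedding width, vocabulary size (made up)
    L = np.random.randn(d, vocab_size)

    i = 2                                     # index of the word in the vocabulary
    b = np.zeros(vocab_size)                  # binary vector: all zeros ...
    b[i] = 1.0                                # ... except a 1 at position i

    x = L @ b                                 # multiplication retrieves the i-th column
    assert np.allclose(x, L[:, i])            # same result as a direct column lookup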
Introduction
embedding, which converts a word into a dense, low dimensional, real-valued vector representation (Bengio et al., 2003; Bengio et al., 2006; Collobert and Weston, 2008; Mikolov et al., 2013).
Introduction
Therefore, in order to successfully apply DNN to model the whole translation process, such as modelling the decoding process, learning compact vector representations for the basic phrasal translation units is the essential and fundamental work.
Related Work
In contrast, our method attempts to learn the semantic vector representation for any phrase.
vector representations is mentioned in 13 sentences in this paper.
Topics mentioned in this paper:
Thater, Stefan and Fürstenau, Hagen and Pinkal, Manfred
Conclusion
We have presented a novel method for adapting the vector representations of words according to their context.
Introduction
Second, the vectors of two syntactically related words, e. g., a target verb acquire and its direct object knowledge, typically have different syntactic environments, which implies that their vector representations encode complementary information and there is no direct way of combining the information encoded in the respective vectors.
Introduction
To solve these problems, we build upon previous work (Thater et al., 2009) and propose to use syntactic second-order vector representations.
Introduction
Second-order vector representations in a bag-of-words setting were first used by Schütze (1998); in a syntactic setting, they also feature in Dligach and Palmer (2008).
Related Work
Several approaches to contextualize vector representations of word meaning have been proposed.
Related Work
By using vector representations of a predicate p and an argument a, Kintsch identifies words
Related Work
Mitchell and Lapata (2008), henceforth M&L, propose a general framework in which meaning representations for complex expressions are computed compositionally by combining the vector representations of the individual words of the complex expression.
The model
In this section, we present our method of contextualizing semantic vector representations.
The model
Our model employs vector representations for words and expressions containing syntax-specific first and second order co-occurrences information.
The model
The basis for the construction of both kinds of vector representations are co-occurrence graphs.
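As a rough illustration of first- versus second-order co-occurrence vectors (a plain bag-of-words toy example with an invented count matrix; the paper's construction is additionally syntax-specific):

    import numpy as np

    # Toy symmetric co-occurrence counts between 4 words (rows = target, cols = context).
    C = np.array([[0, 3, 1, 0],
                  [3, 0, 2, 1],
                  [1, 2, 0, 4],
                  [0, 1, 4, 0]], dtype=float)

    first_order = C[0]            # first-order vector of word 0: its direct context counts
    # Second-order vector: weighted sum of the first-order vectors of word 0's context words.
    second_order = C[0] @ C       # accumulates contexts of contexts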
vector representations is mentioned in 11 sentences in this paper.
Topics mentioned in this paper:
Socher, Richard and Bauer, John and Manning, Christopher D. and Ng, Andrew Y.
Abstract
Instead, we introduce a Compositional Vector Grammar (CVG), which combines PCFGs with a syntactically untied recursive neural network that learns syntactico-semantic, compositional vector representations.
Introduction
Previous RNN-based parsers used the same (tied) weights at all nodes to compute the vector representing a constituent (Socher et al., 2011b).
Introduction
Therefore we combine syntactic and semantic information by giving the parser access to rich syntactico-semantic information in the form of distributional word vectors and compute compositional semantic vector representations for longer phrases (Costa et al., 2003; Menchetti et al., 2005; Socher et al., 2011b).
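A minimal numpy sketch of the "syntactically untied" composition idea mentioned above: the weight matrix used to build a constituent's vector depends on the syntactic categories of its children (the categories, dimensions and random parameters are invented, and the PCFG scoring component is omitted):

    import numpy as np

    d = 3                                              # embedding size (made up)
    # One composition matrix per (left-child, right-child) category pair, instead of a single tied W.
    W = {("DT", "NN"): np.random.randn(d, 2 * d),
         ("VP", "NP"): np.random.randn(d, 2 * d)}

    def compose(left_vec, right_vec, left_cat, right_cat):
        """Parent vector for a constituent, using category-specific (untied) weights."""
        Wc = W[(left_cat, right_cat)]
        return np.tanh(Wc @ np.concatenate([left_vec, right_vec]))

    the, cat = np.random.randn(d), np.random.randn(d)
    np_vec = compose(the, cat, "DT", "NN")             # vector for the NP "the cat"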
Introduction
We will first briefly introduce single word vector representations and then describe the CVG objective function, tree scoring and inference.
vector representations is mentioned in 9 sentences in this paper.
Topics mentioned in this paper:
Dinu, Georgiana and Baroni, Marco
Abstract
We introduce the problem of generation in distributional semantics: Given a distributional vector representing some meaning, how can we generate the phrase that best expresses that meaning?
Evaluation setting
Construction of vector spaces: We test two types of vector representations.
Evaluation setting
(2013a) learns vector representations using a neural network architecture by trying to predict a target word given the words surrounding it.
Evaluation setting
(2014) for an extensive comparison of the two types of vector representations.
General framework
To construct the vector representing a two-word phrase, we must compose the vectors associated to the input words.
General framework
where $\vec{u}$ and $\vec{v}$ are the vector representations associated to words u and v. $f_{comp}^{R} : \mathbb{R}^d \times \mathbb{R}^d \rightarrow \mathbb{R}^d$ (for d the dimensionality of vectors) is a composition function specific to the syntactic relation R holding between the two words.
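One possible instantiation of such a relation-specific composition function, sketched with invented weights (a weighted-additive choice; the paper evaluates other composition functions as well):

    import numpy as np

    d = 4                                            # vector dimensionality (made up)
    # Per-relation composition parameters, e.g. adjective-noun ("AN") and verb-object ("VO").
    params = {"AN": (0.4, 0.6), "VO": (0.7, 0.3)}    # (weight on u, weight on v), invented values

    def f_comp(u, v, R):
        """Weighted-additive composition specific to syntactic relation R: R^d x R^d -> R^d."""
        alpha, beta = params[R]
        return alpha * u + beta * v

    red, car = np.random.randn(d), np.random.randn(d)
    red_car = f_comp(red, car, "AN")                 # phrase vector for "red car"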
Introduction
For example, given the vectors representing red and car, composition derives a vector that approximates the meaning of red car.
Introduction
We can, for example, synthesize the vector representing the meaning of a phrase or sentence, and then generate alternative phrases or sentences from this vector to accomplish true paraphrase generation (as opposed to paraphrase detection or ranking of candidate paraphrases).
Introduction
Given a vector representing an image, generation can be used to productively construct phrases or sentences that describe the image (as opposed to simply retrieving an existing description from a set of candidates).
vector representations is mentioned in 9 sentences in this paper.
Topics mentioned in this paper:
Lei, Tao and Xin, Yu and Zhang, Yuan and Barzilay, Regina and Jaakkola, Tommi
Experimental Setup
To add auxiliary word vector representations, we use the publicly available word vectors (Cirik
Introduction
Even in the case of first-order parsers, this results in a high-dimensional vector representation of each arc.
Introduction
participating in an arc, such as continuous vector representations of words.
Introduction
Finally, we demonstrate that the model can successfully leverage word vector representations, in contrast to the baselines.
Problem Formulation
Specifically, $U\phi_h$ (for a given sentence, suppressed) is an r-dimensional vector representation of the word corresponding to h as a head word.
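A rough numpy sketch of how a low-rank projection of sparse head features can enter an arc score (the dimensions, the second projection V and the plain inner-product interaction are simplifying assumptions, not the paper's full tensor formulation):

    import numpy as np

    n, r = 1000, 50                       # sparse feature dimension and rank (made up)
    U = np.random.randn(r, n)             # projects head-word features to r dimensions
    V = np.random.randn(r, n)             # projects modifier-word features to r dimensions

    phi_h = np.zeros(n); phi_h[[3, 17, 250]] = 1.0   # sparse indicator features of the head
    phi_m = np.zeros(n); phi_m[[8, 17, 512]] = 1.0   # sparse indicator features of the modifier

    head_vec = U @ phi_h                  # r-dimensional representation of h as a head word
    mod_vec = V @ phi_m                   # r-dimensional representation of m as a modifier
    arc_score = head_vec @ mod_vec        # low-rank interaction score for the arc h -> m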
Related Work
Traditionally, these vector representations have been derived primarily from co-occurrences of words within sentences, ignoring syntactic roles of the co-occurring words.
Related Work
While this method learns to map word combinations into vectors, it builds on existing word-level vector representations .
vector representations is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Maas, Andrew L. and Daly, Raymond E. and Pham, Peter T. and Huang, Dan and Ng, Andrew Y. and Potts, Christopher
Experiments
Given a query word $w$ and another word $w'$ we obtain their vector representations $\phi_w$ and $\phi_{w'}$, and evaluate their cosine similarity as $S(\phi_w, \phi_{w'}) = \phi_w \cdot \phi_{w'} / (\|\phi_w\|\,\|\phi_{w'}\|)$. By assessing the similarity of $w$ with all other words $w'$, we can find the words deemed most similar by the model.
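A minimal numpy sketch of this nearest-neighbour query by cosine similarity (the representation matrix R here holds random stand-in vectors):

    import numpy as np

    beta, V = 8, 100                          # representation size and vocabulary size (made up)
    R = np.random.randn(beta, V)              # column R[:, w] is the vector of word w

    def cosine(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    w = 5                                     # index of the query word
    sims = np.array([cosine(R[:, w], R[:, u]) for u in range(V)])
    nearest = np.argsort(-sims)[1:6]          # the five most similar words (skipping w itself)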
Introduction
This component of the model uses the vector representation of words to predict the sentiment annotations on contexts in which the words appear.
Introduction
This causes words expressing similar sentiment to have similar vector representations.
Our Model
The energy function uses a word representation matrix $R \in \mathbb{R}^{\beta \times |V|}$ where each word $w$ (represented as a one-hot vector) in the vocabulary $V$ has a $\beta$-dimensional vector representation $\phi_w = Rw$ corresponding to that word’s column in $R$. The random variable $\theta$ is also a $\beta$-dimensional vector, $\theta \in \mathbb{R}^{\beta}$, which weights each of the $\beta$ dimensions of words’ representation vectors.
Related work
For each latent topic T, the model learns a conditional distribution p(w|T) for the probability that word w occurs in T. One can obtain a k-dimensional vector representation of words by first training a k-topic model and then filling the matrix with the p(w|T) values (normalized to unit length).
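A small sketch of deriving word vectors from a topic model in this way, assuming the p(w|T) values are already available as a k-by-|V| matrix (random stand-in values below):

    import numpy as np

    k, V = 10, 500                              # number of topics and vocabulary size (made up)
    topic_word = np.random.dirichlet(np.ones(V), size=k)   # row T holds p(w | T)

    def topic_vector(w):
        """k-dimensional representation of word w: its p(w|T) values, normalized to unit length."""
        v = topic_word[:, w]
        return v / np.linalg.norm(v)

    vec = topic_vector(42)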
vector representations is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Lazaridou, Angeliki and Marelli, Marco and Zamparelli, Roberto and Baroni, Marco
Experimental setup
Annotation of quality of test vectors: The quality of the corpus-based vectors representing derived test items was determined by collecting human semantic similarity judgments in a crowdsourcing survey.
Experimental setup
The first experiment investigates to what extent composition models can approximate high-quality (HQ) corpus-extracted vectors representing derived forms.
Experimental setup
Lexfunc provides a flexible way to account for affixation, since it models it directly as a function mapping from and onto word vectors, without requiring a vector representation of bound affixes.
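A minimal sketch of the lexfunc idea for affixation: the affix acts as a matrix mapping the stem vector to the derived-form vector (the matrix below is random; in practice it would be estimated from observed stem/derived-form vector pairs):

    import numpy as np

    d = 5                                   # dimensionality of word vectors (made up)
    A_re = np.random.randn(d, d)            # function for the prefix "re-", learned in practice

    def derive(affix_matrix, stem_vec):
        """Lexical-function composition: derived form = affix applied to the stem vector."""
        return affix_matrix @ stem_vec

    build = np.random.randn(d)              # corpus vector for "build" (stand-in)
    rebuild = derive(A_re, build)           # predicted vector for "rebuild"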
Related work
Although these works exploit vectors representing complex forms, they do not attempt to generate them compositionally.
vector representations is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Dong, Li and Wei, Furu and Tan, Chuanqi and Tang, Duyu and Zhou, Ming and Xu, Ke
Our Approach
The computation process is conducted in a bottom-up manner, and the vector representations are computed recursively.
RNN: Recursive Neural Network
It performs compositions based on the binary trees, and obtain the vector representations in a bottom-up way.
RNN: Recursive Neural Network
The vector representation v is obtained via:
RNN: Recursive Neural Network
The vector representation of root node is then fed into a softmax classifier to predict the label.
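A compact numpy sketch of this bottom-up computation for a two-leaf binary tree, with a softmax classifier applied to the root vector (the dimensions, the shared composition matrix and the two-class setup are invented for illustration):

    import numpy as np

    d, n_classes = 4, 2
    W = np.random.randn(d, 2 * d)                  # shared composition matrix
    Ws = np.random.randn(n_classes, d)             # softmax classifier on the root vector

    def compose(left, right):
        return np.tanh(W @ np.concatenate([left, right]))   # parent vector from its children

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    left_leaf, right_leaf = np.random.randn(d), np.random.randn(d)
    root = compose(left_leaf, right_leaf)          # bottom-up: leaves first, then their parent
    label_probs = softmax(Ws @ root)               # predicted label distribution for the phrase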
vector representations is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Berant, Jonathan and Liang, Percy
Model overview
Our framework accommodates any paraphrasing method, and in this paper we propose an association model that learns to associate natural language phrases that co-occur frequently in a monolingual parallel corpus, combined with a vector space model, which learns to score the similarity between vector representations of natural language utterances (Section 5).
Paraphrasing
We now introduce a vector space (VS) model, which assigns a vector representation for each utterance, and learns a scoring function that ranks paraphrase candidates.
Paraphrasing
We start by constructing vector representations of words.
Paraphrasing
We can now estimate a paraphrase score for two utterances x and c via a weighted combination of the components of the vector representations:
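A hedged sketch of a score of this general shape, a weighted combination of interactions between the two utterance vectors (the component-wise product and the weight vector are assumptions, not the paper's exact parameterization):

    import numpy as np

    d = 6                                       # dimensionality of utterance vectors (made up)
    theta = np.random.randn(d)                  # learned weights, one per component

    def paraphrase_score(x_vec, c_vec, weights):
        """Weighted combination of component-wise interactions between two utterance vectors."""
        return weights @ (x_vec * c_vec)

    x_vec, c_vec = np.random.randn(d), np.random.randn(d)
    score = paraphrase_score(x_vec, c_vec, theta)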
vector representations is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Silberer, Carina and Lapata, Mirella
Autoencoders for Grounded Semantics
The target vector is the sum of $x^{(o)}$ and the centroid $\bar{x}^{(o)}$ of the remaining attribute vectors representing object o.
Experimental Setup
As shown in Figure 1, our model takes as input two (real-valued) vectors representing the visual and textual modalities.
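A very small sketch of an autoencoder over the concatenated textual and visual vectors (one untrained hidden layer; the actual model is a stacked architecture trained to minimize reconstruction error):

    import numpy as np

    d_txt, d_img, d_hid = 6, 4, 5                          # modality and hidden sizes (made up)
    W_enc = np.random.randn(d_hid, d_txt + d_img)
    W_dec = np.random.randn(d_txt + d_img, d_hid)

    def encode(textual, visual):
        x = np.concatenate([textual, visual])              # joint input over both modalities
        return np.tanh(W_enc @ x)                          # shared multimodal representation

    def reconstruction_error(textual, visual):
        x = np.concatenate([textual, visual])
        h = np.tanh(W_enc @ x)
        return np.sum((W_dec @ h - x) ** 2)                # objective minimized during training

    err = reconstruction_error(np.random.randn(d_txt), np.random.randn(d_img))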
Experimental Setup
respond to words and edges to cosine similarity scores between vectors representing their meaning.
Results
Table 6 shows examples of clusters produced by Chinese Whispers when using vector representations provided by the SAE model.
vector representations is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Paperno, Denis and Pham, Nghia The and Baroni, Marco
Compositional distributional semantics
If distributional vectors encode certain aspects of word meaning, it is natural to expect that similar aspects of sentence meaning can also receive vector representations, obtained compositionally from word vectors.
Compositional distributional semantics
Deverbal nouns like demolition, often used without mention of who demolished what, would have to get vector representations while the corresponding verbs (demolish) would become tensors, which makes immediately related verbs and nouns incomparable.
The practical lexical function model
The matrices formalize argument slot saturation, operating on an argument vector representation through matrix by vector multiplication, as described in the next section.
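A short sketch of argument slot saturation in this style: a predicate carries a lexical vector plus one matrix per argument slot, and each matrix is applied to the corresponding argument vector (all values invented):

    import numpy as np

    d = 4                                               # vector dimensionality (made up)
    dog, cat = np.random.randn(d), np.random.randn(d)   # argument vectors
    chase_vec = np.random.randn(d)                      # lexical vector of the verb "chase"
    chase_subj = np.random.randn(d, d)                  # matrix for the subject slot
    chase_obj = np.random.randn(d, d)                   # matrix for the object slot

    # Saturate both argument slots by matrix-by-vector multiplication and combine.
    sentence_vec = chase_vec + chase_subj @ dog + chase_obj @ cat   # "dog chases cat"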
The practical lexical function model
This flexibility makes our model suitable to compute vector representations of sentences without stumbling at unseen syntactic usages of words.
vector representations is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Lazaridou, Angeliki and Bruni, Elia and Baroni, Marco
Introduction
In this paper, we rely on the same image analysis techniques but instead focus on the reference problem: We do not aim at enriching word representations with visual information, although this might be a side effect of our approach, but we address the issue of automatically mapping objects, as depicted in images, to the context vectors representing the corresponding words.
Introduction
We show that the induced cross-modal semantic space is powerful enough that sensible guesses about the correct word denoting an object can be made, even when the linguistic context vector representing the word has been created from as little as 1 sentence containing it.
Introduction
First, we conduct experiments with simple image- and text-based vector representations and compare alternative methods to perform cross-modal mapping.
Related Work
(2013) use linear regression to transform vector-based image representations onto vectors representing the same concepts in linguistic semantic space.
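A minimal sketch of such a cross-modal mapping using closed-form ridge regression (numpy only; the training pairs are random stand-ins, and the ridge penalty is an assumption):

    import numpy as np

    n, d_img, d_txt = 200, 30, 20                 # training pairs and space sizes (made up)
    X = np.random.randn(n, d_img)                 # image-based vectors of training concepts
    Y = np.random.randn(n, d_txt)                 # linguistic vectors of the same concepts

    lam = 1.0                                     # ridge penalty
    # Closed-form ridge regression: W maps an image vector into the linguistic space.
    W = np.linalg.solve(X.T @ X + lam * np.eye(d_img), X.T @ Y)

    new_image_vec = np.random.randn(d_img)
    predicted_word_vec = new_image_vec @ W        # guess at the vector of the depicted word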
vector representations is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Paşca, Marius and Van Durme, Benjamin
Extraction from Documents and Queries
2) construction of internal search-signature vector representations for each candidate attribute, based
Extraction from Documents and Queries
3) construction of a reference internal search-signature vector representation for a small set of seed attributes provided as input.
Extraction from Documents and Queries
4) ranking of candidate attributes with respect to each class (e.g., movies), by computing similarity scores between their individual vector representations and the reference vector of the seed attributes.
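A sketch of this ranking step under the simplifying assumption that the search-signature vectors are plain real-valued vectors compared by cosine similarity (the candidate attributes and dimensionality are invented):

    import numpy as np

    def cosine(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

    dim = 50                                               # size of the signature space (made up)
    reference = np.abs(np.random.randn(dim))               # reference vector from the seed attributes
    candidates = {"budget": np.abs(np.random.randn(dim)),  # candidate attributes for a class, e.g. movies
                  "director": np.abs(np.random.randn(dim))}

    ranked = sorted(candidates, key=lambda a: cosine(candidates[a], reference), reverse=True)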
vector representations is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Cheung, Jackie Chi Kit and Penn, Gerald
Distributional Semantic Hidden Markov Models
Unlike in most applications of HMMs in text processing, in which the representation of a token is simply its word or lemma identity, tokens in DSHMM are also associated with a vector representation of their meaning in context according to a distributional semantic model (Section 3.1).
Distributional Semantic Hidden Markov Models
All the methods below start from this basic vector representation.
Distributional Semantic Hidden Markov Models
Let event head $h$ be the syntactic head of a number of arguments $a_1, a_2, \ldots, a_m$, and $\vec{v}_h, \vec{v}_{a_1}, \vec{v}_{a_2}, \ldots, \vec{v}_{a_m}$ be their respective vector representations according to the SIMPLE method.
vector representations is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Mitchell, Jeff and Lapata, Mirella and Demberg, Vera and Keller, Frank
Integrating Semantic Constraint into Surprisal
The factor A(wn, h) is essentially based on a comparison between the vector representing the current word wn and the vector representing the prior history h. Varying the method for constructing word vectors (e.g., using LDA or a simpler semantic space model) and for combining them into a representation of the prior context h (e.g., using additive or multiplicative functions) produces distinct models of semantic composition.
Integrating Semantic Constraint into Surprisal
The calculation of A is then based on a weighted dot product of the vector representing the upcoming word w, with the vector representing the prior context h:
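A toy sketch of a factor of this form, a weighted dot product between the upcoming word's vector and a composed vector for the prior context (the additive composition and the weight vector are assumptions; the paper compares several such choices):

    import numpy as np

    d = 5                                            # semantic space dimensionality (made up)
    weights = np.abs(np.random.randn(d))             # per-dimension weights

    def semantic_factor(word_vec, history_vecs):
        h = np.sum(history_vecs, axis=0)             # additive composition of the prior context
        return weights @ (word_vec * h)              # weighted dot product A(w, h)

    history = [np.random.randn(d) for _ in range(3)]
    factor = semantic_factor(np.random.randn(d), history)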
Models of Processing Difficulty
In this framework, the similarity between two words can be easily quantified, e.g., by measuring the cosine of the angle of the vectors representing them.
Models of Processing Difficulty
Specifically, the simpler space is based on word co-occurrence counts; it constructs the vector representing a given target word, t, by identifying all the tokens of t in a corpus and recording the counts of context words, c, (within a specific window).
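A compact sketch of building such a count-based target vector from a toy corpus with a symmetric context window:

    from collections import Counter

    corpus = "the dog chased the cat the dog barked".split()
    target, window = "dog", 2

    counts = Counter()
    for i, tok in enumerate(corpus):
        if tok == target:                                   # each token of the target word t
            lo, hi = max(0, i - window), i + window + 1
            for c in corpus[lo:i] + corpus[i + 1:hi]:       # context words within the window
                counts[c] += 1
    # counts is the co-occurrence vector for "dog":
    # Counter({'the': 3, 'chased': 1, 'cat': 1, 'barked': 1})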
vector representations is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Kiela, Douwe and Hill, Felix and Korhonen, Anna and Clark, Stephen
Experimental Approach
Generating Visual Representations: Visual vector representations for each image were obtained using the well-known bag of visual words (BoVW) approach (Sivic and Zisserman, 2003).
Experimental Approach
BoVW obtains a vector representation for an
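A highly simplified sketch of the BoVW representation mentioned above: local descriptors are assigned to their nearest "visual word" and the image is represented by the resulting histogram (the descriptors and centroids are random stand-ins for SIFT-like features and learned clusters):

    import numpy as np

    k, d_desc = 5, 8                                    # visual vocabulary size, descriptor size (made up)
    vocab = np.random.randn(k, d_desc)                  # cluster centroids ("visual words")

    def bovw_vector(descriptors):
        """Histogram over visual words: count each descriptor's nearest centroid."""
        hist = np.zeros(k)
        for desc in descriptors:
            hist[np.argmin(np.linalg.norm(vocab - desc, axis=1))] += 1
        return hist / max(hist.sum(), 1)                # normalize to frequencies

    image_descriptors = np.random.randn(40, d_desc)     # stand-in local features of one image
    vec = bovw_vector(image_descriptors)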
Experimental Approach
Generating Linguistic Representations: We extract continuous vector representations (also of 50 dimensions) for concepts using the continuous log-linear skipgram model of Mikolov et al.
vector representations is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Andreas, Jacob and Klein, Dan
Introduction
In order to isolate the contribution from word embeddings, it is useful to demonstrate improvement over a parser that already achieves state-of-the-art performance without vector representations .
Parser extensions
$\phi(w)$ is the vector representation of the word $w$, $\alpha_m$ are per-basis weights, and $\beta$ is an inverse radius parameter which determines the strength of the smoothing.
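A rough sketch of smoothing of this general shape: a score for a word is a radial-basis-weighted average of per-basis weights of nearby words in embedding space (the Gaussian kernel, toy vectors and weights are assumptions about the general form, not the parser's exact estimator):

    import numpy as np

    d = 4                                           # embedding size (made up)
    phi = {"cat": np.random.randn(d),               # vector representations phi(w)
           "dog": np.random.randn(d),
           "car": np.random.randn(d)}
    alpha = {"cat": 0.2, "dog": 0.9, "car": -0.4}   # per-basis weights (toy values)
    beta = 2.0                                      # inverse radius controlling the smoothing

    def smoothed_score(w):
        """Radial-basis smoothing of the per-basis weights around phi(w)."""
        num = den = 0.0
        for v, vec in phi.items():
            k = np.exp(-beta * np.sum((phi[w] - vec) ** 2))
            num += k * alpha[v]
            den += k
        return num / den

    s = smoothed_score("dog")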
Parser extensions
vector representation.
vector representations is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Hermann, Karl Moritz and Blunsom, Phil
Background
The recursive application of autoencoders was first introduced in Pollack (1990), whose recursive auto-associative memories learn vector representations over pre-specified recursive data structures.
Learning
The unsupervised method described so far learns a vector representation for each sentence.
Model
Their purpose is to learn semantically meaningful vector representations for sentences and phrases of variable size, while the purpose of this paper is to investigate the use of syntax and linguistic formalisms in such vector-based compositional models.
vector representations is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Schamoni, Shigehiko and Hieber, Felix and Sokolov, Artem and Riezler, Stefan
Experiments
LinLearn denotes model combination by overloading the vector representation of queries q and documents d in the VW linear learner by incorporating arbitrary ranking models as dense features.
Model Combination
This means that the vector representation of queries q and documents d in the VW linear learner is overloaded once more: In addition to dense domain-knowledge features, we incorporate arbitrary ranking models as dense features whose value is the score of the ranking model.
Translation and Ranking for CLIR
Optimization for these additional models including domain knowledge features was done by overloading the vector representation of queries q and documents d in the VW linear learner: Instead of sparse word-based features, q and d are represented by real-valued vectors of dense domain-knowledge features.
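A small sketch of this kind of feature overloading: the representation of a query-document pair concatenates dense domain-knowledge features with the scores of other ranking models as additional dense features (all values invented; VW itself is not invoked here):

    import numpy as np

    def combined_features(dense_domain_feats, ranking_model_scores):
        """Overload the pair representation: domain features plus one feature per ranking model."""
        return np.concatenate([dense_domain_feats, ranking_model_scores])

    domain_feats = np.array([0.3, 1.2, 0.0])        # e.g. domain-knowledge features of (q, d)
    model_scores = np.array([-2.1, 0.7])            # scores of two other ranking models on (q, d)
    x = combined_features(domain_feats, model_scores)

    w = np.random.randn(x.size)                     # weights a linear learner would fit
    score = w @ x                                   # linear ranking score for the pair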
vector representations is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Srivastava, Shashank and Hovy, Eduard
Introduction
Finally, we perform SVD on the motif similarity matrix (with size of the order of the total vocabulary in the corpus), and retain the first k principal eigenvectors to obtain low-dimensional vector representations that are more convenient to work with.
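A brief numpy sketch of this step: a truncated SVD of a symmetric similarity matrix, keeping the first k components as low-dimensional vectors (the similarity matrix below is a random stand-in):

    import numpy as np

    n, k = 100, 10                               # vocabulary size and target dimensionality (made up)
    A = np.random.rand(n, n)
    S = (A + A.T) / 2                            # toy symmetric motif-similarity matrix

    U, sigma, _ = np.linalg.svd(S)
    vectors = U[:, :k] * sigma[:k]               # one k-dimensional vector per motif (row)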
Introduction
For composing the motif representations to get judgments on semantic similarity of sentences, we use our recent Vector Tree Kernel approach. The VTK approach defines a convolutional kernel over graphs defined by the dependency parses of sentences, using a vector representation at each graph node that represents a single lexical token.
Introduction
For this task, we again use the VTK formalism for combining vector representations of the individual motifs.
vector representations is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Cheung, Jackie Chi Kit and Penn, Gerald
Introduction
We tabulate the transitions of entities between different syntactic positions (or their nonoccurrence) in sentences, and convert the frequencies of transitions into a feature vector representation of transition probabilities in the document.
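A compact sketch of converting an entity grid into transition-probability features, assuming roles S (subject), O (object), X (other) and '-' (absent) and transitions of length 2:

    from collections import Counter
    from itertools import product

    ROLES = ["S", "O", "X", "-"]
    # Toy entity grid: one row per entity, one column (role) per sentence.
    grid = {"Microsoft": ["S", "O", "-"],
            "market":    ["-", "X", "S"]}

    transitions = Counter()
    for roles in grid.values():
        for a, b in zip(roles, roles[1:]):          # length-2 transitions between adjacent sentences
            transitions[(a, b)] += 1

    total = sum(transitions.values())
    # Feature vector: probability of each possible length-2 transition, in a fixed order.
    features = [transitions[t] / total for t in product(ROLES, repeat=2)]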
Introduction
We solve this problem in a supervised machine learning setting, where the input is the feature vector representations of the two versions of the document, and the output is a binary value indicating the document with the original sentence ordering.
Introduction
Transition length — the maximum length of the transitions used in the feature vector representation of a document.
vector representations is mentioned in 3 sentences in this paper.
Topics mentioned in this paper: