Index of papers in Proc. ACL that mention
  • distributional representations
Huang, Fei and Yates, Alexander
Abstract
We demonstrate that distributional representations of word types, trained on unannotated text, can be used to improve performance on rare words.
Introduction
We investigate the use of distributional representations, which model the probability distribution of a word’s context, as techniques for finding smoothed representations of word sequences.
Introduction
That is, we use the distributional representations to share information across unannotated examples of the same word type.
Introduction
We then compute features of the distributional representations, and provide them as input to our supervised sequence labelers.
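To make the idea above concrete, here is a minimal sketch, not Huang and Yates’ actual pipeline: context-count vectors are built for word types from unannotated text, reduced with a truncated SVD, and the resulting low-dimensional values are attached as real-valued features for a supervised sequence labeler. The function names and toy corpus are illustrative.

```python
# Sketch (not the paper's exact pipeline): derive smoothed distributional
# features for word types from unannotated text, for use by a sequence labeler.
from collections import Counter
import numpy as np

def context_vectors(sentences, vocab, window=1):
    """Count left/right context words for each word type in vocab."""
    counts = {w: Counter() for w in vocab}
    for sent in sentences:
        for i, w in enumerate(sent):
            if w not in counts:
                continue
            for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                if j != i:
                    counts[w][sent[j]] += 1
    contexts = sorted({c for ctr in counts.values() for c in ctr})
    index = {c: k for k, c in enumerate(contexts)}
    M = np.zeros((len(vocab), len(contexts)))
    for r, w in enumerate(sorted(vocab)):
        for c, n in counts[w].items():
            M[r, index[c]] = n
    return sorted(vocab), M

# Truncated SVD gives low-dimensional features shared across all occurrences
# of a word type, in training and test data alike.
words, M = context_vectors([["the", "cat", "sat"], ["the", "dog", "sat"]],
                           vocab={"cat", "dog", "sat", "the"})
U, S, _ = np.linalg.svd(M, full_matrices=False)
features = U[:, :2] * S[:2]   # two real-valued features per word type
```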
Smoothing Natural Language Sequences
Importantly, we seek distributional representations that will provide features that are common in both training and test data, to avoid data sparsity.
Smoothing Natural Language Sequences
In the next three sections, we develop three techniques for smoothing text using distributional representations.
Smoothing Natural Language Sequences
This gives greater weight to words with more idiosyncratic distributions and may improve the informativeness of a distributional representation.
“distributional representations” is mentioned in 18 sentences in this paper.
Abend, Omri and Cohen, Shay B. and Steedman, Mark
Abstract
We propose a novel approach that integrates the distributional representation of multiple subsets of the MWP’s words.
Background and Related Work
While previous work focused either on improving the quality of the distributional representations themselves or on their incorporation into more elaborate systems, we focus on the integration of the distributional representation of multiple LCs to improve the identification of inference relations between MWPs.
Background and Related Work
Much work in recent years has concentrated on the relation between the distributional representations of composite phrases and the representations of their component subparts (Widdows, 2008; Mitchell and Lapata, 2010; Baroni and Zamparelli, 2010; Coecke et al., 2010).
Background and Related Work
Despite significant advances, previous work has mostly been concerned with highly compositional cases and does not address the distributional representation of predicates of varying degrees of compositionality.
Conclusion
We have presented a novel approach to the distributional representation of multi-word predicates.
Discussion
Much recent work subsumed under the title Compositional Distributional Semantics addressed the distributional representation of multi-word phrases (see Section 2).
Discussion
A standard approach in CDS is to compose distributional representations by taking their vector sum, v_L = v_l1 + v_l2 + ... + v_ln and v_R = v_r1 + ... + v_rn (Mitchell and Lapata, 2010).
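A small illustration of the additive composition referred to here, with made-up toy vectors: the phrase vector is simply the sum of its word vectors, which can then be compared to other vectors by cosine similarity.

```python
import numpy as np

# Toy distributional vectors (illustrative values, not corpus-derived).
vec = {
    "take":  np.array([0.9, 0.1, 0.4]),
    "place": np.array([0.2, 0.8, 0.3]),
    "occur": np.array([0.5, 0.6, 0.5]),
}

# Additive composition: the phrase vector is the sum of its word vectors,
# e.g. v_L = v_l1 + ... + v_ln.
v_take_place = vec["take"] + vec["place"]

# Cosine similarity between the composed phrase and a single-word paraphrase.
cos = np.dot(v_take_place, vec["occur"]) / (
    np.linalg.norm(v_take_place) * np.linalg.norm(vec["occur"]))
```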
Introduction
This heterogeneity of “take” is likely to have a negative effect on downstream systems that use its distributional representation.
Introduction
For instance, while “take” and “accept” are often considered lexically similar, the high frequency in which “take” participates in non-compositional MWPs is likely to push the two verbs’ distributional representations apart.
Introduction
This approach allows the classifier that uses the distributional representations to take into account the most relevant LCs in order to make the prediction.
Our Proposal: A Latent LC Approach
We propose a method for addressing MWPs of varying degrees of compositionality through the integration of the distributional representation of multiple subsets of the predicate’s words (LCs).
“distributional representations” is mentioned in 11 sentences in this paper.
Hermann, Karl Moritz and Blunsom, Phil
Experiments
(2012), learning distributed representations on the Europarl corpus and evaluating on documents from the Reuters RCV1/RCV2 corpora.
Experiments
We use the training data of the corpus to learn distributed representations across 12 languages.
Experiments
In a third evaluation (Table 4), we apply the embeddings learnt with our models to a monolingual classification task, enabling us to compare with prior work on distributed representation learning.
Introduction
Distributed representations of words provide the basis for many state-of-the-art approaches to various problems in natural language processing today.
Overview
Distributed representation learning describes the task of learning continuous representations for discrete objects.
Overview
Such distributed representations allow a model to share meaning between similar words, and have been used to capture semantic, syntactic and morphological content (Collobert and Weston, 2008; Turian et al., 2010, inter alia).
Overview
Some work has exploited this idea for transferring linguistic knowledge into low-resource languages or to learn distributed representations at the word level (Klementiev et al., 2012; Zou et al., 2013; Lauly et al., 2013, inter alia).
Related Work
Distributed Representations Distributed representations can be learned through a number of approaches.
Related Work
Tasks where the use of distributed representations has resulted in improvements include topic modelling (Blei et al., 2003) and named entity recognition (Turian et al., 2010; Collobert et al., 2011).
Related Work
Multilingual Representation Learning Most research on distributed representation induction has focused on single languages.
“distributional representations” is mentioned in 12 sentences in this paper.
Turian, Joseph and Ratinov, Lev-Arie and Bengio, Yoshua
Distributed representations
Another approach to word representation is to learn a distributed representation.
Distributed representations
(Not to be confused with distributional representations.)
Distributed representations
A distributed representation is dense, low-dimensional, and real-valued.
Distributional representations
LSA (Dumais et al., 1988; Landauer et al., 1998), LSI, and LDA (Blei et al., 2003) induce distributional representations over F in which each column is a document context.
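As a rough sketch of the LSA-style construction mentioned above, assuming a toy term-document matrix F rather than real corpus counts: a truncated SVD of F yields a low-dimensional distributional representation for each word.

```python
import numpy as np

# Toy term-document matrix F: rows are words, columns are document contexts.
words = ["bank", "money", "river", "water"]
F = np.array([[3, 2, 1, 0],
              [2, 3, 0, 0],
              [1, 0, 2, 3],
              [0, 0, 3, 2]], dtype=float)

# LSA: truncated SVD of F; each word is represented by its coordinates
# in the top-k latent dimensions.
U, S, _ = np.linalg.svd(F, full_matrices=False)
k = 2
word_vectors = U[:, :k] * S[:k]   # one k-dimensional vector per word
```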
Distributional representations
However, like all the works cited above, Sahlgren (2006) only uses distributional representation to improve existing systems for one-shot classification tasks, such as IR, WSD, semantic knowledge tests, and text categorization.
Distributional representations
Previous research has achieved repeated successes on these tasks using clustering representations (Section 3) and distributed representations (Section 4), so we focus on these representations in our work.
Supervised evaluation tasks
We apply clustering and distributed representations to NER and chunking, which allows us to compare our semi-supervised models to those of Ando and Zhang (2005) and Suzuki and Isozaki (2008).
“distributional representations” is mentioned in 8 sentences in this paper.
Hermann, Karl Moritz and Blunsom, Phil
Background
Distributional representations encode an expression by its environment, assuming the context-dependent nature of meaning according to which one “shall know a word by the company it keeps” (Firth, 1957).
Background
Distributional representations are frequently used to encode single words as vectors.
Background
While it is theoretically possible to apply the same mechanism to larger expressions, sparsity prevents learning meaningful distributional representations for expressions much larger than single words.
Introduction
While distributional semantics is easily applied to single words, sparsity implies that attempts to directly extract distributional representations for larger expressions are doomed to fail.
“distributional representations” is mentioned in 6 sentences in this paper.
Titov, Ivan
Constraints on Inter-Domain Variability
As we discussed in the introduction, our goal is to provide a method for domain adaptation based on semi-supervised learning of models with distributed representations.
Discussion and Conclusions
In this paper we presented a domain-adaptation method based on semi-supervised learning with distributed representations coupled with constraints favoring domain-independence of modeled phenomena.
Introduction
Such LVMs can be regarded as composed of two parts: a mapping from initial (normally, word-based) representation to a new shared distributed representation, and also a classifier in this representation.
Related Work
Semi-supervised learning with distributed representations and its application to domain adaptation has previously been considered in (Huang and Yates, 2009), but no attempt has been made to address problems specific to the domain-adaptation setting.
The Latent Variable Model
The adaptation method advocated in this paper is applicable to any joint probabilistic model which uses distributed representations, i.e.
“distributional representations” is mentioned in 5 sentences in this paper.
Pei, Wenzhe and Ge, Tao and Chang, Baobao
Conventional Neural Network
The idea of distributed representation for symbolic data is one of the most important reasons why the neural network works.
Experiment
Wang and Manning (2013) conduct an empirical study on the effect of nonlinearity and the results suggest that nonlinear models are highly effective only when distributed representation is used.
Experiment
To explain why distributed representation captures more information than discrete features, we show in Table 4 the effect of character embeddings which are obtained from the lookup table of MMTNN after training.
Experiment
Therefore, compared with discrete feature representations, distributed representation can capture the syntactic and semantic similarity between characters.
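A hedged sketch of the kind of inspection described here, not the MMTNN itself: given a character-embedding lookup table, neighbours can be ranked by cosine similarity. The characters and embedding values below are placeholders; with random vectors the ranking is meaningless, but after real training the nearest neighbours of a character tend to be syntactically and semantically similar characters.

```python
import numpy as np

rng = np.random.default_rng(0)
chars = list("的地得天年月")                          # toy character vocabulary
lookup = {c: rng.normal(size=50) for c in chars}      # embedding lookup table

def most_similar(c, k=3):
    """Rank other characters by cosine similarity of their embeddings."""
    v = lookup[c]
    scores = {
        o: float(np.dot(v, u) / (np.linalg.norm(v) * np.linalg.norm(u)))
        for o, u in lookup.items() if o != c
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(most_similar("的"))   # placeholder vectors, so the ranking is arbitrary
```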
Max-Margin Tensor Neural Network
To better model the tag-tag interaction given the context characters, distributed representation for tags instead of traditional discrete symbolic representation is used in our model.
“distributional representations” is mentioned in 5 sentences in this paper.
Srivastava, Shashank and Hovy, Eduard
Introduction
Notable among the most effective distributional representations are the recent deep-learning approaches by Socher et al.
Introduction
With such a working definition, contiguous motifs are likely to make distributional representations less noisy and also assist in disambiguating context.
Introduction
Also, the lack of specificity ensures that such motifs are common enough to meaningfully influence distributional representation beyond single tokens.
“distributional representations” is mentioned in 5 sentences in this paper.
Lazaridou, Angeliki and Marelli, Marco and Zamparelli, Roberto and Baroni, Marco
Composition methods
(2010) take inspiration from formal semantics to characterize composition in terms of function application, where the distributional representation of one element in a composition (the functor) is not a vector but a function.
Introduction
It is natural to hypothesize that the same methods can be applied to morphology to derive the meaning of complex words from the meaning of their parts: For example, instead of harvesting a rebuild vector directly from the corpus, the latter could be constructed from the distributional representations of re- and build.
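A minimal sketch of the function-application view of composition invoked above, under the assumption that the functor (here the affix re-) is represented as a matrix acting on the stem vector; the values are random stand-ins, not learned representations.

```python
import numpy as np

d = 4
rng = np.random.default_rng(1)

# Argument: an ordinary distributional vector for the stem "build".
v_build = rng.normal(size=d)

# Functor: the prefix "re-" is represented as a function (here a matrix)
# rather than a vector; in practice it would be learned, e.g. by regression
# from stem vectors to observed derived-form vectors.
M_re = rng.normal(size=(d, d))

# Function application: compose "re-" with "build" to obtain a vector
# for the derived form "rebuild".
v_rebuild = M_re @ v_build
```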
Introduction
We adapt a number of composition methods from the literature to the morphological setting, and we show that some of these methods can provide better distributional representations of derived forms than either those directly harvested from a large corpus, or those obtained by using the stem as a proxy to derived-form meaning.
Related work
Our goal is to automatically construct, given distributional representations of stems and affixes, semantic representations for the derived words containing those stems and affixes.
“distributional representations” is mentioned in 4 sentences in this paper.
Bollegala, Danushka and Weir, David and Carroll, John
Introduction
Distributional representations of words have been successfully used in many language processing tasks such as entity set expansion (Pantel et al., 2009), part-of-speech (POS) tagging and chunking (Huang and Yates, 2009), ontology learning (Curran, 2005), computing semantic textual similarity (Besancon et al., 1999), and lexical inference (Kotlerman et al., 2012).
Introduction
Consequently, the distributional representations of the word lightweight will differ considerably between the two domains.
Introduction
The SVD smoothing in the first step reduces both the data sparseness in distributional representations of individual words and the dimensionality of the feature space, thereby enabling us to efficiently and accurately learn a prediction model using PLSR in the second step.
We first create a distributional representation for a word using the data from a single domain, and then learn a Partial Least Square Regression (PLSR) model to predict the distribution of a word in a target domain given its distribution in a source domain.
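A hedged sketch of this two-step procedure using scikit-learn's TruncatedSVD and PLSRegression; the count matrices below are random stand-ins for the source- and target-domain distributional representations of words observed in both domains.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)

# Stand-ins for sparse word-context count matrices from two domains
# (rows: words observed in both domains).
X_source = rng.poisson(0.3, size=(200, 500)).astype(float)
X_target = rng.poisson(0.3, size=(200, 500)).astype(float)

# Step 1: SVD smoothing reduces sparsity and dimensionality.
S = TruncatedSVD(n_components=50, random_state=0).fit_transform(X_source)
T = TruncatedSVD(n_components=50, random_state=0).fit_transform(X_target)

# Step 2: PLSR learns to predict a word's target-domain representation
# from its source-domain representation.
pls = PLSRegression(n_components=10)
pls.fit(S, T)
predicted_target = pls.predict(S[:5])   # predictions for five words
```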
“distributional representations” is mentioned in 4 sentences in this paper.
Ferret, Olivier
Improving a distributional thesaurus
The principles presented in the previous section face one major problem compared to the “classical” distributional approach: the semantic similarity of two words can be evaluated directly by computing the similarity of their distributional representations.
Principles
As a result, the distributional representation of a word takes the unstructured form of a bag of words or the more structured form of a set of pairs {syntactic relation, word}.
Principles
A variant of this approach was proposed in (Kazama et al., 2010) where the distributional representation of a word is modeled as a multinomial distribution with Dirichlet as prior.
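A small sketch of the variant mentioned here: treating a word's context counts as a multinomial with a symmetric Dirichlet prior amounts to additive smoothing of the context probabilities. The counts and the value of alpha are toy choices.

```python
import numpy as np

# Toy context counts for one word over a small context vocabulary.
context_counts = np.array([12, 0, 3, 1, 0], dtype=float)

# Multinomial with a symmetric Dirichlet(alpha) prior: the posterior mean
# is equivalent to additive (Laplace-style) smoothing of the counts.
alpha = 0.5
p = (context_counts + alpha) / (context_counts.sum() + alpha * len(context_counts))

# p is the smoothed distributional representation of the word;
# unseen contexts receive small but nonzero probability.
```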
“distributional representations” is mentioned in 3 sentences in this paper.
Hermann, Karl Moritz and Das, Dipanjan and Weston, Jason and Ganchev, Kuzman
Abstract
We present a novel technique for semantic frame identification using distributed representations of predicates and their syntactic context; this technique leverages automatic syntactic parses and a generic set of word embeddings.
Introduction
Distributed representations of words have proved useful for a number of tasks.
Overview
A word embedding is a distributed representation of meaning where each word is represented as a vector in R^n.
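A hedged sketch of the general setup, not the paper's actual model: embeddings of the predicate and of words in selected syntactic positions are concatenated into an input representation that a frame-identification classifier would consume. The vocabulary, dimensionality, and vectors are illustrative.

```python
import numpy as np

dim = 8
rng = np.random.default_rng(0)
# Toy word embeddings in R^dim; "<none>" stands for an absent argument.
embedding = {w: rng.normal(size=dim) for w in ["buy", "she", "car", "<none>"]}

def frame_input(predicate, subj, dobj):
    """Concatenate embeddings of the predicate and its syntactic context."""
    parts = [embedding.get(w, embedding["<none>"]) for w in (predicate, subj, dobj)]
    return np.concatenate(parts)    # fed to a frame-identification classifier

x = frame_input("buy", "she", "car")   # shape: (3 * dim,)
```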
“distributional representations” is mentioned in 3 sentences in this paper.