Abstract | In contrast, vector space models of distributional semantics are trained on large corpora, but are typically applied to domain-general lexical disambiguation tasks. |
Abstract | We introduce Distributional Semantic Hidden Markov Models, a novel variant of a hidden Markov model that integrates these two approaches by incorporating contextualized distributional semantic vectors into a generative model as observed emissions. |
Distributional Semantic Hidden Markov Models | Unlike most applications of HMMs in text processing, in which a token is represented simply by its word or lemma identity, DSHMM additionally associates each token with a vector representation of its meaning in context, as given by a distributional semantic model (Section 3.1).
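Distributional Semantic Hidden Markov Models | As a rough illustration of vector-valued emissions, the sketch below pairs a categorical emission over lemmas with a diagonal Gaussian over each token's contextualized vector; the Gaussian choice and all names here are our assumptions for illustration, not the authors' implementation.

```python
import numpy as np

class VectorEmissionState:
    """Hypothetical HMM state emitting a lemma plus a semantic vector."""

    def __init__(self, lemma_probs, mean, var):
        self.lemma_probs = lemma_probs  # dict: lemma -> P(lemma | state)
        self.mean = mean                # mean of the state's vector emission
        self.var = var                  # diagonal variance of that emission

    def log_emission(self, lemma, vec):
        # Joint emission: a categorical over lemmas times a diagonal
        # Gaussian over the token's contextualized distributional vector.
        log_p_lemma = np.log(self.lemma_probs.get(lemma, 1e-10))
        diff = vec - self.mean
        log_p_vec = -0.5 * np.sum(diff ** 2 / self.var
                                  + np.log(2 * np.pi * self.var))
        return log_p_lemma + log_p_vec
```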
Introduction | By contrast, distributional semantic models are trained on large, domain-general corpora. |
Introduction | In this paper, we propose to inject contextualized distributional semantic vectors into generative probabilistic models, in order to combine their complementary strengths for domain modelling. |
Introduction | Distributional semantic models offer a number of potential advantages.
Related Work | Our work is similar in that we assume much of the same structure within a domain, and consequently in the model as well (Section 3). However, whereas PROFINDER focuses on finding the “correct” number of frames, events, and slots with a nonparametric method, this work focuses on integrating global knowledge, in the form of distributional semantics, into a probabilistic model.
Abstract | This is a major cause of data sparseness for corpus-based approaches to lexical semantics, such as distributional semantic models of word meaning. |
Abstract | Our results constitute a novel evaluation of the proposed composition methods, in which the full additive model achieves the best performance, and demonstrate the usefulness of a compositional morphology component in distributional semantics.
Composition methods | Distributional semantic models (DSMs), also known as vector-space models, semantic spaces, or by the names of famous incarnations such as Latent Semantic Analysis or Topic Models, approximate the meaning of words with vectors that record their patterns of co-occurrence with corpus context features (often, other words). |
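Composition methods | A minimal sketch of the count-based setup described above: record, for each target word, how often every other word appears within a small symmetric window (the toy corpus and window size are illustrative).

```python
from collections import Counter, defaultdict

def cooccurrence_vectors(sentences, window=2):
    # Map each target word to a Counter over its window contexts.
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, target in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[target][tokens[j]] += 1
    return vectors

vecs = cooccurrence_vectors([["dogs", "chase", "cats"],
                             ["cats", "chase", "mice"]])
print(vecs["cats"])  # Counter({'chase': 2, 'dogs': 1, 'mice': 1})
```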
Composition methods | Since the very inception of distributional semantics, there have been attempts to compose meanings for sentences and larger passages (Landauer and Dumais, 1997), but interest in compositional DSMs has skyrocketed in the last few years, particularly since the influential work of Mitchell and Lapata (2008; 2009; 2010).
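Composition methods | The core composition functions from this literature are compact enough to state directly. The sketch below gives Mitchell and Lapata's additive and multiplicative models alongside the full additive model; the matrices A and B are illustrative placeholders, and their estimation from corpus data is not shown. In the morphological setting of this paper, u and v would be vectors for a pair such as re- and build.

```python
import numpy as np

def additive(u, v):
    # Additive model (Mitchell and Lapata): p = u + v
    return u + v

def multiplicative(u, v):
    # Component-wise multiplicative model: p = u * v
    return u * v

def full_additive(A, B, u, v):
    # Full additive model: p = A u + B v, where A and B are weight
    # matrices estimated from observed phrase (or derived-word) vectors.
    return A @ u + B @ v
```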
Experimental setup | 4.2 Distributional semantic space
Experimental setup | This result is of practical importance for distributional semantics, as it paves the way to address one of the main causes of data sparseness, and it confirms the usefulness of the compositional approach in a new domain.
Experimental setup | We would also like to apply composition to inflectional morphology (which currently lies outside the scope of distributional semantics), to capture the nuances of meaning that, for example, distinguish singular and plural nouns (consider, e.g., the difference between the mass singular tea and the plural teas, which coerces the noun into a count interpretation (Katz and Zamparelli, 2012)).
Introduction | Distributional semantic models (DSMs) in particular represent the meaning of a word by a vector, the dimensions of which encode corpus-extracted co-occurrence statistics, under the assumption that words that are semantically similar will occur in similar contexts (Turney and Pantel, 2010).
Introduction | Compositional distributional semantic models (cDSMs) of word units aim to handle, compositionally, the high productivity of phrases and the consequent data sparseness.
Related work | Our system, given re- and build, predicts the (distributional semantic) meaning of rebuild.
Related work | Another emerging line of research uses distributional semantics to model human intuitions about the semantic transparency of morphologically derived or compound expressions and how these impact various lexical processing tasks (Kuperman, 2009; Wang et al., 2012). |
Abstract | We introduce the problem of generation in distributional semantics : Given a distributional vector representing some meaning, how can we generate the phrase that best expresses that meaning? |
Conclusion | In this paper we have outlined a framework for the task of generation with distributional semantic models. |
Conclusion | From a more theoretical point of view, our work fills an important gap in distributional semantics, making it a bidirectional theory of the connection between language and meaning.
Conclusion | Some research has already established a connection between neural and distributional semantic vector spaces (Mitchell et al., 2008; Murphy et al., 2012). |
Introduction | If distributional semantics is to be considered a proper semantic theory, then it must deal not only with synthesis (going from words to vectors), but also with generation (from vectors to words). |
Introduction | Distributional semantics assumes a lexicon of atomic expressions (that, for simplicity, we take to be words), each associated with a vector.
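Introduction | The naive baseline for this direction is search: given a target vector, return the lexicon word whose vector lies closest to it. The sketch below implements that nearest-neighbour decoder under cosine similarity; it is a strawman for comparison, not the more direct generation approach the paper develops.

```python
import numpy as np

def nearest_word(target, lexicon):
    # lexicon: dict mapping each word to its distributional vector.
    # Return the word whose vector is most cosine-similar to `target`.
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max(lexicon, key=lambda w: cos(lexicon[w], target))
```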
Introduction | In this paper, we introduce a more direct approach to phrase generation, inspired by work in compositional distributional semantics.
Noun phrase generation | Compositional distributional semantic systems are often evaluated on phrase and sentence paraphrasing data sets (Blacoe and Lapata, 2012; Mitchell and Lapata, 2010; Socher et al., 2011; Turney, 2012).
Noun phrase generation | This is a much more challenging task and it paves the way to more realistic applications of distributional semantics in generation scenarios. |
Related work | To the best of our knowledge, we are the first to explicitly and systematically pursue the generation problem in distributional semantics.
Related work | They introduce a bidirectional language-to-meaning model for compositional distributional semantics that is similar in spirit to ours. |
Abstract | Context-predicting models (more commonly known as embeddings or neural language models) are the new kids on the distributional semantics block. |
Abstract | Despite the buzz surrounding these models, the literature is still lacking a systematic comparison of the predictive models with classic, count-vector-based distributional semantic approaches. |
Conclusion | As seasoned distributional semanticists with thorough experience in developing and using count vectors, we set out to conduct this study because we were annoyed by the triumphalist overtones often surrounding predict models, despite the almost complete lack of a proper comparison to count vectors. |
Conclusion | To give just one last example, distributional semanticists have looked at whether certain properties of vectors reflect semantic relations in the expected way: e.g., whether the vectors of hypernyms “distributionally include” the vectors of hyponyms in some mathematically precise sense.
Conclusion | Does all of this even matter, or are we on the cusp of discovering radically new ways to tackle the same problems that have been approached as we just sketched in traditional distributional semantics?
Evaluation materials | The ESSLLI 2008 Distributional Semantic Workshop shared-task set (esslli) contains 44 concepts to be clustered into 6 categories (Baroni et al., 2008) (we ignore here the 3- and 2-way higher-level partitions coming with this set). |
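Evaluation materials | A sketch of how such a concept-clustering benchmark is typically scored: cluster the concept vectors into 6 groups and measure purity against the gold categories. KMeans is our illustrative choice here; the shared task does not prescribe a particular clustering algorithm.

```python
from sklearn.cluster import KMeans

def cluster_purity(vectors, gold_labels, k=6):
    # Cluster the concept vectors into k groups, then score purity:
    # the fraction of concepts that fall in the majority gold category
    # of their assigned cluster.
    pred = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(vectors)
    correct = 0
    for c in range(k):
        members = [g for g, p in zip(gold_labels, pred) if p == c]
        if members:
            correct += max(members.count(g) for g in set(members))
    return correct / len(gold_labels)
```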
Introduction | Concretely, distributional semantic models (DSMs) use vectors that keep track of the contexts (e.g., co-occurring words) in which target terms appear in a large corpus as proxies for meaning representations, and apply geometric techniques to these vectors to measure the similarity in meaning of the corresponding words (Clark, 2013; Erk, 2012; Turney and Pantel, 2010).
Abstract | Traditional models of distributional semantics suffer from computational issues such as data sparsity for individual lexemes and complexities of modeling semantic composition when dealing with structures larger than single lexical items.
Abstract | In this work, we present a frequency-driven paradigm for robust distributional semantics in terms of semantically cohesive lineal constituents, or motifs. |
Introduction | In particular, such a perspective can be especially advantageous for distributional semantics for reasons we outline below. |
Introduction | Distributional semantic models (DSMs) that represent words as distributions over neighbouring contexts have been particularly effective in capturing fine-grained lexical semantics (Turney et al., 2010).
Introduction | In this section, we define our frequency-driven framework for distributional semantics in detail. |
Abstract | To our knowledge, this is the first work to address lexical relations between MWPs of varying degrees of compositionality within distributional semantics.
Background and Related Work | Compositional Distributional Semantics.
Background and Related Work | Several works have used compositional distributional semantics (CDS) representations to assess the compositionality of MWEs, such as noun compounds (Reddy et al., 2011) or verb-noun combinations (Kiela and Clark, 2013). |
Discussion | Much recent work subsumed under the title Compositional Distributional Semantics has addressed the distributional representation of multi-word phrases (see Section 2).
Introduction | This work addresses the modelling of MWPs within the context of distributional semantics (Turney and Pantel, 2010), in which predicates are represented through the distribution of arguments they may take.
Introduction | To our knowledge, this is the first work to address lexical relations between MWPs of varying degrees of compositionality within distributional semantics.
Abstract | This paper describes an application of distributional semantics to the study of syntactic productivity in diachrony, i.e., the property of grammatical constructions to attract new lexical items over time. |
Abstract | By providing an empirical measure of semantic similarity between words derived from lexical co-occurrences, distributional semantics not only reliably captures how the verbs in the distribution of a construction are related, but also enables the use of visualization techniques and statistical modeling to analyze the semantic development of a construction over time and identify the semantic determinants of syntactic productivity in naturally occurring data. |
Application of the vector-space model | With the quantification of semantic similarity provided by the distributional semantic model, it is also possible to properly test the hypothesis that productivity is tied to the structure of the semantic space. |
Conclusion | Not only does distributional semantics provide an empirically-based measure of semantic similarity that appropriately captures semantic distinctions, it also enables the use of methods for which quantification is necessary, such as data visualization and statistical analysis. |
Distributional measure of semantic similarity | One benefit of the distributional semantics approach is that it allows semantic similarity between words to be quantified by measuring the similarity in their distribution. |
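Distributional measure of semantic similarity | In practice this quantification usually comes down to cosine similarity between context-count vectors, as in the toy sketch below (the counts are invented for illustration).

```python
import numpy as np

def cosine(u, v):
    # Similarity of two words' context distributions: 1 for identical
    # direction, 0 for no shared contexts (with non-negative counts).
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# Toy co-occurrence counts over three shared contexts:
beat = np.array([10.0, 2.0, 0.0])
kick = np.array([8.0, 3.0, 1.0])
print(cosine(beat, kick))  # high value: the verbs distribute similarly
```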
Introduction | On the basis of a case study of the construction “V the hell out of NP”, I show how distributional semantics can profitably be applied to the study of syntactic productivity. |
Experimental Setup | For constructing the text-based vectors, we follow a standard pipeline in distributional semantics (Turney and Pantel, 2010) without tuning its parameters and collect co-occurrence statistics from the concatenation of ukWaC and the Wikipedia, amounting to 2.7 billion tokens in total.
Experimental Setup | Singular Value Decomposition (SVD) SVD is the most widely used dimensionality reduction technique in distributional semantics (Turney and Pantel, 2010), and it has recently been exploited to combine visual and linguistic dimensions in the multimodal distributional semantic model of Bruni et al. |
Experimental Setup | The cosine has been widely used in the distributional semantic literature, and it has been shown to outperform Euclidean distance (Bullinaria and Levy, 2007). Parameters were estimated with standard backpropagation and L-BFGS.
Results | For the SVD model, we set the number of dimensions to 300, a common choice in distributional semantics, consistent with the settings we used for the visual and linguistic spaces.
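Results | For reference, a 300-dimensional SVD reduction of a word-by-context count matrix is a one-liner with standard tooling; the sketch below uses a random stand-in matrix, since the actual corpus counts are not reproduced here.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

# Stand-in for a words-by-contexts co-occurrence matrix.
counts = np.random.poisson(0.1, size=(1000, 5000)).astype(float)

svd = TruncatedSVD(n_components=300, random_state=0)
embeddings = svd.fit_transform(counts)  # one 300-d vector per word
print(embeddings.shape)  # (1000, 300)
```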
Zero-shot learning and fast mapping | Concretely, we assume that concepts, denoted for convenience by word labels, are represented in linguistic terms by vectors in a text-based distributional semantic space (see Section 4.3). |
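Zero-shot learning and fast mapping | A common way to realize this setup, sketched below with ridge regression (an illustrative choice; the paper's exact mapping function is not reproduced here), is to learn a linear map from the other modality into the text-based space on seen concepts, then label an unseen item by retrieving the nearest word vector.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Learn a cross-modal map on seen concepts (features are stand-ins).
X_seen = np.random.randn(200, 128)   # e.g., visual feature vectors
Y_seen = np.random.randn(200, 300)   # the corresponding word vectors
mapper = Ridge(alpha=1.0).fit(X_seen, Y_seen)

# Project an unseen item into the semantic space; its predicted label
# would be the word whose vector lies nearest to `projected`.
x_new = np.random.randn(1, 128)
projected = mapper.predict(x_new)    # shape (1, 300)
```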
Background | 2.2 Distributional Semantics |
Background | Distributional semantic knowledge is then encoded as weighted inference rules in the MLN. |
PSL for STS | Given the logical forms for a pair of sentences, a text T and a hypothesis H, and given a set of weighted rules derived from distributional semantics (as explained in section 2.6) composing the knowledge base KB, we build a PSL model that supports determining the truth value of H in the most probable interpretation (i.e.
PSL for STS | KB: The knowledge base is a set of lexical and phrasal rules generated from distributional semantics, along with a similarity score for each rule (section 2.6).
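PSL for STS | The rule-generation procedure of section 2.6 is not reproduced in these excerpts; the sketch below only illustrates the general shape of such a knowledge base, pairing each candidate word-level rule with a cosine-similarity weight (the function and variable names are ours).

```python
import numpy as np

def lexical_rules(t_words, h_words, vectors):
    # For each (text word, hypothesis word) pair, emit a candidate rule
    # "t -> h" weighted by the cosine similarity of their vectors.
    rules = []
    for t in t_words:
        for h in h_words:
            u, v = vectors[t], vectors[h]
            weight = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
            rules.append((t, h, weight))
    return rules
```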
Conclusions and future work | In a sense this is encouraging, as it motivates our most exciting future work: augmenting this simple model to explicitly capture complementary information such as distributional semantics (Blei et al., 2003), diathesis alternations (McCarthy, 2000) and selectional preferences (Ó Séaghdha, 2010).
Conclusions and future work | By combining the syntactic classes with unsupervised POS tagging (Teichert and Daumé III, 2009) and the selectional preferences with distributional semantics (Ó Séaghdha, 2010), we hope to produce more accurate results on these complementary tasks while avoiding the use of any supervised learning.
Previous work | Graphical models have been increasingly popular for a variety of tasks such as distributional semantics (Blei et al., 2003) and unsupervised POS tagging (Finkel et al., 2007), and sampling methods allow efficient estimation of full joint distributions (Neal, 1993). |