Compositional-ly Derived Representations of Morphologically Complex Words in Distributional Semantics
Lazaridou, Angeliki and Marelli, Marco and Zamparelli, Roberto and Baroni, Marco

Article Structure

Abstract

Speakers of a language can construct an unlimited number of new words through morphological derivation.

Introduction

Effective ways to represent word meaning are needed in many branches of natural language processing.

Related work

Morphological induction systems use corpus-based methods to decide if two words are morphologically related and/or to segment words into morphemes (Dreyer and Eisner, 2011; Goldsmith, 2001; Goldwater and McClosky, 2005; Goldwater, 2006; Naradowsky and Goldwater, 2009; Wicentowski, 2004).

Composition methods

Distributional semantic models (DSMs), also known as vector-space models, semantic spaces, or by the names of famous incarnations such as Latent Semantic Analysis or Topic Models, approximate the meaning of words with vectors that record their patterns of co-occurrence with corpus context features (often, other words).

Experimental setup

4.1 Morphological data

Topics

distributional semantics

Appears in 11 sentences as: Distributional semantic (3) distributional semantic (3) distributional semantics (5)
In Compositional-ly Derived Representations of Morphologically Complex Words in Distributional Semantics
  1. This is a major cause of data sparseness for corpus-based approaches to lexical semantics, such as distributional semantic models of word meaning.
    Page 1, “Abstract”
  2. Our results constitute a novel evaluation of the proposed composition methods, in which the full additive model achieves the best performance, and demonstrate the usefulness of a compositional morphology component in distributional semantics.
    Page 1, “Abstract”
  3. Distributional semantic models (DSMs) in particular represent the meaning of a word by a vector, the dimensions of which encode corpus-extracted co-occurrence statistics, under the assumption that words that are semantically similar will occur in similar contexts (Turney and Pantel, 2010).
    Page 1, “Introduction”
  4. Compositional distributional semantic models (cDSMs) of word units aim at handling, compositionally, the high productivity of phrases and consequent data sparseness.
    Page 2, “Introduction”
  5. Our system, given re- and build, predicts the (distributional semantic) meaning of rebuild.
    Page 2, “Related work”
  6. Another emerging line of research uses distributional semantics to model human intuitions about the semantic transparency of morphologically derived or compound expressions and how these impact various lexical processing tasks (Kuperman, 2009; Wang et al., 2012).
    Page 2, “Related work”
  7. Distributional semantic models (DSMs), also known as vector-space models, semantic spaces, or by the names of famous incarnations such as Latent Semantic Analysis or Topic Models, approximate the meaning of words with vectors that record their patterns of co-occurrence with corpus context features (often, other words).
    Page 3, “Composition methods”
  8. Since the very inception of distributional semantics, there have been attempts to compose meanings for sentences and larger passages (Landauer and Dumais, 1997), but interest in compositional DSMs has skyrocketed in the last few years, particularly since the influential work of Mitchell and Lapata (2008; 2009; 2010).
    Page 3, “Composition methods”
  9. 4.2 Distributional semantic space
    Page 5, “Experimental setup”
  10. This result is of practical importance for distributional semantics, as it paves the way to address one of the main causes of data sparseness, and it confirms the usefulness of the compositional approach in a new domain.
    Page 8, “Experimental setup”
  11. We would also like to apply composition to inflectional morphology (that currently lies outside the scope of distributional semantics), to capture the nuances of meaning that, for example, distinguish singular and plural nouns (consider, e.g., the difference between the mass singular tea and the plural teas, which coerces the noun into a count interpretation (Katz and Zamparelli, 2012)).
    Page 8, “Experimental setup”
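
Sentence 3 above summarizes the core DSM idea: a word's meaning is approximated by a vector of corpus co-occurrence counts, so that words appearing in similar contexts end up with similar vectors. The sketch below illustrates this on a toy corpus; the window size, vocabulary, and cosine comparison are illustrative choices, not the setup used in the paper.

```python
import numpy as np
from collections import defaultdict

# Toy corpus; the paper uses a large corpus and 20K context words instead.
corpus = [
    "the driver parked the car near the road",
    "the driver parked the automobile near the road",
    "she boiled the potato in the kitchen",
]

window = 2  # symmetric context window (illustrative)
counts = defaultdict(lambda: defaultdict(int))

for sentence in corpus:
    tokens = sentence.split()
    for i, target in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if i != j:
                counts[target][tokens[j]] += 1

# Dense co-occurrence matrix: one row per target word, one column per context word.
vocab = sorted({w for s in corpus for w in s.split()})
index = {w: k for k, w in enumerate(vocab)}
M = np.zeros((len(vocab), len(vocab)))
for target, ctxs in counts.items():
    for ctx, c in ctxs.items():
        M[index[target], index[ctx]] = c

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Words occurring in similar contexts get similar vectors.
print(cosine(M[index["car"]], M[index["automobile"]]))  # high
print(cosine(M[index["car"]], M[index["potato"]]))      # lower
```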


data sparseness

Appears in 6 sentences as: data sparseness (6)
In Compositional-ly Derived Representations of Morphologically Complex Words in Distributional Semantics
  1. This is a major cause of data sparseness for corpus-based approaches to lexical semantics, such as distributional semantic models of word meaning.
    Page 1, “Abstract”
  2. Not surprisingly, there is a strong correlation between word frequency and vector quality (Bullinaria and Levy, 2007), and since most words occur only once even in very large corpora (Baroni, 2009), DSMs suffer data sparseness.
    Page 1, “Introduction”
  3. Compositional distributional semantic models (cDSMs) of word units aim at handling, compositionally, the high productivity of phrases and consequent data sparseness.
    Page 2, “Introduction”
  4. Besides alleviating data sparseness problems, a system of this sort, that automatically induces the semantic contents of morphological processes, would also be of tremendous theoretical interest, given that the semantics of derivation is a central and challenging topic in linguistic morphology (Dowty, 1979; Lieber, 2004).
    Page 2, “Introduction”
  5. Morphological induction has recently received considerable attention since morphological analysis can mitigate data sparseness in domains such as parsing and machine translation (Goldberg and Tsarfaty, 2008; Lee, 2004).
    Page 2, “Related work”
  6. This result is of practical importance for distributional semantics, as it paves the way to address one of the main causes of data sparseness, and it confirms the usefulness of the compositional approach in a new domain.
    Page 8, “Experimental setup”


co-occurrence

Appears in 5 sentences as: co-occurrence (5)
In Compositional-ly Derived Representations of Morphologically Complex Words in Distributional Semantics
  1. Distributional semantic models (DSMs) in particular represent the meaning of a word by a vector, the dimensions of which encode corpus-extracted co-occurrence statistics, under the assumption that words that are semantically similar will occur in similar contexts (Turney and Pantel, 2010).
    Page 1, “Introduction”
  2. Trying to represent the meaning of arbitrarily long constructions by directly collecting co-occurrence statistics is obviously ineffective and thus methods have been developed to derive the meaning of larger constructions as a function of the meaning of their constituents (Baroni and Zamparelli, 2010; Coecke et al., 2010; Mitchell and Lapata, 2008; Mitchell and Lapata, 2010; Socher et al., 2012).
    Page 2, “Introduction”
  3. Distributional semantic models (DSMs), also known as vector-space models, semantic spaces, or by the names of famous incarnations such as Latent Semantic Analysis or Topic Models, approximate the meaning of words with vectors that record their patterns of co-occurrence with corpus context features (often, other words).
    Page 3, “Composition methods”
  4. We collect co-occurrence statistics for the top 20K content words (adjectives, adverbs, nouns, verbs)
    Page 5, “Experimental setup”
  5. Due to differences in co-occurrence weighting schemes (we use a logarithmically scaled measure, they do not), their multiplicative model is closer to our additive one.
    Page 6, “Experimental setup”
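
Sentence 5 notes that raw co-occurrence counts are reweighted with a logarithmically scaled measure before composition. The exact scheme is not given in these excerpts, so the snippet below only illustrates two common log-scaled options (log(1 + count) and positive PMI) and why such scaling compresses large counts; it is not the paper's recipe.

```python
import numpy as np

# Toy raw co-occurrence counts (rows: target words, columns: context features).
counts = np.array([
    [10.0, 0.0, 2.0],
    [ 8.0, 1.0, 0.0],
    [ 0.0, 6.0, 5.0],
])

# Option 1: simple log scaling, compresses large counts.
log_weighted = np.log1p(counts)

# Option 2: positive PMI, another log-based weighting often used in DSMs.
total = counts.sum()
p_wc = counts / total
p_w = counts.sum(axis=1, keepdims=True) / total
p_c = counts.sum(axis=0, keepdims=True) / total
with np.errstate(divide="ignore", invalid="ignore"):
    pmi = np.log(p_wc / (p_w * p_c))
ppmi = np.where(np.isfinite(pmi) & (pmi > 0), pmi, 0.0)

print(log_weighted)
print(ppmi)
```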


semantic space

Appears in 5 sentences as: semantic space (3) semantic spaces (2)
In Compositional-ly Derived Representations of Morphologically Complex Words in Distributional Semantics
  1. Distributional semantic models (DSMs), also known as vector-space models, semantic spaces, or by the names of famous incarnations such as Latent Semantic Analysis or Topic Models, approximate the meaning of words with vectors that record their patterns of co-occurrence with corpus context features (often, other words).
    Page 3, “Composition methods”
  2. The intuition is that a vector, in order to be a good representation of the meaning of the corresponding word, should lie in a region of semantic space populated by intuitively similar meanings, e.g., we are more likely to have captured the meaning of car if the NN of its vector is the automobile vector rather than potato.
    Page 4, “Experimental setup”
  3. All 900 derived vectors from the test set were matched with their three closest NNs in our semantic space (see Section 4.2), thus producing a set of 2,700 word pairs.
    Page 4, “Experimental setup”
  4. Most steps of the semantic space construction and composition pipelines were implemented using
    Page 5, “Experimental setup”
  5. These settings are chosen without tuning, and are based on previous experiments where they produced high-quality semantic spaces (Boleda et al., 2013; Bullinaria and Levy, 2007).
    Page 5, “Experimental setup”
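
Sentences 2 and 3 describe the evaluation idea: a vector is judged by whether its nearest neighbours in the semantic space are intuitively related words (car should retrieve automobile, not potato), and each derived vector is matched with its three closest neighbours. A minimal nearest-neighbour lookup by cosine similarity, with made-up vectors, could look like this:

```python
import numpy as np

# Hypothetical semantic space: one row per word in a small embedding matrix.
words = ["automobile", "potato", "road", "driver"]
space = np.array([
    [0.9, 0.1, 0.3],
    [0.1, 0.9, 0.2],
    [0.6, 0.2, 0.7],
    [0.7, 0.1, 0.5],
])

query = np.array([0.85, 0.15, 0.35])  # e.g. a composed vector for "car"

def top_k_neighbours(query, space, words, k=3):
    # Cosine similarity between the query and every word in the space.
    sims = space @ query / (np.linalg.norm(space, axis=1) * np.linalg.norm(query))
    order = np.argsort(-sims)[:k]
    return [(words[i], float(sims[i])) for i in order]

# The three closest neighbours, as in the evaluation of the 900 derived vectors.
print(top_k_neighbours(query, space, words))
```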


vectors representing

Appears in 5 sentences as: vector representation (2) vectors representing (3)
In Compositional-ly Derived Representations of Morphologically Complex Words in Distributional Semantics
  1. Although these works exploit vectors representing complex forms, they do not attempt to generate them compositionally.
    Page 2, “Related work”
  2. Annotation of quality of test vectors The quality of the corpus-based vectors representing derived test items was determined by collecting human semantic similarity judgments in a crowdsourcing survey.
    Page 4, “Experimental setup”
  3. The first experiment investigates to what extent composition models can approximate high-quality (HQ) corpus-extracted vectors representing derived forms.
    Page 6, “Experimental setup”
  4. Lexfunc provides a flexible way to account for affixation, since it models it directly as a function mapping from and onto word vectors, without requiring a vector representation of bound affixes.
    Page 6, “Experimental setup”
  5. The fulladd method requires a vector representation for bound morphemes.
    Page 8, “Experimental setup”
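
Sentences 4 and 5 contrast two of the composition methods: lexfunc treats an affix as a function (in practice a matrix) applied to the stem vector, so no vector for the bound affix is needed, while fulladd combines a stem vector and an affix vector through two weight matrices. The snippet below shows only the functional form of the two models with random placeholder parameters; in the paper these matrices are estimated from corpus data, which is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 300  # dimensionality of the semantic space (illustrative)

stem = rng.normal(size=d)    # e.g. vector of "build"
affix = rng.normal(size=d)   # e.g. vector of the bound morpheme "re-" (needed by fulladd)

# lexfunc: the affix is a function (matrix) mapping word vectors to word vectors.
M_re = rng.normal(size=(d, d))          # placeholder for a trained affix matrix
rebuild_lexfunc = M_re @ stem

# fulladd: derived = A @ stem + B @ affix, with A, B shared across affixes.
A = rng.normal(size=(d, d))             # placeholders for trained weight matrices
B = rng.normal(size=(d, d))
rebuild_fulladd = A @ stem + B @ affix

print(rebuild_lexfunc.shape, rebuild_fulladd.shape)
```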


distributional representations

Appears in 4 sentences as: distributional representation (1) distributional representations (3)
In Compositional-ly Derived Representations of Morphologically Complex Words in Distributional Semantics
  1. It is natural to hypothesize that the same methods can be applied to morphology to derive the meaning of complex words from the meaning of their parts: For example, instead of harvesting a rebuild vector directly from the corpus, the latter could be constructed from the distributional representations of re- and build.
    Page 2, “Introduction”
  2. We adapt a number of composition methods from the literature to the morphological setting, and we show that some of these methods can provide better distributional representations of derived forms than either those directly harvested from a large corpus, or those obtained by using the stem as a proxy to derived-form meaning.
    Page 2, “Introduction”
  3. Our goal is to automatically construct, given distributional representations of stems and affixes, semantic representations for the derived words containing those stems and affixes.
    Page 2, “Related work”
  4. (2010) take inspiration from formal semantics to characterize composition in terms of function application, where the distributional representation of one element in a composition (the functor) is not a vector but a function.
    Page 3, “Composition methods”
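
Sentence 1 gives the running example: instead of harvesting a rebuild vector directly from the corpus, it can be composed from the distributional representations of re- and build. The simplest composition methods from the literature do this by (weighted) vector addition; the snippet below sketches that baseline with illustrative vectors and weights, as a contrast to the matrix-based models sketched above.

```python
import numpy as np

# Hypothetical distributional vectors for the affix and the stem.
re_vec = np.array([0.2, 0.5, 0.1])
build_vec = np.array([0.7, 0.3, 0.4])

# add: plain vector sum of the two morpheme vectors.
rebuild_add = re_vec + build_vec

# wadd: weighted additive model; the scalar weights would normally be
# estimated on training data, the values here are made up.
alpha, beta = 0.3, 0.7
rebuild_wadd = alpha * re_vec + beta * build_vec

print(rebuild_add, rebuild_wadd)
```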


significantly outperform

Appears in 4 sentences as: significantly outperform (4) significantly outperforming (1)
In Compositional-ly Derived Representations of Morphologically Complex Words in Distributional Semantics
  1. fulladd and lexfunc significantly outperform stem also in the HR subset (p<.001).
    Page 7, “Experimental setup”
  2. However, the stemploitation hypothesis is dispelled by the observation that both models significantly outperform the stem baseline (p<.001), despite the fact that the latter, again, has good performance, significantly outperforming the corpus-derived vectors (p < .001).
    Page 8, “Experimental setup”
  3. Indeed, if we focus on the third row of Table 5, reporting performance on low stem-derived relatedness (LR) items (annotated as described in Section 4.1), fulladd and wadd still significantly outperform the corpus representations (p<.001), whereas the quality of the stem representations of LR items is not significantly different from that of the corpus-derived ones.
    Page 8, “Experimental setup”
  4. Interestingly, lexfunc displays the smallest drop in performance when restricting evaluation to LR items; however, since it does not significantly outperform the LQ corpus representations, this is arguably due to a floor effect.
    Page 8, “Experimental setup”


semantic representations

Appears in 3 sentences as: Semantic representations (1) semantic representations (2)
In Compositional-ly Derived Representations of Morphologically Complex Words in Distributional Semantics
  1. Semantic representations constructed in this way beat a strong baseline and can be of higher quality than representations directly constructed from corpus data.
    Page 1, “Abstract”
  2. Our goal is to automatically construct, given distributional representations of stems and affixes, semantic representations for the derived words containing those stems and affixes.
    Page 2, “Related work”
  3. A natural extension of our research is to address morpheme composition and morphological induction jointly, trying to model the intuition that good candidate morphemes should have coherent semantic representations.
    Page 8, “Experimental setup”
