Background | Distributional representations encode an expression by its environment, assuming the context-dependent nature of meaning according to which one “shall know a word by the company it keeps” (Firth, 1957).
Background | Distributional representations are frequently used to encode single words as vectors. |
Background | While it is theoretically possible to apply the same mechanism to larger expressions, sparsity prevents learning meaningful distributional representations for expressions much larger than single words.
Introduction | While distributional semantics is easily applied to single words, sparsity implies that attempts to directly extract distributional representations for larger expressions are doomed to fail. |
Composition methods | (2010) take inspiration from formal semantics to characterize composition in terms of function application, where the distributional representation of one element in a composition (the functor) is not a vector but a function. |
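As a minimal sketch of composition as function application, the functor can be represented not as a vector but as a linear map, i.e. a matrix, applied to the argument's vector. The dimensionality and the random values below are illustrative assumptions, not parameters from any cited paper:

```python
import numpy as np

# Hypothetical 3-dimensional distributional space (toy example).
rng = np.random.default_rng(0)

# The argument is a vector; the functor is a matrix,
# i.e. a function from vectors to vectors.
arg_vec = rng.random(3)
functor_matrix = rng.random((3, 3))

# Composition as function application: matrix-vector product.
composed = functor_matrix @ arg_vec
```

In practice such functor matrices would be learned from corpus data rather than drawn at random; the point here is only the shape of the operation.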
Introduction | It is natural to hypothesize that the same methods can be applied to morphology to derive the meaning of complex words from the meaning of their parts: For example, instead of harvesting a rebuild vector directly from the corpus, the vector could be constructed from the distributional representations of re- and build.
Introduction | We adapt a number of composition methods from the literature to the morphological setting, and we show that some of these methods can provide better distributional representations of derived forms than either those directly harvested from a large corpus, or those obtained by using the stem as a proxy to derived-form meaning. |
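Two of the simplest composition methods in the literature are vector addition and component-wise multiplication of the parts' vectors. A minimal sketch for the rebuild example, with toy vectors standing in for corpus-derived representations (the numbers are illustrative assumptions):

```python
import numpy as np

# Toy distributional vectors for the affix "re-" and the stem "build";
# real vectors would be harvested from corpus co-occurrence counts.
re_vec = np.array([0.2, 0.5, 0.1])
build_vec = np.array([0.7, 0.1, 0.4])

# Additive composition: sum the component vectors.
additive = re_vec + build_vec

# Multiplicative composition: component-wise product.
multiplicative = re_vec * build_vec

def cosine(u, v):
    """Cosine similarity, used to compare a composed vector
    against a corpus-harvested vector for the derived form."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
```

A composed vector can then be evaluated by its cosine similarity to a directly harvested rebuild vector, which is the kind of comparison the experiments rely on.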
Related work | Our goal is to automatically construct, given distributional representations of stems and affixes, semantic representations for the derived words containing those stems and affixes. |
Improving a distributional thesaurus | The principles presented in the previous section face one major problem compared to the “classical” distributional approach, in which the semantic similarity of two words can be evaluated directly by computing the similarity of their distributional representations.
Principles | As a result, the distributional representation of a word takes the unstructured form of a bag of words or the more structured form of a set of pairs {syntactic relation, word}. |
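A minimal sketch of the unstructured, bag-of-words variant: the representation of a target word is the multiset of words co-occurring with it in a fixed window. The toy corpus and window size below are illustrative assumptions:

```python
from collections import Counter

# Toy corpus; a real representation would be built from a large corpus.
corpus = "the cat sat on the mat the dog sat on the rug".split()

def context_bag(target, tokens, window=2):
    """Bag-of-words distributional representation: counts of words
    co-occurring with `target` within +/- `window` tokens."""
    bag = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            lo, hi = max(0, i - window), i + window + 1
            bag.update(t for j, t in enumerate(tokens[lo:hi], lo)
                       if j != i)
    return bag

sat_bag = context_bag("sat", corpus)
```

The more structured variant would instead count (syntactic relation, word) pairs produced by a parser, but the counting logic is analogous.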
Principles | A variant of this approach was proposed by Kazama et al. (2010), where the distributional representation of a word is modeled as a multinomial distribution with a Dirichlet prior.
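Under a symmetric Dirichlet prior, the posterior mean of the multinomial parameters reduces to additive smoothing of the raw co-occurrence counts. A minimal sketch with assumed counts and concentration parameter (the numbers are illustrative, not taken from Kazama et al.):

```python
import numpy as np

# Co-occurrence counts of one target word over a toy 4-word vocabulary.
counts = np.array([5.0, 2.0, 0.0, 1.0])

# Symmetric Dirichlet prior with concentration alpha; the posterior
# mean of the multinomial parameters is the smoothed relative frequency:
#   (count_i + alpha) / (total + alpha * vocabulary_size)
alpha = 1.0
posterior_mean = (counts + alpha) / (counts.sum() + alpha * len(counts))
```

One practical effect of the prior is visible even in this toy case: contexts never observed with the target word still receive non-zero probability mass.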