A practical and linguistically-motivated approach to compositional distributional semantics
Paperno, Denis and Pham, Nghia The and Baroni, Marco

Article Structure

Abstract

Distributional semantic methods to approximate word meaning with context vectors have been very successful empirically, and recent years have seen a surge of interest in their compositional extension to phrases and sentences.

Compositional distributional semantics

The research of the last two decades has established empirically that distributional vectors for words obtained from corpus statistics can be used to represent word meaning in a variety of tasks (Turney and Pantel, 2010).

The practical lexical function model

As Section 1.2 suggests, it would be desirable to have a compositional distributional model that encodes function-argument relations but avoids the troublesome high-order tensor representations of the pure lexical function model, with all the practical problems that come with them.
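
As an illustration, here is a minimal sketch in the spirit of plf, assuming d-dimensional vectors; the names and the exact composition rule are our simplifications, not the authors' implementation. A word is an ordered tuple of one meaning vector plus one d-by-d matrix per unfilled argument slot, and saturating a slot is a matrix-by-vector multiplication:

    import numpy as np

    d = 300  # illustrative dimensionality; the paper's setting may differ
    rng = np.random.default_rng(0)

    # A word is an ordered tuple: a meaning vector plus one d-by-d matrix
    # per unfilled argument slot (instead of one high-order tensor).
    dogs = (rng.standard_normal(d), [])                    # noun: vector only
    chase = (rng.standard_normal(d),                       # transitive verb:
             [rng.standard_normal((d, d)),                 #   object matrix
              rng.standard_normal((d, d))])                #   subject matrix

    def saturate(functor, argument_vector):
        """Fill the functor's next argument slot: multiply the slot's matrix
        by the argument vector, add the result to the functor's vector, and
        drop the consumed matrix (a simplified slot-saturation rule)."""
        vector, matrices = functor
        head, *rest = matrices
        return (vector + head @ argument_vector, rest)

    cats = (rng.standard_normal(d), [])
    vp = saturate(chase, cats[0])               # fill the object slot
    sentence, leftover = saturate(vp, dogs[0])  # fill the subject slot
    assert leftover == []  # all slots consumed: the sentence is a plain vector

Because the representation never goes beyond vectors and matrices, the same machinery extends to sentences of arbitrary size and structure.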

Evaluation

3.1 Evaluation materials

Conclusion

We introduced an approach to compositional distributional semantics based on a linguistically-motivated syntax-to-semantics type mapping, but simple and flexible enough that it can produce representations of English sentences of arbitrary size and structure.

Topics

word order

Appears in 9 sentences as: word order (9)
In A practical and linguistically-motivated approach to compositional distributional semantics
  1. For instance, symmetric operations like vector addition are insensitive to syntactic structure, therefore meaning differences encoded in word order …
    Page 1, “Compositional distributional semantics”
  2. (2013) at the TFDS workshop (tfds below) was specifically designed to test compositional methods for their sensitivity to word order and the semantic effect of determiners.
    Page 5, “Evaluation”
  3. The foils have high lexical overlap with the targets but very different meanings, due to different determiners and/or word order.
    Page 6, “Evaluation”
  4. In the tfds task, not surprisingly the add and mult models, lacking determiner representations and being order-insensitive, fail to distinguish between true paraphrases and foils (indeed, for the mult model foils are significantly closer to the targets than the paraphrases, probably because the latter have lower content word overlap than the foils, that often differ in word order and determiners only).
    Page 7, “Evaluation”
  5. Our plf approach is able to handle determiners and word order correctly, as demonstrated by a highly significant (p < 0.01) difference between paraphrase and foil similarity (average difference in cosine .017, standard deviation .077).
    Page 7, “Evaluation”
  6. Since determiners are handled identically under the two approaches, the culprit must be word order.
    Page 7, “Evaluation”
  7. …inverted arguments, and thus makes this model particularly sensitive to word order differences.
    Page 8, “Evaluation”
  8. Indeed, if we limit evaluation to those foils characterized by word order changes only, lf discriminates between paraphrases and foils even more clearly, whereas the plf difference, while still significant, decreases slightly.
    Page 8, “Evaluation”
  9. Clearly, this baseline is exploiting the systematic determiner differences in the foils and, indeed, when it is evaluated on foils where only word order changes, its performance is no longer significant.
    Page 8, “Evaluation”
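
The order-insensitivity that the first and fourth excerpts attribute to the add model is easy to verify concretely. A tiny sketch with made-up vectors (placeholder data, not the paper's): addition is commutative, so permuting the words cannot change the sentence vector, whereas matrix-based composition is not symmetric:

    import numpy as np

    rng = np.random.default_rng(1)
    dog, bites, man = (rng.standard_normal(5) for _ in range(3))

    # Additive composition is commutative: any reordering of the same
    # words produces exactly the same sentence vector.
    assert np.allclose(dog + bites + man,   # "dog bites man"
                       man + bites + dog)   # "man bites dog"

    # Matrix-based composition is not: a (hypothetical) subject matrix
    # maps different arguments to different results.
    M_subj = rng.standard_normal((5, 5))
    assert not np.allclose(M_subj @ dog, M_subj @ man)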

sentence pairs

Appears in 6 sentences as: Sentence pairs (1) sentence pairs (6)
In A practical and linguistically-motivated approach to compositional distributional semantics
  1. …(2013), features 200 sentence pairs that were rated for similarity by 43 annotators.
    Page 5, “Evaluation”
  2. We also consider a similar data set introduced by Grefenstette (2013), comprising 200 sentence pairs rated by 50 annotators.
    Page 5, “Evaluation”
  3. Evaluation is carried out by computing the Spearman correlation between the annotator similarity ratings for the sentence pairs and the cosines of the vectors produced by the various systems for the same sentence pairs.
    Page 5, “Evaluation”
  4. The msrvid data set from the SemEval-2012 Semantic Textual Similarity (STS) task (Agirre et al., 2012) consists of 750 sentence pairs that describe brief videos.
    Page 6, “Evaluation”
  5. Sentence pairs were scored for similarity by 5 subjects each.
    Page 6, “Evaluation”
  6. Following standard practice in paraphrase detection studies (e.g., Blacoe and Lapata (2012)), we use cosine similarity between sentence pairs as computed by one of our systems together with two shallow similarity cues: word overlap between the two sentences and difference in sentence length.
    Page 6, “Evaluation”
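
The protocol in the third excerpt is straightforward to reproduce in outline. A sketch with placeholder composed vectors and ratings (scipy.stats.spearmanr is a real function; everything else here is random stand-in data):

    import numpy as np
    from scipy.stats import spearmanr

    def cosine(u, v):
        """Cosine similarity between two composed sentence vectors."""
        return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

    # Placeholder inputs: one vector pair per sentence pair, plus the
    # averaged human similarity ratings for the same pairs.
    rng = np.random.default_rng(2)
    pairs = [(rng.standard_normal(10), rng.standard_normal(10))
             for _ in range(200)]
    human_ratings = rng.uniform(1, 7, size=200)

    model_scores = [cosine(u, v) for u, v in pairs]
    rho, p = spearmanr(model_scores, human_ratings)
    print(f"Spearman rho = {rho:.3f} (p = {p:.3g})")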

syntactic contexts

Appears in 5 sentences as: syntactic context (1) syntactic contexts (4)
In A practical and linguistically-motivated approach to compositional distributional semantics
  1. Another issue is that the same or similar items that occur in different syntactic contexts are assigned different semantic types with incomparable representations.
    Page 2, “Compositional distributional semantics”
  2. Besides losing the comparability of the semantic contribution of a word across syntactic contexts, we also worsen the data sparseness issues.
    Page 2, “Compositional distributional semantics”
  3. We may still want to represent word meanings in different syntactic contexts differently, but at the same time we need to incorporate a formal connection between those representations, e.g., between the transitive and the intransitive instantiations of the verb to eat.
    Page 3, “The practical lexical function model”
  4. Footnote 2: To determine the number and ordering of matrices representing the word in the current syntactic context, our plf implementation relies on the syntactic type assigned to the word in the categorial grammar parse of the sentence.
    Page 3, “The practical lexical function model”
  5. Table 4: The verb to eat associated with different sets of matrices in different syntactic contexts.
    Page 5, “The practical lexical function model”
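
The third, fourth, and fifth excerpts describe a single mechanism: the verb keeps one shared vector across contexts, while the categorial-grammar type assigned by the parser selects which argument matrices apply. A sketch of such a lexicon entry (the type strings and helper are our illustration, not the paper's code):

    import numpy as np

    d = 300  # illustrative dimensionality
    rng = np.random.default_rng(3)

    # One shared meaning vector for "eat" keeps its transitive and
    # intransitive uses directly comparable ...
    eat_vector = rng.standard_normal(d)

    # ... while the syntactic type from the categorial grammar parse
    # selects the set of argument matrices to use.
    eat_matrices = {
        "S\\NP":      {"subj": rng.standard_normal((d, d))},    # intransitive
        "(S\\NP)/NP": {"subj": rng.standard_normal((d, d)),     # transitive
                       "obj":  rng.standard_normal((d, d))},
    }

    def lexical_entry(syntactic_type):
        """Return the tuple (vector, matrices) for "eat" in the given
        syntactic context."""
        return eat_vector, eat_matrices[syntactic_type]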

data sparseness

Appears in 4 sentences as: data sparseness (4)
In A practical and linguistically-motivated approach to compositional distributional semantics
  1. Estimating tensors of this size already runs into data sparseness issues for less common transitive verbs.
    Page 2, “Compositional distributional semantics”
  2. Besides losing the comparability of the semantic contribution of a word across syntactic contexts, we also worsen the data sparseness issues.
    Page 2, “Compositional distributional semantics”
  3. We expect a reasonably large corpus to feature many occurrences of a verb with a variety of subjects and a variety of objects (but not necessarily a variety of subjects with each of the objects as required by Grefenstette et al.’s training), allowing us to avoid the data sparseness issue.
    Page 5, “The practical lexical function model”
  4. Evidently, the separately-trained subject and object matrices of plf, being less affected by data sparseness than the 3-way tensors of lf, are better able to capture how verbs interact with their arguments.
    Page 7, “Evaluation”
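
To make the contrast concrete with illustrative numbers (d = 300 is our assumption, not necessarily the paper's setting): a pure lexical function representation of a transitive verb is a third-order tensor with 300^3 = 27,000,000 parameters, all estimated from the occurrences of that single verb, whereas plf's separately-trained subject and object matrices need only 2 × 300^2 = 180,000.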

vector representations

Appears in 4 sentences as: vector representation (1) vector representations (3)
In A practical and linguistically-motivated approach to compositional distributional semantics
  1. If distributional vectors encode certain aspects of word meaning, it is natural to expect that similar aspects of sentence meaning can also receive vector representations, obtained compositionally from word vectors.
    Page 1, “Compositional distributional semantics”
  2. Deverbal nouns like demolition, often used without mention of who demolished what, would have to get vector representations while the corresponding verbs (demolish) would become tensors, which makes immediately related verbs and nouns incomparable.
    Page 2, “Compositional distributional semantics”
  3. The matrices formalize argument slot saturation, operating on an argument vector representation through matrix-by-vector multiplication, as described in the next section.
    Page 3, “The practical lexical function model”
  4. This flexibility makes our model suitable to compute vector representations of sentences without stumbling at unseen syntactic usages of words.
    Page 5, “The practical lexical function model”

natural language

Appears in 3 sentences as: natural language (3)
In A practical and linguistically-motivated approach to compositional distributional semantics
  1. This underlying intuition, adopted from formal semantics of natural language, motivated the creation of the lexical function model of composition (lf) (Baroni and Zamparelli, 2010; Coecke et al., 2010).
    Page 1, “Compositional distributional semantics”
  2. The lf model can be seen as a projection of the symbolic Montagovian approach to semantic composition in natural language onto the domain of vector spaces and linear operations on them (Baroni et al., 2013).
    Page 1, “Compositional distributional semantics”
  3. The full range of semantic types required for natural language processing, including those of adverbs and transitive verbs, has to include, however, tensors of greater rank.
    Page 2, “Compositional distributional semantics”

semantic representations

Appears in 3 sentences as: semantic representation (1) semantic representations (2)
In A practical and linguistically-motivated approach to compositional distributional semantics
  1. The general form of a semantic representation for a linguistic unit is an ordered tuple of a vector and n ∈ ℕ matrices.
    Page 3, “The practical lexical function model”
  2. The form of semantic representations we are using is shown in Table 1.
    Page 3, “The practical lexical function model”
  3. The semantic representations we propose include a semantic vector for constituents of any semantic type, thus enabling semantic comparison for words of different parts of speech (the case of demolition vs. demolish).
    Page 5, “The practical lexical function model”
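
Read together, these excerpts indicate that the truncated formula in the first one has the general shape ⟨v, M_1, …, M_n⟩, with v ∈ ℝ^d and each M_i ∈ ℝ^(d×d); this shape is our reconstruction from the surrounding description, not a verbatim quote of the paper's formula.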

state of the art

Appears in 3 sentences as: state of the art (3)
In A practical and linguistically-motivated approach to compositional distributional semantics
  1. Table 5 summarizes the performance of our models on the chosen tasks, and compares it to the state of the art reported in previous work, as well as to various strong baselines.
    Page 7, “Evaluation”
  2. For anvan1, plf is just below the state of the art, which is based on disambiguating the verb vector in context (Kartsaklis and Sadrzadeh, 2013), and lf outperforms the baseline, which consists in using the verb vector only as a proxy to sentence similarity. On anvan2, plf outperforms the best model …
    Page 7, “Evaluation”
  3. Footnote 5: We report state of the art from Kartsaklis and Sadrzadeh …
    Page 7, “Evaluation”
