Contextualizing Semantic Representations Using Syntactically Enriched Vector Models
Stefan Thater, Hagen Fürstenau and Manfred Pinkal

Article Structure

Abstract

We present a syntactically enriched vector model that supports the computation of contextualized semantic representations in a quasi compositional fashion.

Introduction

In the logical paradigm of natural-language semantics originating from Montague (1973), semantic structure, composition and entailment have been modelled to an impressive degree of detail and formal consistency.

Related Work

Several approaches to contextualize vector representations of word meaning have been proposed.

The model

In this section, we present our method of contextualizing semantic vector representations.

Experiments: Ranking Paraphrases

In this section, we evaluate our model on a paraphrase ranking task.

Experiment: Ranking Word Senses

In this section, we apply our model to a different word sense ranking task: Given a word w in context, the task is to decide to what extent the different WordNet (Fellbaum, 1998) senses of w apply to this occurrence of w.

Conclusion

We have presented a novel method for adapting the vector representations of words according to their context.

Topics

vector representations

Appears in 11 sentences as: vector representations (11)
In Contextualizing Semantic Representations Using Syntactically Enriched Vector Models
  1. Second, the vectors of two syntactically related words, e.g., a target verb acquire and its direct object knowledge, typically have different syntactic environments, which implies that their vector representations encode complementary information and there is no direct way of combining the information encoded in the respective vectors.
    Page 2, “Introduction”
  2. To solve these problems, we build upon previous work (Thater et al., 2009) and propose to use syntactic second-order vector representations.
    Page 2, “Introduction”
  3. Second-order vector representations in a bag-of-words setting were first used by Schütze (1998); in a syntactic setting, they also feature in Dligach and Palmer (2008).
    Page 2, “Introduction”
  4. Several approaches to contextualize vector representations of word meaning have been proposed.
    Page 2, “Related Work”
  5. By using vector representations of a predicate p and an argument a, Kintsch identifies words
    Page 2, “Related Work”
  6. Mitchell and Lapata (2008), henceforth M&L, propose a general framework in which meaning representations for complex expressions are computed compositionally by combining the vector representations of the individual words of the complex expression.
    Page 2, “Related Work”
  7. Instead of using collections of different vectors, we incorporated syntactic information by assuming a richer internal structure of the vector representations.
    Page 2, “Related Work”
  8. In this section, we present our method of contextualizing semantic vector representations.
    Page 3, “The model”
  9. Our model employs vector representations for words and expressions containing syntax-specific first- and second-order co-occurrence information.
    Page 3, “The model”
  10. The basis for the construction of both kinds of vector representations are co-occurrence graphs.
    Page 3, “The model”
  11. We have presented a novel method for adapting the vector representations of words according to their context.
    Page 8, “Conclusion”

gold-standard

Appears in 9 sentences as: Gold-standard (1) gold-standard (8)
In Contextualizing Semantic Representations Using Syntactically Enriched Vector Models
  1. We follow E&P and evaluate it only on the second subtask: we extract paraphrase candidates from the gold standard by pooling all annotated gold-standard paraphrases for all instances of a verb in all contexts, and use our model to rank these paraphrase candidates in specific contexts.
    Page 5, “Experiments: Ranking Paraphrases”
  2. P10 measures the percentage of gold-standard paraphrases in the top-ten list of paraphrases as ranked by the system, and can be defined as follows (McCarthy and Navigli, 2007):
    Page 5, “Experiments: Ranking Paraphrases”
  3. where M is the list of 10 paraphrase candidates top-ranked by the model, G is the corresponding annotated gold-standard data, and f(s) is the weight of the individual paraphrases.
    Page 5, “Experiments: Ranking Paraphrases”
  4. The dataset is identical to the one used by TDP and has been constructed in the same way as the dataset used by E&P: it contains those gold-standard instances of verbs that have, according to the analyses produced by the MiniPar parser (Lin, 1993), an overtly realized subject and object.
    Page 6, “Experiments: Ranking Paraphrases”
  5. Gold-standard paraphrases that do not occur in the parsed British National Corpus are removed. In total, the dataset contains 162 instances for 34 different verbs.
    Page 6, “Experiments: Ranking Paraphrases”
  6. As for the LST/SO dataset, we ignore all gold-standard paraphrases that do not occur in the parsed (Gigaword) corpus.
    Page 7, “Experiments: Ranking Paraphrases”
  7. As before, we ignore gold-standard paraphrases that do not occur in the parsed Gigaword corpus.
    Page 7, “Experiments: Ranking Paraphrases”
  8. To compare the predicted ranking to the gold-standard ranking, we use Spearman’s ρ, a standard method to compare ranked lists to each other.
    Page 8, “Experiment: Ranking Word Senses”
  9. The first column shows the correlation of our model’s predictions with the human judgments from the gold-standard, averaged over all instances.
    Page 8, “Experiment: Ranking Word Senses”
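
Items 2 and 3 above quote the description of the P10 measure, but the defining equation itself is not reproduced. A hedged reconstruction, using the symbols explained in item 3 (M the list of ten candidates top-ranked by the model, G the annotated gold-standard data, f(s) the weight of paraphrase s) and following the measure of McCarthy and Navigli (2007); the LaTeX notation is mine:

    P10 = \frac{\sum_{s \in M \cap G} f(s)}{\sum_{s \in G} f(s)}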

gold standard

Appears in 8 sentences as: gold standard (10)
In Contextualizing Semantic Representations Using Syntactically Enriched Vector Models
  1. Systems that participated in the task had to generate paraphrases for every instance, and were evaluated against a gold standard containing up to 10 possible paraphrases for each of the individual instances.
    Page 5, “Experiments: Ranking Paraphrases”
  2. We follow E&P and evaluate it only on the second subtask: we extract paraphrase candidates from the gold standard by pooling all annotated gold-standard paraphrases for all instances of a verb in all contexts, and use our model to rank these paraphrase candidates in specific contexts.
    Page 5, “Experiments: Ranking Paraphrases”
  3. Table 1 shows two instances of the target verb acquire together with its paraphrases in the gold standard as an example.
    Page 5, “Experiments: Ranking Paraphrases”
  4. their gold standard weight.
    Page 5, “Experiments: Ranking Paraphrases”
  5. AP = (1/R) Σ_{i=1}^{n} x_i p_i, where x_i is a binary variable indicating whether the ith item as ranked by the model is in the gold standard or not, R is the size of the gold standard, and n is the number of paraphrase candidates to be ranked.
    Page 5, “Experiments: Ranking Paraphrases”
  6. If we take x_i to be the gold standard weight of the ith item, or zero if it is not in the gold standard, we can define generalized average precision as follows:
    Page 5, “Experiments: Ranking Paraphrases”
  7. where ȳ_i is the average weight of the ideal ranked list y_1, ..., y_i of gold standard paraphrases.
    Page 5, “Experiments: Ranking Paraphrases”
  8. Also, our system is the first unsupervised method that has been applied to Erk and McCarthy’s (2009) graded word sense assignment task, showing a substantial positive correlation with the gold standard.
    Page 8, “Conclusion”
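
The formulas quoted in items 5-7 above are garbled in extraction. The following is a hedged reconstruction of average precision (AP) and generalized average precision (GAP), consistent with the fragments above and with the standard definitions used for this task (cf. Kishida, 2005); the notation is mine, not necessarily the authors' exact formulation:

    AP = \frac{1}{R} \sum_{i=1}^{n} x_i \, p_i , \qquad p_i = \frac{1}{i} \sum_{k=1}^{i} x_k

    GAP = \frac{\sum_{i=1}^{n} I(x_i) \, \bar{x}_i}{\sum_{i=1}^{R} I(y_i) \, \bar{y}_i} , \qquad \bar{x}_i = \frac{1}{i} \sum_{k=1}^{i} x_k

where I(x_i) = 1 if x_i > 0 and 0 otherwise, y_1 >= y_2 >= ... are the gold standard weights in decreasing (ideal) order, and ȳ_i is their running average.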

co-occurrence

Appears in 7 sentences as: Co-occurrence (1) co-occurrence (6)
In Contextualizing Semantic Representations Using Syntactically Enriched Vector Models
  1. In the standard approach, word meaning is represented by feature vectors, with large sets of context words as dimensions, and their co-occurrence frequencies as values.
    Page 1, “Introduction”
  2. This allows us to model the semantic interaction between the meaning of a head word and its dependent at the microlevel of relation-specific co-occurrence frequencies.
    Page 1, “Introduction”
  3. Figure 1: Co-occurrence graph of a small sample corpus of dependency trees.
    Page 3, “Related Work”
  4. The basis for the construction of both kinds of vector representations are co-occurrence graphs.
    Page 3, “The model”
  5. Figure 1 shows the co-occurrence graph of a small sample corpus of dependency trees: Words are represented as nodes in the graph, possible dependency relations between them are drawn as labeled edges, with weights corresponding to the observed frequencies.
    Page 3, “The model”
  6. introduce another kind of vectors capturing information about all words that can be reached with two steps in the co-occurrence graph.
    Page 3, “The model”
  7. As for the full model, we use pmi values rather than raw frequency counts as co-occurrence statistics.
    Page 6, “Experiments: Ranking Paraphrases”
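
The snippets in items 5 and 7 above describe a weighted co-occurrence graph built from dependency triples and re-weighted with pointwise mutual information. Below is a minimal Python sketch of that construction; the triple format, the function names, and the particular choice of PMI marginals are assumptions made for illustration, not the authors' code.

    import math
    from collections import Counter

    def build_graph(triples):
        """Count (head, relation, dependent) co-occurrences from dependency parses.

        The returned Counter plays the role of the edge-weight function of the
        co-occurrence graph: nodes are words, labeled edges are relations, and
        weights are observed frequencies.
        """
        return Counter(triples)

    def pmi_weights(graph):
        """Replace raw edge frequencies by pointwise mutual information scores."""
        total = sum(graph.values())
        head_counts, dep_counts = Counter(), Counter()
        for (w, r, w2), freq in graph.items():
            head_counts[w] += freq
            dep_counts[(r, w2)] += freq
        return {
            (w, r, w2): math.log2((freq / total) /
                                  ((head_counts[w] / total) * (dep_counts[(r, w2)] / total)))
            for (w, r, w2), freq in graph.items()
        }

    # Toy corpus: "acquire knowledge", "acquire company", "gain knowledge"
    triples = [("acquire", "obj", "knowledge"), ("acquire", "obj", "company"),
               ("gain", "obj", "knowledge")]
    print(pmi_weights(build_graph(triples)))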

vector space

Appears in 6 sentences as: vector space (7)
In Contextualizing Semantic Representations Using Syntactically Enriched Vector Models
  1. We go one step further, however, in that we employ syntactically enriched vector models as the basic meaning representations, assuming a vector space spanned by combinations of dependency relations and words (Lin, 1998).
    Page 1, “Introduction”
  2. For the problem at hand, the use of second-order vectors alleviates the sparseness problem, and enables the definition of vector space transformations that make the distributional information attached to words in different syntactic positions compatible.
    Page 2, “Introduction”
  3. Assuming a set W of words and a set R of dependency relation labels, we consider a Euclidean vector space V1 spanned by the set of orthonormal basis vectors {e_{r,w'} | r ∈ R, w' ∈ W}, i.e., a vector space whose dimensions correspond to pairs of a relation and a word.
    Page 4, “The model”
  4. In this vector space we define the first-order vector [w] of a word w as follows:
    Page 4, “The model”
  5. We further consider a similarly defined vector space V2, spanned by an orthonormal basis {e_{r,r',w'} | r, r' ∈ R, w' ∈ W}.
    Page 4, “The model”
  6. To compute the vector space, we consider only a subset of the complete set of dependency triples extracted from the parsed Gigaword corpus.
    Page 6, “Experiments: Ranking Paraphrases”
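
Item 4 above announces the definition of the first-order vector [w], but the equation itself is not reproduced. The following is a hedged reconstruction that is consistent with the basis vectors e_{r,w'} of V1 and e_{r,r',w''} of V2 and with a co-occurrence weight function ω; the second-order definition [[w]] in particular is my reading of the snippets, not a verbatim quote:

    [w]   = \sum_{r \in R, \, w' \in W} \omega(w, r, w') \, e_{r, w'}

    [[w]] = \sum_{r \in R, \, w' \in W} \omega(w, r, w')
            \sum_{r' \in R, \, w'' \in W} \omega(w', r', w'') \, e_{r, r', w''}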

word sense

Appears in 6 sentences as: word sense (4) word senses (1) “word sense (1)
In Contextualizing Semantic Representations Using Syntactically Enriched Vector Models
  1. In a second experiment, we apply our model to the “word sense similarity task” recently proposed by Erk and McCarthy (2009), which is a refined variant of a word-sense disambiguation task.
    Page 2, “Introduction”
  2. The objective is to incorporate (inverse) selectional preference information from the context (r, w') in such a way as to identify the correct word sense of w. This suggests that the dimensions of the second-order vector of w should be filtered so that only those compatible with the context remain.
    Page 4, “The model”
  3. In this section, we apply our model to a different word sense ranking task: Given a word w in context, the task is to decide to what extent the different WordNet (Fellbaum, 1998) senses of w apply to this occurrence of w.
    Page 7, “Experiment: Ranking Word Senses”
  4. (2008), we represent different word senses by the words in the corresponding synsets.
    Page 8, “Experiment: Ranking Word Senses”
  5. For each word sense, we compute the centroid of the second-order vectors of its synset members.
    Page 8, “Experiment: Ranking Word Senses”
  6. Also, our system is the first unsupervised method that has been applied to Erk and McCarthy’s (2009) graded word sense assignment task, showing a substantial positive correlation with the gold standard.
    Page 8, “Conclusion”

WordNet

Appears in 6 sentences as: WordNet (6)
In Contextualizing Semantic Representations Using Syntactically Enriched Vector Models
  1. WordNet (Fellbaum, 1998) senses of w apply to this occurrence of w.
    Page 8, “Experiment: Ranking Word Senses”
  2. The dataset contains ordinal judgments of the applicability of WordNet senses on a 5-point scale, ranging from completely different to identical, for eight different lemmas in 50 different sentential contexts.
    Page 8, “Experiment: Ranking Word Senses”
  3. We apply the same method as in Section 4.3: For each instance in the dataset, we compute the second-order vector of the target verb, contextually constrain it by the first-order vectors of the verb’s arguments, and compare the resulting vector to the vectors that represent the different WordNet senses of the verb.
    Page 8, “Experiment: Ranking Word Senses”
  4. The WordNet senses are then ranked according to the cosine similarity between their sense vector and the contextually constrained target verb vector.
    Page 8, “Experiment: Ranking Word Senses”
  5. The second column shows the results of a frequency-informed baseline, which predicts the ranking based on the order of the senses in WordNet.
    Page 8, “Experiment: Ranking Word Senses”
  6. We further showed that a weakly supervised heuristic, making use of WordNet sense ranks, can be significantly improved by incorporating information from our system.
    Page 8, “Conclusion”
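
A minimal sketch of the sense-ranking step described in items 3 and 4 above, assuming sparse vectors represented as {dimension: weight} dicts; contextualized_verb_vec and sense_vectors stand in for the contextually constrained target-verb vector and the WordNet sense vectors of the actual model.

    import math

    def cosine(u, v):
        """Cosine similarity of two sparse vectors given as {dimension: weight} dicts."""
        dot = sum(weight * v.get(dim, 0.0) for dim, weight in u.items())
        norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
        norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
        return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

    def rank_senses(contextualized_verb_vec, sense_vectors):
        """Rank WordNet senses by the cosine between their sense vector and the
        contextually constrained target verb vector (higher = more applicable)."""
        scores = {sense: cosine(contextualized_verb_vec, vec)
                  for sense, vec in sense_vectors.items()}
        return sorted(scores.items(), key=lambda item: item[1], reverse=True)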

dependency relations

Appears in 5 sentences as: dependency relation (1) dependency relations (4)
In Contextualizing Semantic Representations Using Syntactically Enriched Vector Models
  1. We go one step further, however, in that we employ syntactically enriched vector models as the basic meaning representations, assuming a vector space spanned by combinations of dependency relations and words (Lin, 1998).
    Page 1, “Introduction”
  2. Figure 1 shows the co-occurrence graph of a small sample corpus of dependency trees: Words are represented as nodes in the graph, possible dependency relations between them are drawn as labeled edges, with weights corresponding to the observed frequencies.
    Page 3, “The model”
  3. Such a path is characterized by two dependency relations and two words, i.e., a quadruple (r, w', r', w''), whose weight is the product of the weights of the two edges used in the path.
    Page 3, “The model”
  4. To avoid overly sparse vectors we generalize over the “middle word” w' and build our second-order vectors on the dimensions corresponding to triples (r, r', w'') of two dependency relations and one word at the end of the two-step path.
    Page 3, “The model”
  5. Assuming a set W of words and a set R of dependency relation labels, we consider a Euclidean vector space V1 spanned by the set of orthonormal basis vectors {e_{r,w'} | r ∈ R, w' ∈ W}, i.e., a vector space whose dimensions correspond to pairs of a relation and a word.
    Page 4, “The model”
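
Items 3 and 4 above describe how two-step paths (r, w', r', w'') in the co-occurrence graph are collapsed onto triples (r, r', w'') by generalizing over the middle word. A naive Python sketch of that aggregation, under the assumption that edge weights are stored as a {(word, relation, word'): weight} dict (the storage format and function name are illustrative, not the authors' implementation):

    from collections import defaultdict

    def second_order_vector(word, edge_weights):
        """Build the second-order vector of `word`: for every two-step path
        word -r-> w1 -r2-> w2, add the product of the two edge weights to the
        dimension (r, r2, w2), thereby generalizing over the middle word w1.
        (Naive double loop, kept simple for clarity.)"""
        vec = defaultdict(float)
        for (w, r, w1), weight1 in edge_weights.items():
            if w != word:
                continue
            for (v, r2, w2), weight2 in edge_weights.items():
                if v == w1:
                    vec[(r, r2, w2)] += weight1 * weight2
        return dict(vec)

    # edge_weights could be the (PMI-weighted) graph sketched earlier, e.g.
    # {("acquire", "obj", "knowledge"): 1.2, ("knowledge", "of", "domain"): 0.8, ...}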

dependency trees

Appears in 5 sentences as: dependency trees (5)
In Contextualizing Semantic Representations Using Syntactically Enriched Vector Models
  1. Figure 1: Co-occurrence graph of a small sample corpus of dependency trees.
    Page 3, “Related Work”
  2. Figure 1 shows the co-occurrence graph of a small sample corpus of dependency trees: Words are represented as nodes in the graph, possible dependency relations between them are drawn as labeled edges, with weights corresponding to the observed frequencies.
    Page 3, “The model”
  3. In the simplest case, ω would denote the frequency in a corpus of dependency trees of w occurring together with w' in relation r. In the experiments reported below, we use pointwise mutual information (Church and Hanks, 1990) instead, as it proved superior to raw frequency counts.
    Page 4, “The model”
  4. We use a vector model based on dependency trees obtained from parsing the English Gigaword corpus (LDC2003T05).
    Page 5, “Experiments: Ranking Paraphrases”
  5. We modify the dependency trees by “folding” prepositions into the edge labels to make the relation between a head word and the head noun of a prepositional phrase explicit.
    Page 5, “Experiments: Ranking Paraphrases”
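
Item 5 above mentions "folding" prepositions into the edge labels. One way this can be done on dependency triples is sketched below; the triple format, the relation names "prep" and "pobj", and the folded label scheme "prep_<word>" are assumptions for illustration (the toy version keys on word forms, so it is only meant for a single sentence at a time).

    def fold_prepositions(triples):
        """Collapse  head --prep--> <preposition> --pobj--> noun  chains into a
        single edge  head --prep_<preposition>--> noun , making the relation
        between a head word and the head noun of a PP explicit."""
        pobj = {head: dep for head, rel, dep in triples if rel == "pobj"}
        folded = []
        for head, rel, dep in triples:
            if rel == "prep" and dep in pobj:
                folded.append((head, "prep_" + dep, pobj[dep]))
            elif rel not in ("prep", "pobj"):
                folded.append((head, rel, dep))
        return folded

    # ("cut", "prep", "with"), ("with", "pobj", "knife")  ->  ("cut", "prep_with", "knife")
    print(fold_prepositions([("cut", "prep", "with"), ("with", "pobj", "knife"),
                             ("cut", "obj", "bread")]))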

context information

Appears in 4 sentences as: context information (3) contextual information (1)
In Contextualizing Semantic Representations Using Syntactically Enriched Vector Models
  1. A more flexible approach than simple filtering, however, is to re-weight those dimensions with context information.
    Page 4, “The model”
  2. Note that the context information is the same for both words.
    Page 6, “Experiments: Ranking Paraphrases”
  3. The main difference between verbs on the one hand, and nouns, adjectives, and adverbs on the other hand, is that verbs typically come with a rich context (subject, object, and so on), while non-verbs often have either no dependents at all or only closed-class dependents such as determiners, which provide only limited contextual information, if any at all.
    Page 7, “Experiments: Ranking Paraphrases”
  4. Another direction for further study will be the generalization of our model to larger syntactic contexts, including more than only the direct neighbors in the dependency graph, ultimately incorporating context information from the whole sentence in a recursive fashion.
    Page 9, “Conclusion”

meaning representation

Appears in 4 sentences as: meaning representation (2) meaning representations (2)
In Contextualizing Semantic Representations Using Syntactically Enriched Vector Models
  1. We go one step further, however, in that we employ syntactically enriched vector models as the basic meaning representations, assuming a vector space spanned by combinations of dependency relations and words (Lin, 1998).
    Page 1, “Introduction”
  2. Mitchell and Lapata (2008), henceforth M&L, propose a general framework in which meaning representations for complex expressions are computed compositionally by combining the vector representations of the individual words of the complex expression.
    Page 2, “Related Work”
  3. As soon as we want to compute a meaning representation for a phrase like acquire knowledge from the verb acquire together with its direct object knowledge, we are facing the problem that verbs have different syntactic neighbors than nouns, hence their first-order vectors are not easily comparable.
    Page 3, “The model”
  4. We let the first-order vector with its selectional preference information act as a kind of weighting filter on the second-order vector, and thus refine the meaning representation of the verb.
    Page 3, “The model”
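
Items 3 and 4 above describe the core operation: the dependent's first-order vector, carrying selectional preference information, acts as a weighting filter on the verb's second-order vector. The sketch below is one plausible reading of that description, not the paper's exact definition: every dimension (r, r', w'') of the verb's second-order vector whose first relation matches the relation linking verb and argument is re-weighted by the (r', w'') component of the argument's first-order vector. In the acquire knowledge example, second-order dimensions of acquire that are incompatible with knowledge would thus receive low weight.

    def contextualize(second_order_verb, first_order_arg, rel="obj"):
        """Re-weight the verb's second-order vector with the argument's
        first-order vector (a sketch of the 'weighting filter' idea).

        second_order_verb: {(r, r2, w2): weight}, e.g. for 'acquire'
        first_order_arg:   {(r2, w2): weight},    e.g. for 'knowledge'
        rel: relation connecting verb and argument ('obj' assumed here).
        """
        contextualized = {}
        for (r, r2, w2), weight in second_order_verb.items():
            if r == rel:
                contextualized[(r, r2, w2)] = weight * first_order_arg.get((r2, w2), 0.0)
            else:
                contextualized[(r, r2, w2)] = weight
        return contextualized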

cosine similarity

Appears in 3 sentences as: cosine similarity (3)
In Contextualizing Semantic Representations Using Syntactically Enriched Vector Models
  1. Therefore the choice of which word is contextualized does not strongly influence their cosine similarity, and contextualizing both should not add any useful information.
    Page 6, “Experiments: Ranking Paraphrases”
  2. we compute the contextualized second-order vector of the target and compare it to the lifted first-order vectors of all paraphrase candidates, L_OBJ([hint]) and L_OBJ([star]), using cosine similarity.
    Page 7, “Experiments: Ranking Paraphrases”
  3. The WordNet senses are then ranked according to the cosine similarity between their sense vector and the contextually constrained target verb vector.
    Page 8, “Experiment: Ranking Word Senses”

human judgments

Appears in 3 sentences as: human judges (1) human judgments (2)
In Contextualizing Semantic Representations Using Syntactically Enriched Vector Models
  1. Based on agreement between human judges, Erk and McCarthy (2009) estimate an upper bound ρ of 0.544 for the dataset.
    Page 8, “Experiment: Ranking Word Senses”
  2. The first column shows the correlation of our model’s predictions with the human judgments from the gold-standard, averaged over all instances.
    Page 8, “Experiment: Ranking Word Senses”
  3. Table 4: Correlation of model predictions and human judgments
    Page 8, “Experiment: Ranking Word Senses”
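
Spearman's ρ, used above to compare the predicted ranking with the gold-standard ranking, is available in SciPy; a minimal usage sketch with made-up scores for a single instance:

    from scipy.stats import spearmanr

    # Hypothetical applicability scores for five WordNet senses of one instance:
    model_scores = [0.82, 0.40, 0.77, 0.15, 0.33]   # predicted by the model
    gold_scores = [5, 2, 4, 1, 3]                   # averaged human judgments

    rho, p_value = spearmanr(model_scores, gold_scores)
    print(f"Spearman's rho = {rho:.3f} (p = {p_value:.3f})")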

synsets

Appears in 3 sentences as: synset (1) synsets (2)
In Contextualizing Semantic Representations Using Syntactically Enriched Vector Models
  1. (2008), we represent different word senses by the words in the corresponding synsets.
    Page 8, “Experiment: Ranking Word Senses”
  2. For each word sense, we compute the centroid of the second-order vectors of its synset members.
    Page 8, “Experiment: Ranking Word Senses”
  3. Since synsets tend to be small (they even may contain only the target word itself), we additionally add the centroid of the sense’s hypernyms, scaled down by the factor 10 (chosen as a rough heuristic without any attempt at optimization).
    Page 8, “Experiment: Ranking Word Senses”
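
A sketch of the sense-vector construction described in items 1-3 above: the centroid of the second-order vectors of the synset members, plus the centroid of the hypernyms' members scaled down by a factor of 10. NLTK's WordNet interface is used for synset and hypernym lookup; vector_of() is a placeholder for the model's second-order vectors and an assumption of this illustration.

    from collections import defaultdict
    from nltk.corpus import wordnet as wn

    def centroid(vectors):
        """Centroid of sparse vectors given as {dimension: weight} dicts."""
        result = defaultdict(float)
        for vec in vectors:
            for dim, weight in vec.items():
                result[dim] += weight / len(vectors)
        return dict(result)

    def sense_vector(synset, vector_of, hypernym_scale=0.1):
        """Represent a WordNet sense by the centroid of its members' vectors,
        plus the centroid of its hypernyms' members scaled down by 1/10."""
        vec = defaultdict(float, centroid([vector_of(l) for l in synset.lemma_names()]))
        hyper_lemmas = [l for h in synset.hypernyms() for l in h.lemma_names()]
        if hyper_lemmas:
            for dim, weight in centroid([vector_of(l) for l in hyper_lemmas]).items():
                vec[dim] += hypernym_scale * weight
        return dict(vec)

    # e.g. sense_vectors = {s.name(): sense_vector(s, vector_of)
    #                       for s in wn.synsets("acquire", pos=wn.VERB)}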
