Distributional Semantics in Technicolor
Elia Bruni, Gemma Boleda, Marco Baroni, and Nam Khanh Tran

Article Structure

Abstract

Our research aims at building computational models of word meaning that are perceptually grounded.

Introduction

Traditional semantic space models represent meaning on the basis of word co-occurrence statistics in large text corpora (Turney and Pantel, 2010).

Distributional semantic models

Textual models

Textual and visual models as general semantic models

We test the models just presented in two different ways: first, as general models of word meaning, testing their correlation with human judgements on word similarity and relatedness (this section).

Experiment 1: Discovering the color of concrete objects

In Experiment 1, we test the hypothesis that the relation between words denoting concrete objects and words denoting their typical color is better reflected by the distance between the corresponding vectors when the models are sensitive to visual information.

Experiment 2

Experiment 2 requires more sophisticated information than Experiment 1, as it involves distinguishing between literal and nonliteral uses of color terms.

Related work

There is an increasing amount of work in computer vision that exploits text-derived information for image retrieval and annotation tasks (Farhadi et al., 2010; Kulkarni et al., 2011).

Conclusion

We have presented evidence that distributional semantic models based on text, while providing a good general semantic representation of word meaning, can be outperformed by models using visual information for semantic aspects of words where vision is relevant.

Topics

co-occurrence

Appears in 11 sentences as: co-occurrence (13)
In Distributional Semantics in Technicolor
  1. Traditional semantic space models represent meaning on the basis of word co-occurrence statistics in large text corpora (Turney and Pantel, 2010).
    Page 1, “Introduction”
  2. We also show that “hybrid” models exploiting the patterns of co-occurrence of words as tags of the same images can be a powerful surrogate of visual information under certain circumstances.
    Page 1, “Introduction”
  3. In all cases, occurrence and co-occurrence statistics are extracted from the freely available ukWaC and Wackypedia corpora combined (size: 1.9B and 820M tokens, respectively). Moreover, in all models the raw co-occurrence counts are transformed into nonnegative Local Mutual Information (LMI) scores (see the LMI sketch after this list). Finally, in all models we harvest vector representations for the same words (lemmas), namely the top 20K most frequent nouns, 5K most frequent adjectives and 5K most frequent verbs in the combined corpora (for coherence with the vision-based models, which cannot exploit contextual information to distinguish nouns and adjectives, we merge nominal and adjectival usages of the color adjectives in the text-based models as well).
    Page 2, “Distributional semantic models”
  4. Window2 records sentence-internal co-occurrence with the nearest 2 content words to the left and right of each target concept, a narrow context definition expected to capture taxonomic relations (see the counting sketch after this list).
    Page 2, “Distributional semantic models”
  5. We further introduce hybrid models that exploit the patterns of co-occurrence of words as tags of the same images.
    Page 4, “Distributional semantic models”
  6. Like textual models, these models are based on word co-occurrence; like visual models, they consider co-occurrence in images (image labels).
    Page 4, “Distributional semantic models”
  7. In one model (ESP-Win, analogous to window-based models), words tagging an image were represented in terms of co-occurrence with the other tags in the image label (Baroni and Lenci (2008) are a precedent for the use of ESP-Win).
    Page 4, “Distributional semantic models”
  8. The other model (ESP-Img, analogous to document-based models) represented words in terms of their co-occurrence with images, using each image as a different dimension (both hybrid spaces are sketched after this list).
    Page 4, “Distributional semantic models”
  9. We also experimented with a model based on direct co-occurrence of adjectives and nouns, obtaining promising results in a preliminary version of Exp. 2.
    Page 6, “Experiment 2”
  10. Our results suggest that co-occurrence in an image label can be used as a surrogate of true visual information to some extent, but the behavior of hybrid models depends on ad hoc aspects of the labeled dataset, and, from an empirical perspective, they are more limited than truly multimodal models, because they require large amounts of rich verbal picture descriptions to reach good coverage.
    Page 7, “Experiment 2”
  11. Like us, Özbal and colleagues use both a textual model and a visual model (as well as Google adjective-noun co-occurrence counts) to find the typical color of an object.
    Page 7, “Related work”
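
The nonnegative LMI weighting in sentence 3 above can be illustrated with a minimal sketch: LMI(w, c) = count(w, c) · PMI(w, c), with negative values clamped to zero. The toy counts and function name are illustrative assumptions, not the paper's code.

```python
import math
from collections import defaultdict

def lmi_scores(cooc):
    """Turn raw co-occurrence counts into nonnegative Local Mutual
    Information scores: LMI(w, c) = count(w, c) * max(0, PMI(w, c))."""
    total = sum(cooc.values())
    w_marg, c_marg = defaultdict(float), defaultdict(float)
    for (w, c), n in cooc.items():
        w_marg[w] += n
        c_marg[c] += n
    return {
        (w, c): n * max(0.0, math.log(n * total / (w_marg[w] * c_marg[c])))
        for (w, c), n in cooc.items()
    }

# Toy counts: the shared, uninformative context "the" gets zero weight,
# while the distinctive contexts keep positive scores.
print(lmi_scores({("dog", "bark"): 20, ("dog", "the"): 30,
                  ("cat", "the"): 30, ("cat", "meow"): 20}))
```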

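A counting sketch for a Window2-style space (sentence 4): for each target, tally the nearest 2 content words on either side within the sentence. All names here are illustrative, not taken from the paper.

```python
from collections import Counter

def window_cooc(sentences, targets, content_words, k=2):
    """Count sentence-internal co-occurrences of each target with the
    nearest k content words to its left and right (Window2 when k=2)."""
    cooc = Counter()
    for sent in sentences:
        content = [(i, w) for i, w in enumerate(sent) if w in content_words]
        for pos, word in enumerate(sent):
            if word not in targets:
                continue
            left = [w for i, w in content if i < pos][-k:]
            right = [w for i, w in content if i > pos][:k]
            for ctx in left + right:
                cooc[(word, ctx)] += 1
    return cooc

sents = [["the", "small", "dog", "chased", "the", "red", "ball"]]
print(window_cooc(sents, {"dog"}, {"small", "chased", "red", "ball"}))
```
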
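The two hybrid spaces (sentences 7 and 8) can be sketched under the simplifying assumption that each labeled image is just a set of tags: ESP-Win-style vectors live in tag space, ESP-Img-style vectors in image space. The toy labels are invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy dataset: one set of tags per labeled image.
labels = [{"sky", "blue", "cloud"}, {"grass", "green", "sky"}, {"sky", "sun"}]

# ESP-Win-style: represent each tag by co-occurrence with the other
# tags appearing in the same image label.
tag_by_tag = Counter()
for tags in labels:
    for w in tags:
        for c in tags - {w}:
            tag_by_tag[(w, c)] += 1

# ESP-Img-style: represent each tag by the images it occurs in, using
# each image as a separate dimension.
tag_by_image = defaultdict(Counter)
for img_id, tags in enumerate(labels):
    for w in tags:
        tag_by_image[w][img_id] += 1

print(tag_by_tag[("sky", "blue")])  # 1
print(dict(tag_by_image["sky"]))    # {0: 1, 1: 1, 2: 1}
```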

feature vector

Appears in 4 sentences as: feature vector (2), feature vectors (2)
In Distributional Semantics in Technicolor
  1. From every image in a dataset, relevant areas are identified and a low-level feature vector (called a “descriptor”) is built to represent each area.
    Page 3, “Distributional semantic models”
  2. Now, given a new image, the nearest visual word is identified for each descriptor extracted from it, such that the image can be represented as a BoVW feature vector, by counting the instances of each visual word in the image (note that an occurrence of a low-level descriptor vector in an image, after mapping to the nearest cluster, will increment the count of a single dimension of the higher-level BoVW vector).
    Page 3, “Distributional semantic models”
  3. We extract descriptor features of two types. First, the standard Scale-Invariant Feature Transform (SIFT) feature vectors (Lowe, 1999; Lowe, 2004), which are good at characterizing parts of objects.
    Page 3, “Distributional semantic models”
  4. To obtain the visual word vocabulary, we cluster the SIFT feature vectors with the standard k-means clustering algorithm.
    Page 3, “Distributional semantic models”
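
A condensed sketch of this bag-of-visual-words pipeline. It substitutes random arrays for real SIFT descriptors and uses scikit-learn's KMeans, so the sizes and names are illustrative assumptions rather than the paper's setup.

```python
import numpy as np
from sklearn.cluster import KMeans

# Stand-ins for real SIFT descriptors: one (n_descriptors, 128) array
# per training image.
rng = np.random.default_rng(0)
train_descriptors = [rng.normal(size=(200, 128)) for _ in range(10)]

# Step 1: cluster all descriptors; the k centroids are the "visual words".
k = 50
kmeans = KMeans(n_clusters=k, n_init=10, random_state=0)
kmeans.fit(np.vstack(train_descriptors))

def bovw_vector(descriptors):
    """Map each descriptor to its nearest visual word and count
    occurrences, yielding a k-dimensional BoVW histogram."""
    words = kmeans.predict(descriptors)
    return np.bincount(words, minlength=k)

# Step 2: represent a new image by its visual-word histogram.
print(bovw_vector(rng.normal(size=(150, 128))))
```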


word pairs

Appears in 4 sentences as: word pair (1), word pairs (3)
In Distributional Semantics in Technicolor
  1. The weighting parameter α (0 ≤ α ≤ 1) is tuned on the MEN development data (2,000 word pairs; details on the MEN dataset in the next section); see the tuning sketch after this list.
    Page 4, “Distributional semantic models”
  2. WordSim353 (Finkelstein et al., 2002) is a widely used benchmark constructed by asking 16 subjects to rate a set of 353 word pairs on a 10-point similarity scale and averaging the ratings (dollar/buck receives a high 9.22 average rating, professor/cucumber a low 0.31); see the evaluation sketch after this list.
    Page 4, “Textual and visual models as general semantic models”
  3. The version used here contained 10 judgements per word pair.
    Page 4, “Textual and visual models as general semantic models”
  4. Because of its design, word pairs in MEN can be expected to be more imageable than those in WordSim, so the visual information is more relevant for this dataset.
    Page 5, “Textual and visual models as general semantic models”
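
A minimal sketch of how such benchmarks are typically used to evaluate a semantic space: score each pair by the cosine similarity of its word vectors and correlate the scores with the human ratings via Spearman. Cosine and SciPy are assumptions here, not details taken from the paper.

```python
import numpy as np
from scipy.stats import spearmanr

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def evaluate(vectors, pairs, gold_ratings):
    """Spearman correlation between model cosines and human ratings
    for a list of (word1, word2) pairs, as in WordSim353 or MEN."""
    sims = [cosine(vectors[w1], vectors[w2]) for w1, w2 in pairs]
    return spearmanr(sims, gold_ratings).correlation
```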

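Building on the same helpers, a sketch of tuning the weighting parameter α on the MEN development pairs; the weighted-sum combination of textual and visual similarities is an assumption for illustration, not necessarily the paper's exact scheme.

```python
import numpy as np
from scipy.stats import spearmanr

def tune_alpha(text_sims, visual_sims, gold_ratings, steps=21):
    """Grid-search alpha in [0, 1]: combine per-pair textual and visual
    similarities as alpha * text + (1 - alpha) * visual and keep the
    alpha whose combination best correlates with the dev ratings."""
    text_sims = np.asarray(text_sims, dtype=float)
    visual_sims = np.asarray(visual_sims, dtype=float)
    return max(
        np.linspace(0.0, 1.0, steps),
        key=lambda a: spearmanr(
            a * text_sims + (1 - a) * visual_sims, gold_ratings
        ).correlation,
    )
```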

semantic relatedness

Appears in 3 sentences as: semantic relatedness (3)
In Distributional Semantics in Technicolor
  1. Our results show that, while visual models with state-of-the-art computer vision techniques perform worse than textual models in general tasks (accounting for semantic relatedness), they are as good or better models of the meaning of words with visual correlates such as color terms, even in a nontrivial task that involves nonliteral uses of such words.
    Page 1, “Abstract”
  2. (2) We evaluate the models on general semantic relatedness tasks and on two specific tasks where visual information is highly relevant, as they focus on color terms.
    Page 1, “Introduction”
  3. Each pair is scored on a [0,1]-normalized semantic relatedness scale via ratings obtained by crowdsourcing on Amazon Mechanical Turk (refer to the online MEN documentation for more details).
    Page 4, “Textual and visual models as general semantic models”
