Models of Semantic Representation with Visual Attributes
Silberer, Carina and Ferrari, Vittorio and Lapata, Mirella

Article Structure

Abstract

We consider the problem of grounding the meaning of words in the physical world and focus on the visual modality, which we represent by visual attributes.

Introduction

Recent years have seen increased interest in grounded language acquisition, where the goal is to extract representations of the meaning of natural language tied to the physical world.

Related Work

Grounding semantic representations with visual information is an instance of multimodal learning.

The Attribute Dataset

Concepts and Images We created a dataset of images and their visual attributes for the nouns contained in McRae et al.’s (2005) feature norms.

Attribute-based Classification

Following previous work (Farhadi et al., 2009; Lampert et al., 2009) we learned one classifier per attribute (i.e., 350 classifiers in total). The training set consisted of 91,980 images (with a maximum of 350 images per concept).
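
As an illustration of this one-classifier-per-attribute setup, the sketch below trains one binary classifier per attribute column, assuming precomputed image descriptors X and binary attribute annotations Y; the scikit-learn backend and all names are ours, not the paper's code.

    from sklearn.svm import LinearSVC

    def train_attribute_classifiers(X, Y, C=1.0):
        """Train one binary SVM per attribute.

        X: (n_images, n_features) image descriptors.
        Y: (n_images, n_attributes) binary attribute annotations.
        Returns one fitted classifier per attribute, e.g., 350 in total.
        """
        classifiers = []
        for a in range(Y.shape[1]):
            clf = LinearSVC(C=C)  # L2-regularized linear SVM
            clf.fit(X, Y[:, a])   # positives: images showing attribute a
            classifiers.append(clf)
        return classifiers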

Attribute-based Semantic Models

We evaluated the effectiveness of our attribute classifiers by integrating their predictions with traditional text-only models of semantic representation.

Experimental Setup

Evaluation Task We evaluated the distributional models presented in Section 5 on the word association norms collected by Nelson et al.

Results

Our experiments were designed to answer four questions: (1) Do visual attributes improve the performance of distributional models?

Conclusions

In this paper we proposed the use of automatically computed visual attributes as a way of physically grounding word meaning.

Topics

development set

Appears in 7 sentences as: development set (7)
In Models of Semantic Representation with Visual Attributes
  1. For most concepts the development set contained a maximum of 100 images and the test set a maximum of 200 images.
    Page 4, “The Attribute Dataset”
  2. Concepts with fewer than 800 images in total were split into 1/8 test set, 1/8 development set, and 3/4 training set (see the sketch after this list).
    Page 4, “The Attribute Dataset”
  3. The development set was used for devising and refining our attribute annotation scheme.
    Page 4, “The Attribute Dataset”
  4. For each concept (e.g., bear or eggplant), we inspected the images in the development set and chose all McRae et al.
    Page 4, “The Attribute Dataset”
  5. We optimized this value on the development set and obtained best results with δ = 0.
    Page 7, “Experimental Setup”
  6. We optimized the model parameters on a development set consisting of cue-associate pairs from Nelson et al.
    Page 9, “Results”
  7. The best performing model on the development set used 500 visual terms and 750 topics and the association measure proposed in Griffiths et al.
    Page 9, “Results”
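
A minimal sketch of the split rule from sentence 2, assuming each concept's images arrive as a list; the per-concept caps from sentence 1 (100 development, 200 test images) are left out for brevity, and all names are illustrative.

    import random

    def split_concept_images(images, seed=0):
        """Split one concept's images into 1/8 test, 1/8 development,
        and 3/4 training, per the rule quoted in sentence 2."""
        rng = random.Random(seed)
        imgs = list(images)
        rng.shuffle(imgs)
        n_eighth = len(imgs) // 8
        return {
            "test": imgs[:n_eighth],
            "dev": imgs[n_eighth:2 * n_eighth],
            "train": imgs[2 * n_eighth:],
        }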

semantic representation

Appears in 5 sentences as: semantic representation (3) semantic representations (2)
In Models of Semantic Representation with Visual Attributes
  1. Visual input represents a major source of data from which humans can learn semantic representations of linguistic and nonlinguistic communicative actions (Regier, 1996).
    Page 2, “Introduction”
  2. Grounding semantic representations with visual information is an instance of multimodal learning.
    Page 2, “Related Work”
  3. On average, each concept was annotated with 19 attributes; approximately 14.5 of these were not part of the semantic representation created by McRae et al.’s (2005) participants for that concept even though they figured in the representations of other concepts.
    Page 5, “The Attribute Dataset”
  4. We evaluated the effectiveness of our attribute classifiers by integrating their predictions with traditional text-only models of semantic representation.
    Page 6, “Attribute-based Semantic Models”
  5. (2004)) to learn a joint semantic representation from the textual and visual modalities.
    Page 6, “Attribute-based Semantic Models”

statistically significant

Appears in 5 sentences as: statistically significant (5)
In Models of Semantic Representation with Visual Attributes
  1. All correlation coefficients are statistically significant (p < 0.01, N = 435) (see the sketch after this list).
    Page 8, “Experimental Setup”
  2. Differences in correlation coefficients between models with two versus one modality are all statistically significant (p < 0.01 using a t-test), with the exception of Concat when compared against VisAttr.
    Page 8, “Results”
  3. All correlation coefficients are statistically significant (p < 0.01, N = 1,716).
    Page 8, “Results”
  4. All correlation coefficients are statistically significant (p < 0.01, N = 435).
    Page 9, “Results”
  5. All correlation coefficients are statistically significant (p < 0.05).
    Page 9, “Results”
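
For a single Pearson coefficient, the reported significance levels follow from the standard t-test with N - 2 degrees of freedom; the sketch below reproduces that check (the model-versus-model comparison in sentence 2 tests differences between coefficients, which is not covered here).

    from math import sqrt
    from scipy import stats

    def correlation_p_value(r, N):
        """Two-sided p-value for a Pearson correlation r over N pairs,
        via the t statistic r * sqrt((N - 2) / (1 - r**2)), df = N - 2."""
        t = r * sqrt((N - 2) / (1 - r ** 2))
        return 2 * stats.t.sf(abs(t), df=N - 2)

    # Example: even a modest r = 0.2 clears p < 0.01 at N = 435,
    # since correlation_p_value(0.2, 435) is on the order of 1e-5.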

LDA

Appears in 4 sentences as: LDA (4)
In Models of Semantic Representation with Visual Attributes
  1. Their model is essentially Latent Dirichlet Allocation (LDA, Blei et al., 2003) trained on a corpus of multimodal documents (i.e., BBC news articles and their associated images).
    Page 2, “Related Work”
  2. (2009) present an extension of LDA (Blei et al., 2003) where words in documents and their associated attributes are treated as observed variables that are explained by a generative process.
    Page 7, “Attribute-based Semantic Models”
  3. Inducing these attribute-topic components from the document collection with the extended LDA model gives two sets of parameters: word probabilities given components P_W(w_i | X = x_c) for i = 1, ..., n, and attribute probabilities given components P_A(a_k | X = x_c) for k = 1, ..., F (see the sketch after this list). For example, most of the probability mass of a component x would be reserved for the words shirt, coat, dress and the attributes has_1_piece, has_seams, made_of_material and so on.
    Page 7, “Attribute-based Semantic Models”
  4. Unseen are concepts covered by LDA but unknown to the attribute classifiers (N = 388).
    Page 9, “Results”
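
A sketch of how the two parameter sets in sentence 3 could be read off a fitted model, assuming a collapsed Gibbs implementation that exposes per-component word and attribute counts; the smoothing constant and all names are assumptions, not the paper's code.

    import numpy as np

    def component_distributions(word_counts, attr_counts, beta=0.01):
        """word_counts: (n_components, n_words) counts; attr_counts:
        (n_components, n_attributes) counts. Returns row-stochastic
        matrices with P_W[c, i] = P_W(w_i | X = x_c) and
        P_A[c, k] = P_A(a_k | X = x_c), smoothed by beta."""
        w = word_counts + beta
        a = attr_counts + beta
        return (w / w.sum(axis=1, keepdims=True),
                a / a.sum(axis=1, keepdims=True))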

natural language

Appears in 3 sentences as: natural language (3)
In Models of Semantic Representation with Visual Attributes
  1. Recent years have seen increased interest in grounded language acquisition, where the goal is to extract representations of the meaning of natural language tied to the physical world.
    Page 1, “Introduction”
  2. The language grounding problem has assumed several guises in the literature such as semantic parsing (Zelle and Mooney, 1996; Zettlemoyer and Collins, 2005; Kate and Mooney, 2007; Lu et al., 2008; Börschinger et al., 2011), mapping natural language instructions to executable actions (Branavan et al., 2009; Tellex et al., 2011), associating simplified language to perceptual data such as images or video (Siskind, 2001; Roy and Pentland, 2002; Gorniak and Roy, 2004; Yu and Ballard, 2007), and learning the meaning of words based on linguistic and perceptual input (Bruni et al., 2012b; Feng and Lapata, 2010; Johns and Jones, 2012; Andrews et al., 2009; Silberer and Lapata, 2012).
    Page 1, “Introduction”
  3. Since our goal is to develop distributional models that are applicable to many words, it contains a considerably larger number of concepts (i.e., more than 500) and attributes (i.e., 412) based on a detailed taxonomy which we argue is cognitively plausible and beneficial for image and natural language processing tasks.
    Page 3, “Related Work”

probability distribution

Appears in 3 sentences as: probability distribution (3)
In Models of Semantic Representation with Visual Attributes
  1. For each image i_w ∈ I_w of concept w, we output an F-dimensional vector containing prediction scores score_a(i_w) for attributes a = 1, ..., F. We transform these attribute vectors into a single vector p_w ∈ [0,1]^{1×F} by computing the centroid of all vectors for concept w. The vector is normalized to obtain a probability distribution over attributes given w (see the sketch after this list).
    Page 5, “Attribute-based Classification”
  2. Let P ∈ [0,1]^{N×F} denote a visual matrix, representing a probability distribution over visual attributes for each word.
    Page 6, “Attribute-based Semantic Models”
  3. We can thus compute the probability distribution over associates for each cue.
    Page 7, “Experimental Setup”
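
A minimal sketch of the centroid-and-normalize step in sentence 1, assuming the per-image attribute scores already lie in [0,1]; names are illustrative.

    import numpy as np

    def concept_attribute_distribution(score_vectors):
        """score_vectors: (n_images, F) matrix holding score_a(i_w) for
        every image i_w in I_w of concept w, assumed to lie in [0,1]."""
        p_w = score_vectors.mean(axis=0)  # centroid of all image vectors
        return p_w / p_w.sum()            # distribution over attributes given w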

SVM

Appears in 3 sentences as: SVM (3)
In Models of Semantic Representation with Visual Attributes
  1. We used an L2-regularized L2-loss linear SVM (Fan et al., 2008) to learn the attribute predictions.
    Page 5, “Attribute-based Classification”
  2. Data was randomly split into a training and a validation set of equal size in order to find the optimal cost parameter C. The final SVM for the attribute was then trained on the entire training data, i.e., on all positive and negative examples (see the sketch after this list).
    Page 5, “Attribute-based Classification”
  3. The SVM learners used the four different feature types proposed in Farhadi et al.
    Page 5, “Attribute-based Classification”
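
A sketch of the C-selection procedure from sentence 2, using scikit-learn's LinearSVC (an L2-regularized, squared-hinge linear SVM comparable to the LIBLINEAR setup of Fan et al., 2008); the candidate grid is an assumption.

    from sklearn.model_selection import train_test_split
    from sklearn.svm import LinearSVC

    def fit_attribute_svm(X, y, C_grid=(0.01, 0.1, 1.0, 10.0, 100.0)):
        """Pick C on a 50/50 train/validation split of one attribute's
        training data, then refit on all of it (sentence 2)."""
        X_tr, X_val, y_tr, y_val = train_test_split(
            X, y, test_size=0.5, random_state=0)
        best_C = max(
            C_grid,
            key=lambda C: LinearSVC(C=C).fit(X_tr, y_tr).score(X_val, y_val))
        return LinearSVC(C=best_C).fit(X, y)  # final SVM on the entire data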

synsets

Appears in 3 sentences as: synset (1) synsets (2)
In Models of Semantic Representation with Visual Attributes
  1. Distributional models that integrate the visual modality have been learned from texts and images (Feng and Lapata, 2010; Bruni et al., 2012b) or from ImageNet (Deng et al., 2009), e.g., by exploiting the fact that images in this database are hierarchically organized according to WordNet synsets (Leong and Mihalcea, 2011).
    Page 2, “Introduction”
  2. ImageNet has more than 14 million images spanning 21K WordNet synsets.
    Page 3, “The Attribute Dataset”
  3. Some words had to be modified in order to match the correct synset, e.g., tank_(container) was found as storage_tank (see the sketch after this list).
    Page 3, “The Attribute Dataset”
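
A minimal sketch of the synset lookup behind sentence 3, using NLTK's WordNet interface; the fallback argument stands in for the manual corrections the footnote describes and is our own device.

    from nltk.corpus import wordnet as wn  # requires nltk.download('wordnet')

    def find_synset(word, fallback=None):
        """Return WordNet noun synsets for a concept label, trying a
        hand-corrected alternative form when the label itself fails."""
        synsets = wn.synsets(word, pos=wn.NOUN)
        if not synsets and fallback is not None:
            synsets = wn.synsets(fallback, pos=wn.NOUN)
        return synsets

    # Example: find_synset("tank_(container)", fallback="storage_tank")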

WordNet

Appears in 3 sentences as: WordNet (3)
In Models of Semantic Representation with Visual Attributes
  1. Distributional models that integrate the visual modality have been learned from texts and images (Feng and Lapata, 2010; Bruni et al., 2012b) or from ImageNet (Deng et al., 2009), e.g., by exploiting the fact that images in this database are hierarchically organized according to WordNet synsets (Leong and Mihalcea, 2011).
    Page 2, “Introduction”
  2. Images for the concepts in McRae et al.’s (2005) production norms were harvested from ImageNet (Deng et al., 2009), an ontology of images based on the nominal hierarchy of WordNet (Fellbaum, 1998).
    Page 3, “The Attribute Dataset”
  3. ImageNet has more than 14 million images spanning 21K WordNet synsets.
    Page 3, “The Attribute Dataset”
