Metaphor Detection with Cross-Lingual Model Transfer
Yulia Tsvetkov, Leonid Boytsov, Anatole Gershman, Eric Nyberg, and Chris Dyer

Article Structure

Abstract

We show that it is possible to reliably discriminate whether a syntactic construction is meant literally or metaphorically using lexical semantic features of the words that participate in the construction.

Introduction

Lakoff and Johnson (1980) characterize metaphor as reasoning about one thing in terms of another, i.e., a metaphor is a type of conceptual mapping, where words or phrases are applied to objects and actions in ways that do not permit a literal interpretation.

Methodology

Our task in this work is to define features that distinguish between metaphoric and literal uses of two syntactic constructions: subject-verb-object (SVO) and adjective-noun (AN) tuples. We give examples of a prototypical metaphoric usage of each type:

Model and Feature Extraction

In this section we describe a classification model, and provide details on mono- and cross-lingual implementation of features.

Datasets

In this section we describe the training and testing datasets as well as the data collection procedure.

Experiments

5.1 English experiments

Related Work

For a historic overview and a survey of common approaches to metaphor detection, we refer the reader to recent reviews by Shutova et al.

Conclusion

The key contribution of our work is that we show how to identify metaphors across languages by building a model in English and applying it—without adaptation—to other languages: Spanish, Farsi, and Russian.

Topics

imageability

Appears in 18 sentences as: Imageability (1) imageability (14) imageable (4)
In Metaphor Detection with Cross-Lingual Model Transfer
  1. We define three main feature categories: (1) abstractness and imageability, (2) supersenses, (3) unsupervised vector-space word representations; each category corresponds to a group of features with a common theme and representation.
    Page 2, “Methodology”
  2. • Abstractness and imageability.
    Page 2, “Methodology”
  3. Abstractness and imageability were shown to be useful in detection of metaphors (it is easier to invoke mental pictures of concrete and imageable words) (Turney et al., 2011; Broadwell et al., 2013).
    Page 2, “Methodology”
  4. Although often correlated with abstractness, imageability is not a redundant property.
    Page 2, “Methodology”
  5. Abstractness and imageability.
    Page 3, “Model and Feature Extraction”
  6. The MRC psycholinguistic database is a large dictionary listing linguistic and psycholinguistic attributes obtained experimentally (Wilson, 1988). It includes, among other data, 4,295 words rated by degree of abstractness and 1,156 words rated by imageability.
    Page 3, “Model and Feature Extraction”
  7. (2013), we use a logistic regression classifier to propagate abstractness and imageability scores from MRC ratings to all words for which we have vector space representations.
    Page 3, “Model and Feature Extraction”
  8. More specifically, we calculate the degree of abstractness and imageability of all English items that have a vector space representation, using vector elements as features.
    Page 3, “Model and Feature Extraction”
  9. We train two separate classifiers for abstractness and imageability on a seed set of words from the MRC database.
    Page 3, “Model and Feature Extraction”
  10. Degrees of abstractness and imageability are posterior probabilities of classifier predictions.
    Page 3, “Model and Feature Extraction”
  11. Thresholds are equal to 0.8 for abstractness and to 0.9 for imageability.
    Page 3, “Model and Feature Extraction”
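
A rough sketch of this propagation step (items 7–10 above): train a logistic regression on MRC-rated seed words, using vector elements as features, and read the degree of abstractness off the classifier posterior. The word vectors and seed labels below are invented toy stand-ins for the real 64-dimensional vectors and MRC ratings.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy 4-dimensional "word vectors"; the real ones are 64-dimensional.
rng = np.random.RandomState(0)
vectors = {w: rng.randn(4) for w in ["chair", "idea", "stone", "truth", "cloud"]}

# Invented seed set standing in for MRC abstractness ratings (1 = abstract).
seeds = {"chair": 0, "stone": 0, "idea": 1, "truth": 1}

X = np.array([vectors[w] for w in seeds])
y = np.array(list(seeds.values()))
clf = LogisticRegression().fit(X, y)

# Degree of abstractness = posterior probability of the "abstract" class;
# a word would then be called abstract if this degree exceeds the 0.8 threshold.
degree = clf.predict_proba(vectors["cloud"].reshape(1, -1))[0, 1]
print(degree)
```

The same recipe, with a second classifier and a 0.9 threshold, yields the imageability degrees.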

cross-lingual

Appears in 11 sentences as: Cross-lingual (4) cross-lingual (9)
In Metaphor Detection with Cross-Lingual Model Transfer
  1. To test this hypothesis, we use a cross-lingual model transfer approach: we use bilingual dictionaries to project words from syntactic constructions found in other languages into English, and then apply the English model to the derived conceptual representations.
    Page 2, “Methodology”
  2. In addition, this coarse semantic categorization is preserved in translation (Schneider et al., 2013), which makes supersense features suitable for cross-lingual approaches such as ours.
    Page 3, “Methodology”
  3. (2013) reveal an interesting cross-lingual property of distributed word representations: there is a strong similarity between the vector spaces across languages that can be easily captured by linear mapping.
    Page 3, “Methodology”
  4. In this section we describe a classification model, and provide details on mono- and cross-lingual implementation of features.
    Page 3, “Model and Feature Extraction”
  5. 3.3 Cross-lingual feature projection
    Page 4, “Model and Feature Extraction”
  6. 5.3 Cross-lingual experiments
    Page 7, “Experiments”
  7. Figure 2: Cross-lingual experiment: ROC curves for classifiers trained on the English data using a combination of all features, and applied to SVO and AN metaphoric and literal relations in four test languages: English, Russian, Spanish, and Farsi.
    Page 8, “Experiments”
  8. Table 5: Cross-lingual experiment: f -scores for classifiers trained on the English data using a combination of all features, and applied, with optimal thresholds, to SVO and AN metaphoric and literal relations in four test languages: English, Russian, Spanish, and Farsi.
    Page 8, “Experiments”
  9. (2013) propose a cross-lingual detection method that uses only English lexical resources and a dependency parser.
    Page 9, “Related Work”
  10. Studies have shown that cross-lingual evidence allows one to achieve state-of-the-art performance in the WSD task; yet most cross-lingual WSD methods employ parallel corpora (Navigli, 2009).
    Page 9, “Related Work”
  11. Second, cross-lingual model transfer can be improved with more careful cross-lingual feature projection.
    Page 9, “Conclusion”
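
The projection step behind item 5 can be sketched as follows; the tiny bilingual dictionary and English feature table are invented placeholders for the real resources:

```python
import numpy as np

# Hypothetical English feature vectors (e.g., supersense or abstractness degrees).
english_features = {
    "head":  np.array([0.7, 0.1, 0.2]),
    "chief": np.array([0.2, 0.6, 0.2]),
}

# Hypothetical bilingual dictionary entry: a Russian word with two English translations.
translations = {"голова": ["head", "chief"]}

def project(foreign_word):
    """Average the English feature vectors of all dictionary translations."""
    vecs = [english_features[t] for t in translations[foreign_word]
            if t in english_features]
    return np.mean(vecs, axis=0)

print(project("голова"))  # averaged English feature vector
```

The averaging means a polysemous foreign word inherits a blend of its translations' features, which is coarse but requires no sense disambiguation.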

WordNet

Appears in 10 sentences as: WordNet (11)
In Metaphor Detection with Cross-Lingual Model Transfer
  1. semantic categories originating in WordNet.
    Page 2, “Methodology”
  2. Supersenses are called “lexicographer classes” in WordNet documentation (Fellbaum, 1998), http://wordnet.princeton.edu.
    Page 2, “Methodology”
  3. English adjectives do not, as yet, have a similar high-level semantic partitioning in WordNet; thus we use a 13-class taxonomy of adjective supersenses constructed by Tsvetkov et al.
    Page 3, “Methodology”
  4. WordNet lacks coarse-grained semantic categories for adjectives.
    Page 4, “Model and Feature Extraction”
  5. For example, the top-level classes in GermaNet include: adj.feeling (e.g., willing, pleasant, cheerful); adj.substance (e.g., dry, ripe, creamy); adj.spatial (e.g., adjacent, gigantic). For each adjective type in WordNet, they produce a vector of classifier posterior probabilities corresponding to degrees of membership of this word in each of the 13 semantic classes, similar to the feature vectors we build for nouns and verbs.
    Page 4, “Model and Feature Extraction”
  6. Consider an example related to projection of WordNet supersenses.
    Page 4, “Model and Feature Extraction”
  7. (2013) describe a Concrete Category Overlap algorithm, where co-occurrence statistics and Turney’s abstractness scores are used to determine WordNet supersenses that correspond to literal usage of a given adjective or verb.
    Page 8, “Related Work”
  8. To implement this idea, they extend MRC imageability scores to all dictionary words using links among WordNet supersenses (mostly hypernym and hyponym relations).
    Page 9, “Related Work”
  9. Because they heavily rely on WordNet and availability of imageability scores, their approach may not be applicable to low-resource languages.
    Page 9, “Related Work”
  10. Their method also employs WordNet supersenses, but it is not clear from the description whether WordNet is essential or can be replaced with some other lexical resource.
    Page 9, “Related Work”

vector space

Appears in 8 sentences as: Vector space (3) vector space (4) vector spaces (1)
In Metaphor Detection with Cross-Lingual Model Transfer
  1. • Vector space word representations.
    Page 3, “Methodology”
  2. Vector space word representations learned using unsupervised algorithms are often effective features in supervised learning methods (Turian et al., 2010).
    Page 3, “Methodology”
  3. (2013) reveal an interesting cross-lingual property of distributed word representations: there is a strong similarity between the vector spaces across languages that can be easily captured by linear mapping.
    Page 3, “Methodology”
  4. Thus, vector space models can also be seen as vectors of (latent) semantic concepts that preserve their “meaning” across languages.
    Page 3, “Methodology”
  5. (2013), we use a logistic regression classifier to propagate abstractness and imageability scores from MRC ratings to all words for which we have vector space representations.
    Page 3, “Model and Feature Extraction”
  6. More specifically, we calculate the degree of abstractness and imageability of all English items that have a vector space representation, using vector elements as features.
    Page 3, “Model and Feature Extraction”
  7. Vector space word representations.
    Page 4, “Model and Feature Extraction”
  8. (2013) in that it uses additional features (vector space word representations) and a different classification method (we use random forests while Tsvetkov et al.
    Page 7, “Experiments”

word representations

Appears in 7 sentences as: word representations (7)
In Metaphor Detection with Cross-Lingual Model Transfer
  1. We define three main feature categories: (1) abstractness and imageability, (2) supersenses, (3) unsupervised vector-space word representations; each category corresponds to a group of features with a common theme and representation.
    Page 2, “Methodology”
  2. • Vector space word representations.
    Page 3, “Methodology”
  3. Vector space word representations learned using unsupervised algorithms are often effective features in supervised learning methods (Turian et al., 2010).
    Page 3, “Methodology”
  4. (2013) reveal an interesting cross-lingual property of distributed word representations : there is a strong similarity between the vector spaces across languages that can be easily captured by linear mapping.
    Page 3, “Methodology”
  5. Vector space word representations .
    Page 4, “Model and Feature Extraction”
  6. We employ 64-dimensional vector-space word representations constructed by Faruqui and Dyer (2014). The vector construction algorithm is a variation on traditional latent semantic analysis (Deerwester et al., 1990) that uses multilingual information to produce representations in which synonymous words have similar vectors.
    Page 4, “Model and Feature Extraction”
  7. (2013) in that it uses additional features (vector space word representations) and a different classification method (we use random forests while Tsvetkov et al.
    Page 7, “Experiments”
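
As a rough illustration of the LSA family of methods mentioned in item 6 (not the actual Faruqui and Dyer (2014) algorithm, which additionally exploits multilingual information), a word-context co-occurrence matrix can be factored with truncated SVD to obtain low-dimensional word vectors:

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

# Toy word-by-context co-occurrence counts; rows are words.
words = ["car", "truck", "idea"]
cooc = np.array([
    [4, 3, 0, 1],   # car
    [3, 4, 0, 1],   # truck
    [0, 1, 5, 4],   # idea
], dtype=float)

# Truncated SVD yields low-dimensional word vectors (here 2-dimensional).
svd = TruncatedSVD(n_components=2, random_state=0)
vecs = svd.fit_transform(cooc)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Distributionally similar words end up with similar vectors.
print(cosine(vecs[0], vecs[1]) > cosine(vecs[0], vecs[2]))
```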

feature vectors

Appears in 6 sentences as: feature vector (3) feature vectors (4)
In Metaphor Detection with Cross-Lingual Model Transfer
  1. Each SVO (or AN) instance will be represented by a triple (duple) from which a feature vector will be extracted. The vector will consist of the concatenation of the conceptual features (which we discuss below) for all participating words, and conjunction features for word pairs. For example, to generate the feature vector for the SVO triple (car, drink, gasoline), we compute all the features for the individual words car, drink, and gasoline and combine them with the conjunction features for the pairs car-drink and drink-gasoline.
    Page 2, “Methodology”
  2. If word one is represented by features u ∈ R^n and word two by features v ∈ R^m, then the conjunction feature vector is the vectorization of the outer product uv^T.
    Page 2, “Methodology”
  3. Degrees of membership in different supersenses are represented by feature vectors, where each element corresponds to one supersense.
    Page 4, “Model and Feature Extraction”
  4. For example, the top-level classes in GermaNet include: adj.feeling (e.g., willing, pleasant, cheerful); adj.substance (e.g., dry, ripe, creamy); adj.spatial (e.g., adjacent, gigantic). For each adjective type in WordNet, they produce a vector of classifier posterior probabilities corresponding to degrees of membership of this word in each of the 13 semantic classes, similar to the feature vectors we build for nouns and verbs.
    Page 4, “Model and Feature Extraction”
  5. For languages other than English, feature vectors are projected to English features using translation dictionaries.
    Page 4, “Model and Feature Extraction”
  6. Then, we average all feature vectors related to these translations.
    Page 4, “Model and Feature Extraction”
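
The conjunction feature from item 2 can be sketched directly: for word features u ∈ R^n and v ∈ R^m, the conjunction feature vector is the flattened outer product uv^T. The feature values below are invented.

```python
import numpy as np

# Hypothetical feature vectors for the two words of a pair, u in R^n, v in R^m.
u = np.array([0.9, 0.1])        # e.g., features of "drink"
v = np.array([0.2, 0.5, 0.3])   # e.g., features of "gasoline"

# Conjunction features: vectorization of the outer product u v^T, giving n*m values,
# one per pair of (word-one feature, word-two feature).
conjunction = np.outer(u, v).ravel()
print(conjunction.shape)  # (6,)
```

Each element captures the interaction of one feature of the first word with one feature of the second, which is what lets a linear-in-features model respond to word combinations.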

cross validation

Appears in 5 sentences as: cross validation (5)
In Metaphor Detection with Cross-Lingual Model Transfer
  1. This is done by computing accuracy under 10-fold cross validation.
    Page 5, “Experiments”
  2. Table 2: 10-fold cross validation results for three feature categories and their combination, for classifiers trained on English SVO and AN training sets.
    Page 5, “Experiments”
  3. The ACC column reports an accuracy score under 10-fold cross validation.
    Page 5, “Experiments”
  4. For the AN task, the cross validation accuracy is 8% better than the result of Turney et al.
    Page 5, “Experiments”
  5. Table 4: Comparing the AN metaphor detection method to the baselines: accuracy of 10-fold cross validation on annotations of five human judges.
    Page 7, “Experiments”
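
A minimal sketch of this evaluation setup, assuming scikit-learn and synthetic stand-in data for the real English SVO/AN feature matrices:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the English training features and metaphor labels,
# with a learnable signal in the first feature column.
rng = np.random.RandomState(0)
X = rng.randn(100, 8)
y = (X[:, 0] + 0.1 * rng.randn(100) > 0).astype(int)

# Accuracy under 10-fold cross validation, as reported in the ACC column.
clf = RandomForestClassifier(n_estimators=50, random_state=0)
scores = cross_val_score(clf, X, y, cv=10, scoring="accuracy")
print(scores.mean())
```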

synsets

Appears in 5 sentences as: synsets (5)
In Metaphor Detection with Cross-Lingual Model Transfer
  1. A lexical item can belong to several synsets, which are associated with different supersenses.
    Page 4, “Model and Feature Extraction”
  2. For example, the word head (when used as a noun) participates in 33 synsets, three of which are related to the supersense noun.body.
    Page 4, “Model and Feature Extraction”
  3. Hence, we select all the synsets of the nouns head and brain.
    Page 4, “Model and Feature Extraction”
  4. There are 38 such synsets (33 for head and 5 for brain).
    Page 4, “Model and Feature Extraction”
  5. Four of these synsets are associated with the supersense noun.body.
    Page 4, “Model and Feature Extraction”
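
The degree-of-membership computation behind these counts can be sketched as follows; only the noun.body count (3 of head's 33 synsets) comes from the text above, and the remaining supersense distribution is invented for illustration:

```python
from collections import Counter

# Hypothetical synset-to-supersense listing for the noun "head".
# The paper reports 33 noun synsets, three under noun.body; the rest of
# this breakdown is made up to total 33.
head_synset_supersenses = (
    ["noun.body"] * 3 + ["noun.cognition"] * 5 + ["noun.artifact"] * 4
    + ["noun.person"] * 6 + ["noun.act"] * 15
)

def supersense_features(synset_supersenses):
    """Degree of membership in each supersense = fraction of the word's
    synsets associated with that supersense."""
    counts = Counter(synset_supersenses)
    total = len(synset_supersenses)
    return {sense: n / total for sense, n in counts.items()}

features = supersense_features(head_synset_supersenses)
print(features["noun.body"])  # 3/33
```

With a real WordNet interface, the listing would come from the word's synsets and their lexicographer classes rather than a hand-built table.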

feature sets

Appears in 4 sentences as: feature set (1) feature sets (3)
In Metaphor Detection with Cross-Lingual Model Transfer
  1. Experimental results are given in Table 2, where we also provide the number of features in each feature set.
    Page 5, “Experiments”
  2. Figure 1: ROC curves for classifiers trained using different feature sets (English SVO and AN test sets).
    Page 6, “Experiments”
  3. According to ROC plots in Figure 1, all three feature sets are effective, both for SVO and for AN tasks.
    Page 6, “Experiments”
  4. The current work builds on this study: it incorporates new syntactic relations as metaphor candidates, adds several new feature sets, and uses different, more reliable datasets for evaluating results.
    Page 9, “Related Work”

logistic regression

Appears in 4 sentences as: logistic regression (4)
In Metaphor Detection with Cross-Lingual Model Transfer
  1. In addition, decision-tree classifiers learn nonlinear responses to inputs and often outperform logistic regression (Perlich et al., 2003). Our random forest classifier models the probability that the input syntactic relation is metaphorical.
    Page 3, “Model and Feature Extraction”
  2. (2013), we use a logistic regression classifier to propagate abstractness and imageability scores from MRC ratings to all words for which we have vector space representations.
    Page 3, “Model and Feature Extraction”
  3. In our experiments, the random forest model slightly outperformed logistic regression and SVM classifiers.
    Page 3, “Model and Feature Extraction”
  4. (2013) use logistic regression).
    Page 7, “Experiments”
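
A minimal sketch of the classifier choice discussed above: on data with a nonlinear (XOR-like) decision rule, a random forest outperforms a logistic regression baseline, and its posterior serves as the probability that a relation is metaphorical. The data here is synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic features whose label depends on a feature *interaction*,
# which a linear model cannot capture but a tree ensemble can.
rng = np.random.RandomState(1)
X = rng.randn(300, 2)
y = (X[:, 0] * X[:, 1] > 0).astype(int)

rf = RandomForestClassifier(n_estimators=100, random_state=0)
lr = LogisticRegression()
rf_acc = cross_val_score(rf, X, y, cv=5).mean()
lr_acc = cross_val_score(lr, X, y, cv=5).mean()
print(rf_acc, lr_acc)  # the nonlinear model wins on this data

# Posterior probability that an input relation is metaphorical.
p_metaphor = rf.fit(X, y).predict_proba(X[:1])[0, 1]
```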

human judges

Appears in 3 sentences as: human judges (3)
In Metaphor Detection with Cross-Lingual Model Transfer
  1. They are collected by the same human judges and belong to the same domain.
    Page 6, “Experiments”
  2. The pairs were presented to five human judges who rated each pair on a scale from 1 (very literal/denotative) to 4 (very non-literal/connotative).
    Page 7, “Experiments”
  3. Table 4: Comparing the AN metaphor detection method to the baselines: accuracy of 10-fold cross validation on annotations of five human judges.
    Page 7, “Experiments”
