Predicting the relevance of distributional semantic similarity with contextual information
Muller, Philippe and Fabre, Cécile and Adam, Clémentine

Article Structure

Abstract

Using distributional analysis methods to compute semantic proximity links between words has become commonplace in NLP.

Introduction

The goal of the work presented in this paper is to improve distributional thesauri, and to help evaluate the content of such resources.

Evaluation of lexical similarity in context

2.1 Data

Experiments: predicting relevance in context

The outcome of the contextual annotation presented above is a rather sizeable dataset of validated semantic links, and we showed these linguistic judgments to be reliable.

Related work

Our work is related to two issues: evaluating distributional resources, and improving them.

Conclusion

We proposed a method to reliably evaluate distributional semantic similarity in a broad sense by considering the validation of lexical pairs in contexts where they both appear.

Topics

semantic relations

Appears in 9 sentences as: semantic relatedness (2) semantic relation (1) semantic relations (5) semantic relations” (1)
In Predicting the relevance of distributional semantic similarity with contextual information
  1. We first set up a human annotation of semantic links with or without contextual information to show the importance of the textual context in evaluating the relevance of semantic similarity, and to assess the prevalence of actual semantic relations between word tokens.
    Page 1, “Abstract”
  2. They are not suitable for the evaluation of the whole range of semantic relatedness that is exhibited by distributional similarities, which exceeds the limits of classical lexical relations, even though researchers have tried to collect equivalent resources manually, to be used as a gold standard (Weeds, 2003; Bordag, 2008; Anguiano et al., 2011).
    Page 1, “Introduction”
  3. One advantage of distributional similarities is to exhibit a lot of different semantic relations, not necessarily standard lexical relations.
    Page 1, “Introduction”
  4. … perspective, to cover what (Morris and Hirst, 2004) call “non-classical lexical semantic relations”.
    Page 2, “Introduction”
  5. We hypothesize that evaluating and filtering semantic relations in texts where lexical items occur would help tasks that naturally make use of semantic similarity relations, but assessing this goes beyond the present work.
    Page 2, “Introduction”
  6. We present the experiments we set up to automatically filter semantic relations in context, with various groups of features that take into account information from the corpus used to build the thesaurus and contextual information related to occurrences of semantic neighbours (Section 3).
    Page 2, “Introduction”
  7. In other words, is there a semantic relation between them, either classical (synonymy, hypernymy, co-hyponymy, meronymy, co-meronymy) or not (the relation can be paraphrased but does not belong to the previous cases)?”
    Page 3, “Evaluation of lexical similarity in context”
  8. We differ from all these evaluation procedures as we do not focus on an essential view of the relatedness of two lexical items, but evaluate the link in a context where the relevance of the link is in question, an “existential” view of semantic relatedness.
    Page 8, “Related work”
  9. This helps cover non-classical semantic relations, which are hard to evaluate with classical resources.
    Page 8, “Conclusion”

See all papers in Proc. ACL 2014 that mention semantic relations.


F-score

Appears in 8 sentences as: F-score (10)
  1. In case one wants to optimize the F-score (the harmonic mean of precision and recall) when extracting relevant pairs, we can see that the optimal point is at .24 for a threshold of .22 on Lin’s score.
    Page 4, “Evaluation of lexical similarity in context”
  2. Other popular methods (maximum entropy, SVM) have shown slightly inferior combined F-score, even though precision and recall individually might vary more widely.
    Page 6, “Experiments: predicting relevance in context”
  3. As a baseline, we can also consider a simple threshold on the lexical similarity score, in our case Lin’s measure, which we have shown to yield the best F-score of 24% when set at 0.22.
    Page 6, “Experiments: predicting relevance in context”
  4. If we take the best simple classifier (random forests), the precision and recall are 68.1% and 24.2% for an F-score of 35.7%, and this is significantly beaten by the Naive Bayes method as precision and recall are more even (F-score of 41.5%).
    Page 7, “Experiments: predicting relevance in context”
  5. Also note that predicting every link as relevant would result in a 2.6% precision, and thus a 5% F-score.
    Page 7, “Experiments: predicting relevance in context”
  6. The overall best F-score of 46.3% is reached with Random Forests and the cost-aware learning method.
    Page 7, “Experiments: predicting relevance in context”
  7. Table 3 sums up the scores for the different configurations, with precision, recall, F-score and the confidence interval on the F-score.
    Page 7, “Experiments: predicting relevance in context”
  8. Table 3 (per-configuration scores):
     Precision  Recall  F-score
     40.4       54.3    46.3
     37.4       52.8    43.8
     36.1       49.5    41.8
     36.5       54.8    43.8
    Page 8, “Related work”
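The F-score figures quoted in these excerpts all follow from the standard harmonic mean of precision and recall; a minimal sketch checking the numbers above:

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1, as used in the paper)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Random forests (excerpt 4): P = 68.1%, R = 24.2%
print(round(100 * f_score(0.681, 0.242), 1))  # 35.7
# Predicting every link as relevant (excerpt 5): P = 2.6%, R = 100%
print(round(100 * f_score(0.026, 1.0), 1))    # 5.1, i.e. roughly the 5% quoted
```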


precision and recall

Appears in 7 sentences as: Precision and recall (1) precision and recall (7)
  1. Figure 3 shows the influence of the threshold value to select relevant pairs, when considering precision and recall of the pairs that are kept when choosing the threshold, evaluated against the human annotation of relevance in context.
    Page 4, “Evaluation of lexical similarity in context”
  2. In case one wants to optimize the F-score (the harmonic mean of precision and recall) when extracting relevant pairs, we can see that the optimal point is at .24 for a threshold of .22 on Lin’s score.
    Page 4, “Evaluation of lexical similarity in context”
  3. Figure 3: Precision and recall on relevant links with respect to a threshold on the similarity measure (Lin’s score)
    Page 4, “Experiments: predicting relevance in context”
  4. We have seen that the relevant/not relevant classification is very imbalanced, biased towards the “not relevant” category (about 11%/89%), so we applied methods dedicated to counterbalance this, and will focus on the precision and recall of the predicted relevant links.
    Page 6, “Experiments: predicting relevance in context”
  5. Other popular methods (maximum entropy, SVM) have shown slightly inferior combined F-score, even though precision and recall individually might vary more widely.
    Page 6, “Experiments: predicting relevance in context”
  6. We are interested in the precision and recall for the “relevant” class.
    Page 7, “Experiments: predicting relevance in context”
  7. If we take the best simple classifier (random forests), the precision and recall are 68.1% and 24.2% for an F-score of 35.7%, and this is significantly beaten by the Naive Bayes method as precision and recall are more even (F-score of 41.5%).
    Page 7, “Experiments: predicting relevance in context”
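Figure 3’s curves come from sweeping a threshold over the similarity scores and scoring the retained pairs against the human judgements. A sketch of that computation on toy data (the scores and labels below are invented, not the paper’s):

```python
def precision_recall_at_threshold(scores, gold, threshold):
    """Precision/recall of the 'relevant' class when every pair whose
    similarity score is >= threshold is predicted relevant."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, g in zip(predicted, gold) if p and g)
    fp = sum(1 for p, g in zip(predicted, gold) if p and not g)
    fn = sum(1 for p, g in zip(predicted, gold) if not p and g)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Toy Lin scores and toy in-context relevance judgements:
scores = [0.05, 0.10, 0.22, 0.30, 0.45, 0.60]
gold = [False, False, False, True, True, True]
for t in (0.1, 0.22, 0.4):
    p, r = precision_recall_at_threshold(scores, gold, t)
    print(t, round(p, 2), round(r, 2))
```

Raising the threshold trades recall for precision, which is why the paper picks the threshold that maximizes the F-score rather than either score alone.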


similarity measure

Appears in 4 sentences as: similarity measure (4)
  1. A distributional thesaurus is a lexical network that lists semantic neighbours, computed from a corpus and a similarity measure between lexical items, which generally captures the similarity of contexts in which the items occur.
    Page 1, “Introduction”
  2. Figure 3: Precision and recall on relevant links with respect to a threshold on the similarity measure (Lin’s score)
    Page 4, “Experiments: predicting relevance in context”
  3. A straightforward parameter to include to predict the relevance of a link is of course the similarity measure itself, here Lin’s information measure.
    Page 5, “Experiments: predicting relevance in context”
  4. This is already a big improvement on the use of the similarity measure alone (24%).
    Page 7, “Experiments: predicting relevance in context”
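Lin’s measure scores two items by the weight of the context features they share relative to the total weight of all their features. A sketch in the spirit of Lin (1998), assuming feature weights are precomputed association scores (the dictionaries below are invented, and this is not claimed to be the paper’s exact implementation):

```python
def lin_similarity(weights_a, weights_b):
    """Lin (1998)-style similarity: shared feature-weight mass over total
    feature-weight mass. `weights_a`/`weights_b` map syntactic-context
    features to positive association scores (e.g. pointwise mutual
    information)."""
    shared = set(weights_a) & set(weights_b)
    numerator = sum(weights_a[f] + weights_b[f] for f in shared)
    denominator = sum(weights_a.values()) + sum(weights_b.values())
    return numerator / denominator if denominator else 0.0

# Two items with identical context profiles are maximally similar:
w = {"obj:drink": 2.0, "mod:hot": 1.5}
print(lin_similarity(w, w))  # 1.0
```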


contextual information

Appears in 3 sentences as: contextual information (3)
  1. We first set up a human annotation of semantic links with or without contextual information to show the importance of the textual context in evaluating the relevance of semantic similarity, and to assess the prevalence of actual semantic relations between word tokens.
    Page 1, “Abstract”
  2. We present the experiments we set up to automatically filter semantic relations in context, with various groups of features that take into account information from the corpus used to build the thesaurus and contextual information related to occurrences of semantic neighbours 3).
    Page 2, “Introduction”
  3. To verify that this methodology is useful, we did a preliminary annotation to contrast judgment on lexical pairs with or without this contextual information.
    Page 2, “Evaluation of lexical similarity in context”


distributional similarities

Appears in 3 sentences as: distributional similarities (2) distributional similarity (1)
  1. They are not suitable for the evaluation of the whole range of semantic relatedness that is exhibited by distributional similarities, which exceeds the limits of classical lexical relations, even though researchers have tried to collect equivalent resources manually, to be used as a gold standard (Weeds, 2003; Bordag, 2008; Anguiano et al., 2011).
    Page 1, “Introduction”
  2. One advantage of distributional similarities is to exhibit a lot of different semantic relations, not necessarily standard lexical relations.
    Page 1, “Introduction”
  3. For each pair neighbour_a/neighbour_b, we computed a set of features from Wikipedia (the corpus used to derive the distributional similarity): we first computed the frequencies of each item in the corpus, freq_a and freq_b, from which we derive …
    Page 4, “Experiments: predicting relevance in context”
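The excerpt above breaks off before listing what is derived from freq_a and freq_b. The sketch below computes the raw frequencies from a tokenised corpus and adds one hypothetical derived feature (a smoothed log-frequency ratio), purely for illustration; the paper’s actual derived features are not recoverable from this excerpt:

```python
import math
from collections import Counter

def frequency_features(corpus_tokens, a, b):
    """Raw corpus frequencies for a neighbour pair, plus one illustrative
    derived feature (NOT necessarily one the paper uses)."""
    counts = Counter(corpus_tokens)
    freq_a, freq_b = counts[a], counts[b]
    return {
        "freq_a": freq_a,
        "freq_b": freq_b,
        # add-one smoothing keeps the ratio defined for unseen items
        "log_freq_ratio": math.log((freq_a + 1) / (freq_b + 1)),
    }

tokens = "the cat sat on the mat the cat purred".split()
print(frequency_features(tokens, "cat", "mat"))
```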


semantic similarity

Appears in 3 sentences as: semantic similarity (3)
  1. We first set up a human annotation of semantic links with or without contextual information to show the importance of the textual context in evaluating the relevance of semantic similarity, and to assess the prevalence of actual semantic relations between word tokens.
    Page 1, “Abstract”
  2. We hypothesize that evaluating and filtering semantic relations in texts where lexical items occur would help tasks that naturally make use of semantic similarity relations, but assessing this goes beyond the present work.
    Page 2, “Introduction”
  3. We proposed a method to reliably evaluate distributional semantic similarity in a broad sense by considering the validation of lexical pairs in contexts where they both appear.
    Page 8, “Conclusion”
