Identifying Bad Semantic Neighbors for Improving Distributional Thesauri
Ferret, Olivier

Article Structure

Abstract

Distributional thesauri are now widely used in a large number of Natural Language Processing tasks.

Introduction

The work we present in this article focuses on the automatic building of a thesaurus from a corpus.

Principles

Our work shares with (Zhitomirsky-Geffet and Dagan, 2009) the use of a kind of bootstrapping as it starts from a distributional thesaurus and to some extent, exploits it for its improvement.

Improving a distributional thesaurus

3.1 Overview

Experiments and evaluation

4.1 Initial thesaurus evaluation

Related work

The building of distributional thesauri is generally viewed as an application or a mode of evaluation of work on semantic similarity or semantic relatedness.

Conclusion and perspectives

In this article, we have presented a new approach for reranking the semantic neighbors of a distributional thesaurus.

Topics

semantic similarity

Appears in 26 sentences as: semantic similarity (18) semantically similar (10)
In Identifying Bad Semantic Neighbors for Improving Distributional Thesauri
  1. As a consequence, improving such thesauri is an important issue that is mainly tackled indirectly through the improvement of semantic similarity measures.
    Page 1, “Abstract”
  2. The distinction between these two interpretations refers to the distinction between the notions of semantic similarity and semantic relatedness as it was done in (Budanitsky and Hirst, 2006) or in (Zesch and Gurevych, 2010) for instance.
    Page 1, “Introduction”
  3. However, the limit between these two notions is sometimes hard to draw in existing work, as the terms semantic similarity and semantic relatedness are often used interchangeably.
    Page 1, “Introduction”
  4. Moreover, semantic similarity is frequently considered as included in semantic relatedness and the two problems are often tackled by using the same methods.
    Page 1, “Introduction”
  5. In the remainder of this article, we will use the term semantic similarity with its generic sense and the term semantic relatedness for referring more specifically to similarity based on syntagmatic relations.
    Page 1, “Introduction”
  6. Following work such as (Grefenstette, 1994), a widespread way to build a thesaurus from a corpus is to use a semantic similarity measure for extracting the semantic neighbors of the entries of the thesaurus.
    Page 1, “Introduction”
  7. Work based on WordNet-like lexical networks for building semantic similarity measures such as (Budanitsky and Hirst, 2006) or (Pedersen et al., 2004) falls into this category.
    Page 1, “Introduction”
  8. The last option is the corpus-based approach, based on the distributional hypothesis (Firth, 1957): each word is characterized by the set of contexts from a corpus in which it appears and the semantic similarity of two words is computed from the contexts they share.
    Page 2, “Introduction”
  9. … of its bad semantic neighbors, that is to say the neighbors of the entry that are actually not semantically similar to the entry.
    Page 2, “Principles”
  10. As a consequence, two words are considered as semantically similar if they occur in a large enough set of shared contexts.
    Page 2, “Principles”
  11. … in a sentence, from all other words and more particularly, from those of its neighbors in a distributional thesaurus that are likely to be actually not semantically similar to it.
    Page 3, “Principles”
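
Sentences 8 and 10 above summarize the corpus-based approach: a word is characterized by the contexts it appears in, and two words count as semantically similar when they share enough contexts. The sketch below only illustrates that idea; it is not the paper's configuration (the actual extraction options follow Ferret, 2010), and the window size, raw-count weighting, toy corpus, and function names are assumptions.

```python
from collections import Counter, defaultdict
from math import sqrt

def context_vectors(sentences, window=3):
    """Collect bag-of-words contexts within a +/-window span around
    every word; raw co-occurrence counts are kept as weights."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            low, high = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(low, high):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors

def cosine(u, v):
    """Cosine measure between two sparse context vectors (Counters)."""
    numerator = sum(u[w] * v[w] for w in set(u) & set(v))
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return numerator / norm if norm else 0.0

# Toy usage: 'cat' and 'dog' share most of their contexts, so they come
# out as close semantic neighbors under the distributional hypothesis.
corpus = [["the", "cat", "chased", "the", "mouse"],
          ["the", "dog", "chased", "the", "mouse"]]
vectors = context_vectors(corpus)
print(cosine(vectors["cat"], vectors["dog"]))
```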

reranking

Appears in 17 sentences as: reranked (4) reranking (13)
In Identifying Bad Semantic Neighbors for Improving Distributional Thesauri
  1. • reranking of the entry’s neighbors according to its bad neighbors.
    Page 3, “Improving a distributional thesaurus”
  2. As mentioned in section 3.1, the starting point of our reranking process is the definition of a model for determining to what extent a word in a sentence, which is not supposed to be known in the context of this task, corresponds or not to a reference word E. This task can also be viewed as a tagging task in which the occurrences of a target word T are labeled with two tags: E and notE.
    Page 4, “Improving a distributional thesaurus”
  3. 3.4 Identification of bad neighbors and thesaurus reranking
    Page 5, “Improving a distributional thesaurus”
  4. Figure 1 gives the results of the reranked thesaurus for these entries in terms of R-precision and MAP against reference W5 for various values of G. Although the level of these measures does not change a lot for G > 5, the graph of Figure 1 shows that G = 15 appears to be an optimal value.
    Page 7, “Experiments and evaluation”
  5. 4.3 Evaluation of the reranked thesaurus
    Page 7, “Experiments and evaluation”
  6. Table 4 gives the evaluation of the application of our reranking method to the initial thesaurus according to the same principles as in section 4.1.
    Page 7, “Experiments and evaluation”
  7. As the recall measure and the precision for the last rank do not change in a reranking process, they are not given again.
    Page 7, “Experiments and evaluation”
  8. initial: … perplexity, respect, ruination, appreciation, neighbourliness …
     reranking: gratitude, admiration, respect, appreciation, neighborliness, trust, empathy, goodwill, reciprocity, half-staff, affection, self-esteem, reverence, longing, regard …
    Page 8, “Experiments and evaluation”
  9. Table 5: Impact of our reranking for the entry esteem
    Page 8, “Experiments and evaluation”
  10. Table 5 illustrates more precisely the impact of our reranking procedure for the middle frequency entry esteem.
    Page 8, “Experiments and evaluation”
  11. The reranking produces a thesaurus in which these two words appear as the second and the third neighbors of the entry because neighbors without a clear relation to it, such as backscratching, were downgraded, while its third synonym in WordNet is raised from rank 22 to rank 15.
    Page 8, “Experiments and evaluation”
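
The excerpts above state that neighbors judged bad are downgraded (e.g., backscratching in Table 5) so that better neighbors rise, but they do not give the exact reranking rule. The sketch below therefore uses a simple stable demotion of flagged neighbors as a stand-in; the function name and the demotion strategy are assumptions, not the paper's procedure.

```python
def rerank(neighbors, bad):
    """Demote the neighbors flagged as bad while keeping the original
    similarity order inside each group. `neighbors` is the list ranked
    by the initial distributional similarity; `bad` is the set of
    neighbors the per-entry classifier judged not semantically similar
    to the entry."""
    kept = [n for n in neighbors if n not in bad]
    demoted = [n for n in neighbors if n in bad]
    return kept + demoted

# Toy usage loosely mirroring Table 5: once neighbors without a clear
# relation to the entry are downgraded, true synonyms move up the list.
initial = ["backscratching", "respect", "ruination", "admiration"]
print(rerank(initial, bad={"backscratching", "ruination"}))
# -> ['respect', 'admiration', 'backscratching', 'ruination']
```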

semantic relatedness

Appears in 13 sentences as: semantic relatedness (8) semantic relations (4) semantically related (2)
In Identifying Bad Semantic Neighbors for Improving Distributional Thesauri
  1. However, they are far from containing only interesting semantic relations.
    Page 1, “Abstract”
  2. The term semantic neighbor is very generic and can have two main interpretations according to the kind of semantic relations it is based on: one relies only on paradigmatic relations, such as hypernymy or synonymy, while the other considers …
    Page 1, “Introduction”
  3. The distinction between these two interpretations refers to the distinction between the notions of semantic similarity and semantic relatedness as it was done in (Budanitsky and Hirst, 2006) or in (Zesch and Gurevych, 2010) for instance.
    Page 1, “Introduction”
  4. However, the limit between these two notions is sometimes hard to draw in existing work, as the terms semantic similarity and semantic relatedness are often used interchangeably.
    Page 1, “Introduction”
  5. Moreover, semantic similarity is frequently considered as included in semantic relatedness and the two problems are often tackled by using the same methods.
    Page 1, “Introduction”
  6. In the remainder of this article, we will use the term semantic similarity with its generic sense and the term semantic relatedness for referring more specifically to similarity based on syntagmatic relations.
    Page 1, “Introduction”
  7. The first one relies on handcrafted resources in which semantic relations are clearly identified.
    Page 1, “Introduction”
  8. … concerns the type of semantic relations: results with Moby as reference are improved to a larger extent than results with WordNet as reference.
    Page 8, “Experiments and evaluation”
  9. This suggests that our procedure is more effective for semantically related words than for semantically similar words, which can be considered somewhat surprising since the notion of context in our discriminative classifier seems a priori stricter than in “classical” distributional contexts.
    Page 8, “Experiments and evaluation”
  10. The building of distributional thesauri is generally viewed as an application or a mode of evaluation of work on semantic similarity or semantic relatedness.
    Page 8, “Related work”
  11. … arises from the imbalance between semantic similarity and semantic relatedness among the training examples: most of the selected examples were pairs of words linked by semantic relatedness because this kind of relation is more frequent among semantic neighbors than relations based on semantic similarity.
    Page 9, “Related work”

similarity measure

Appears in 10 sentences as: similarity measure (6) similarity measures (4)
In Identifying Bad Semantic Neighbors for Improving Distributional Thesauri
  1. As a consequence, improving such thesauri is an important issue that is mainly tackled indirectly through the improvement of semantic similarity measures.
    Page 1, “Abstract”
  2. Following work such as (Grefenstette, 1994), a widespread way to build a thesaurus from a corpus is to use a semantic similarity measure for extracting the semantic neighbors of the entries of the thesaurus.
    Page 1, “Introduction”
  3. Work based on WordNet-like lexical networks for building semantic similarity measures such as (Budanitsky and Hirst, 2006) or (Pedersen et al., 2004) falls into this category.
    Page 1, “Introduction”
  4. A part of these proposals focuses on the weighting of the elements that are part of the contexts of words, such as (Broda et al., 2009), in which the weights of context elements are turned into ranks, or (Zhitomirsky-Geffet and Dagan, 2009), followed and extended by (Yamamoto and Asakura, 2010), which proposes a bootstrapping method for modifying the weights of context elements according to the semantic neighbors found by an initial distributional similarity measure.
    Page 2, “Introduction”
  5. For instance, features such as n-grams of words or n-grams of parts of speech are not considered, whereas they are widely used in tasks such as word sense disambiguation (WSD), probably because they would lead to very large models and because similarity measures such as the Cosine measure are not necessarily suitable for heterogeneous representations (Alexandrescu and Kirchhoff, 2007).
    Page 2, “Principles”
  6. As in (Lin, 1998) or (Curran and Moens, 2002a), this building is based on the definition of a semantic similarity measure from a corpus.
    Page 3, “Improving a distributional thesaurus”
  7. For the extraction of distributional data and the characteristics of the distributional similarity measure, we adopted the options of (Ferret, 2010), resulting from a kind of grid search procedure performed with the extended TOEFL test proposed in (Freitag et al., 2005) as an optimization objective.
    Page 3, “Improving a distributional thesaurus”
  8. • similarity measure between contexts, for evaluating the semantic similarity of two words: the Cosine measure.
    Page 4, “Improving a distributional thesaurus”
  9. The building of our initial thesaurus from the similarity measure above was performed classically by extracting the closest semantic neighbors of each of its entries.
    Page 4, “Improving a distributional thesaurus”
  10. As a consequence, the improvement of such thesauri is generally not directly addressed but is a possible consequence of the improvement of semantic similarity measures.
    Page 8, “Related work”
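
Sentence 9 above notes that the initial thesaurus is built “classically by extracting the closest semantic neighbors of each of its entries”. The sketch below spells out that step under simple assumptions: a brute-force pairwise loop, an arbitrary cutoff k, and any pairwise similarity function (for instance the Cosine sketch shown earlier) passed in as an argument; none of these choices are the paper's actual settings.

```python
def build_thesaurus(vectors, similarity, k=10):
    """For each entry, score every other word with the given pairwise
    similarity measure and keep the k closest ones as its ranked
    semantic neighbors. The exhaustive loop and the cutoff k are
    illustration choices only."""
    entries = list(vectors)
    thesaurus = {}
    for entry in entries:
        scored = [(other, similarity(vectors[entry], vectors[other]))
                  for other in entries if other != entry]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        thesaurus[entry] = scored[:k]
    return thesaurus

# Usage with the earlier sketch: build_thesaurus(context_vectors(corpus), cosine)
```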

WordNet

Appears in 10 sentences as: WordNet (10) WordNet’s (1)
In Identifying Bad Semantic Neighbors for Improving Distributional Thesauri
  1. The second approach makes use of a less structured source of knowledge about words such as the definitions of classical dictionaries or the glosses of WordNet.
    Page 1, “Introduction”
  2. WordNet’s glosses were used to support Lesk-like measures in (Banerjee and Pedersen, 2003) and more recently, measures were also defined from Wikipedia or Wiktionaries (Gabrilovich and …
    Page 1, “Introduction”
  3. Table 2 shows the results of the evaluation of our initial thesaurus, achieved by comparing the selected semantic neighbors with two complementary reference resources: WordNet 3.0 synonyms (Miller, 1990) [W], which characterize a semantic similarity based on paradigmatic relations, and the Moby thesaurus (Ward, 1996) [M], which gathers a larger set of types of relations and is more representative of semantic relatedness.
    Page 5, “Experiments and evaluation”
  4. WordNet provides a restricted number of synonyms for each noun while the Moby thesaurus contains for each entry a large number of synonyms and similar words.
    Page 6, “Experiments and evaluation”
  5. As a consequence, the precisions at different cutoffs have a significantly higher value with Moby as reference than with WordNet as reference.
    Page 6, “Experiments and evaluation”
  6. This is confirmed by the fact that, with WordNet as reference, only 744 neighbors were found wrongly downgraded, spread over 686 entries, which represents only 5% of all downgraded neighbors.
    Page 7, “Experiments and evaluation”
  7. … concerns the type of semantic relations: results with Moby as reference are improved to a larger extent than results with WordNet as reference.
    Page 8, “Experiments and evaluation”
  8. WordNet: respect, admiration, regard
     Moby: admiration, appreciation, acceptance, dignity, regard, respect, ac…
    Page 8, “Experiments and evaluation”
  9. Its WordNet row gives all the reference synonyms for this entry in WordNet while its Moby row gives the first reference related words …
    Page 8, “Experiments and evaluation”
  10. The reranking produces a thesaurus in which these two words appear as the second and the third neighbors of the entry because neighbors without a clear relation to it, such as backscratching, were downgraded, while its third synonym in WordNet is raised from rank 22 to rank 15.
    Page 8, “Experiments and evaluation”
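
The evaluations quoted above compare each entry's ranked neighbors with reference sets (WordNet 3.0 synonyms, the Moby thesaurus) through measures such as precision at fixed cutoffs, R-precision, and MAP. The sketch below implements the standard definitions of R-precision and average precision for a single entry; the toy ranked list and reference set are assumptions loosely based on the esteem example, and MAP is simply the mean of the per-entry average precisions.

```python
def r_precision(ranked, reference):
    """Precision at rank R, where R is the size of the reference set."""
    r = len(reference)
    return sum(1 for n in ranked[:r] if n in reference) / r if r else 0.0

def average_precision(ranked, reference):
    """Average precision of one entry's ranked neighbors against its
    reference set; averaging this value over all entries gives MAP."""
    hits, precisions = 0, []
    for rank, neighbor in enumerate(ranked, start=1):
        if neighbor in reference:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(reference) if reference else 0.0

# Toy usage with a hypothetical ranked list for the entry 'esteem' and
# its WordNet synonyms as reference.
ranked = ["admiration", "backscratching", "respect", "regard"]
reference = {"respect", "admiration", "regard"}
print(r_precision(ranked, reference), average_precision(ranked, reference))
```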

SVM

Appears in 5 sentences as: SVM (5)
In Identifying Bad Semantic Neighbors for Improving Distributional Thesauri
  1. More precisely, we follow (Lee and Ng, 2002), a reference work for WSD, by adopting a Support Vector Machines (SVM) classifier with a linear kernel and three kinds of features for characterizing each considered occurrence …
    Page 4, “Improving a distributional thesaurus”
  2. For the second type of features, we take more precisely the POS of the three words before E and those of the three words after E. Each pair {POS, position} corresponds to a binary feature for the SVM classifier.
    Page 4, “Improving a distributional thesaurus”
  3. Each instance of the 11 types of collocations is represented by a tuple ⟨lemma1, position1, lemma2, position2⟩ and leads to a binary feature for the SVM classifier.
    Page 4, “Improving a distributional thesaurus”
  4. In accordance with the process of section 3.1, a specific SVM classifier is trained for each entry of our initial thesaurus, which requires the unsupervised selection of a set of positive and negative examples.
    Page 4, “Improving a distributional thesaurus”
  5. As mentioned before, this classifier is a linear SVM.
    Page 7, “Experiments and evaluation”
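
The sentences above describe a per-entry binary classifier that tags occurrences of a target word as E or notE, using binary features such as {POS, position} pairs and collocation tuples in the spirit of (Lee and Ng, 2002). The sketch below shows that kind of feature extraction and training with scikit-learn's DictVectorizer and LinearSVC; the library choice, the simplified (token, position) features standing in for the 11 collocation types, and the toy labeled occurrences are all assumptions, since the excerpts only specify an SVM with a linear kernel.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.svm import LinearSVC

def occurrence_features(tokens, pos_tags, i):
    """Binary features for the occurrence at position i of the target:
    the POS of the three words before and after it, plus (token,
    position) pairs as a simplified stand-in for the paper's 11 types
    of collocations."""
    features = {}
    for offset in (-3, -2, -1, 1, 2, 3):
        j = i + offset
        if 0 <= j < len(tokens):
            features[f"pos_{offset}={pos_tags[j]}"] = 1
            features[f"word_{offset}={tokens[j]}"] = 1
    return features

# Toy training set: occurrences labeled E (positive) or notE (negative);
# one such classifier is trained per thesaurus entry.
examples = [
    (occurrence_features(["deep", "esteem", "for", "her"],
                         ["JJ", "NN", "IN", "PRP"], 1), "E"),
    (occurrence_features(["the", "value", "dropped", "sharply"],
                         ["DT", "NN", "VBD", "RB"], 1), "notE"),
]
vectorizer = DictVectorizer()
X = vectorizer.fit_transform([feats for feats, _ in examples])
y = [label for _, label in examples]
classifier = LinearSVC().fit(X, y)
```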

distributional representation

Appears in 3 sentences as: distributional representation (2) distributional representations (1)
In Identifying Bad Semantic Neighbors for Improving Distributional Thesauri
  1. As a result, the distributional representation of a word takes the unstructured form of a bag of words or the more structured form of a set of pairs {syntactic relation, word}.
    Page 2, “Principles”
  2. A variant of this approach was proposed in (Kazama et al., 2010) where the distributional representation of a word is modeled as a multinomial distribution with Dirichlet as prior.
    Page 2, “Principles”
  3. The principles presented in the previous section face one major problem compared to the “classical” distributional approach: in the latter, the semantic similarity of two words can be evaluated directly by computing the similarity of their distributional representations.
    Page 3, “Improving a distributional thesaurus”
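
Sentence 1 above mentions the two usual forms of distributional representation: an unstructured bag of words and a structured set of {syntactic relation, word} pairs. The sketch below builds both from (head, relation, dependent) triples; the triples, the relation names, and the "-1" convention for inverse relations are illustrative assumptions, not the parser output used in the paper.

```python
from collections import Counter

def representations(dependencies, target):
    """Return the unstructured (bag-of-words) and structured
    ({syntactic relation, word} pairs) representations of `target`
    from a list of (head, relation, dependent) triples."""
    bag, structured = Counter(), Counter()
    for head, relation, dependent in dependencies:
        if head == target:
            bag[dependent] += 1
            structured[(relation, dependent)] += 1
        elif dependent == target:
            bag[head] += 1
            structured[(relation + "-1", head)] += 1  # inverse relation
    return bag, structured

# Toy usage for the word 'cat' with made-up dependency triples.
triples = [("chased", "subj", "cat"), ("cat", "mod", "small")]
print(representations(triples, "cat"))
```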
