Density Maximization in Context-Sense Metric Space for All-words WSD
Koichi Tanigaki, Mitsuteru Shiba, Tatsuji Munaka and Yoshinori Sagisaka

Article Structure

Abstract

This paper proposes a novel smoothing model with a combinatorial optimization scheme for all-words word sense disambiguation from untagged corpora.

Introduction

Word Sense Disambiguation (WSD) is the task of identifying the intended sense of a word based on its context.

Smoothing Model

In this section we introduce the basic idea for modeling the context-to-sense mapping.

Simultaneous Optimization of All-words WSD

Given the smoothing model that extrapolates the senses of other words, we now make its instances interact to obtain the optimal combination of senses for all the words.

Metric Space Implementation

So far, we have dealt with general metrics for context and for sense.

Evaluation

To confirm the effect of the proposed smoothing model and its combinatorial optimization scheme, we conducted WSD evaluations.

Discussion

This section discusses the validity of the proposed method with respect to i) sense-interdependent disambiguation and ii) the reliability of data smoothing.

Related Work

As described in Section 1, graph-based WSD has been studied extensively, since graphs are a favorable structure for dealing with interactions among data on vertices.

Conclusions

We proposed a novel smoothing model with a combinatorial optimization scheme for all-words WSD from untagged corpora.

Topics

distributional similarity

Appears in 12 sentences as: Distributional similarity (2) distributional similarity (11)
In Density Maximization in Context-Sense Metric Space for All-words WSD
  1. (2004) proposed a method to combine sense similarity with distributional similarity and configured a predominant sense score.
    Page 1, “Introduction”
  2. Distributional similarity was used to weight the influence of context words, based on large-scale statistics.
    Page 1, “Introduction”
  3. (2009) used the k-nearest words by distributional similarity as context words.
    Page 1, “Introduction”
  4. Distributional similarity (Lin, 1998) was computed among target words, based on the statistics of the test set and the background text provided as the official dataset of the SemEval-2 English all-words task (Agirre et al., 2010).
    Page 5, “Metric Space Implementation”
  5. Those texts were parsed using the RASP parser (Briscoe et al., 2006) version 3.1 to obtain grammatical relations for the distributional similarity, as well as lemmata and part-of-speech (POS) tags, which are required to look up the sense inventory of WordNet.
    Page 5, “Metric Space Implementation”
  6. Based on the distributional similarity, we simply used the k-nearest neighbor words as the context of each target word.
    Page 5, “Metric Space Implementation”
  7. The background documents consist of 2.7M running words, which were used to compute the distributional similarity.
    Page 6, “Evaluation”
  8. The context metric space was composed of the k-nearest neighbor words by distributional similarity (Lin, 1998), as described in Section 4.
    Page 6, “Evaluation”
  9. (2004), which determines the word sense based on sense similarity and distributional similarity to the k-nearest neighbor words of a target word, selected by distributional similarity.
    Page 6, “Evaluation”
  10. It exploits no distributional similarity.
    Page 6, “Evaluation”
  11. Our major advantages are the combinatorial optimization scheme and the smoothing model that integrates distributional similarity.
    Page 6, “Evaluation”
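The sentences above refer to Lin (1998) distributional similarity computed from grammatical relations, with the k-nearest neighbor words serving as each target word's context. As a rough illustration of that pipeline (not the paper's implementation; the feature representation and any weighting details are assumptions here), a minimal Lin-style similarity over (relation, filler) co-occurrence counts might look like this:

```python
import math
from collections import defaultdict

def lin_similarity(counts, w1, w2):
    """Lin (1998) similarity over (relation, filler) features.

    counts: dict mapping word -> dict mapping feature -> count,
    e.g. counts["cell"][("obj-of", "attach")] = 3. A simplified
    sketch; the paper's exact RASP-derived features may differ.
    """
    total = sum(c for feats in counts.values() for c in feats.values())
    word_tot = {w: sum(f.values()) for w, f in counts.items()}
    feat_tot = defaultdict(float)
    for feats in counts.values():
        for f, c in feats.items():
            feat_tot[f] += c

    def mi(w, f):
        # pointwise mutual information of word w with feature f,
        # clipped at zero as is conventional
        p_wf = counts[w][f] / total
        p_w = word_tot[w] / total
        p_f = feat_tot[f] / total
        return max(math.log(p_wf / (p_w * p_f)), 0.0)

    i1 = {f: mi(w1, f) for f in counts[w1]}
    i2 = {f: mi(w2, f) for f in counts[w2]}
    shared = set(i1) & set(i2)
    num = sum(i1[f] + i2[f] for f in shared)
    den = sum(i1.values()) + sum(i2.values())
    return num / den if den else 0.0

def k_nearest_words(counts, target, k=5):
    """Return the k distributionally most similar words to `target`."""
    others = [w for w in counts if w != target]
    return sorted(others, key=lambda w: lin_similarity(counts, target, w),
                  reverse=True)[:k]
```

Given such a similarity, the k-nearest list plays the role of the "context words" mentioned in sentences 3, 6, and 8 above.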


graph-based

Appears in 7 sentences as: graph-based (7)
In Density Maximization in Context-Sense Metric Space for All-words WSD
  1. In recent years, graph-based methods have attracted considerable attention (Mihalcea, 2005; Navigli and Lapata, 2007; Agirre and Soroa, 2009).
    Page 1, “Introduction”
  2. On the graph structure of a lexical knowledge base (LKB), random-walk and other well-known graph-based techniques have been applied to find mutually related senses among target words.
    Page 1, “Introduction”
  3. Unlike earlier studies that disambiguate word-by-word, graph-based methods obtain a sense-interdependent solution for the target words.
    Page 1, “Introduction”
  4. They applied LKB graph-based WSD to a target word together with its distributional context words, and showed that this yields better results on a domain dataset than using only the immediate context words.
    Page 1, “Introduction”
  5. As described in Section 1, graph-based WSD has been studied extensively, since graphs are a favorable structure for dealing with interactions among data on vertices.
    Page 9, “Related Work”
  6. Our method can be viewed as one of the graph-based methods, but it regards input-to-class mappings as vertices, and its edges represent relations both in context and in sense.
    Page 9, “Related Work”
  7. Mihalcea (2005) proposed graph-based methods whose vertices are sense-label hypotheses on a word sequence.
    Page 9, “Related Work”
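Sentence 2 mentions random-walk techniques over an LKB graph, in the vein of Agirre and Soroa (2009). Below is a minimal sketch of that general idea, a personalized PageRank over a sense graph; the adjacency matrix, teleport vector, and parameter values are illustrative assumptions, not the cited systems' settings:

```python
import numpy as np

def personalized_pagerank(adj, teleport, damping=0.85, iters=50):
    """Random walk with restart over a sense graph.

    adj: (n, n) adjacency matrix of the LKB sense graph.
    teleport: restart distribution (sums to 1), with mass on the
    sense nodes of the context words. Returns stationary scores;
    each target word is then assigned its highest-scoring
    candidate sense. A generic sketch, not the paper's method.
    """
    adj = np.asarray(adj, dtype=float)
    col = adj.sum(axis=0)
    col[col == 0] = 1.0           # avoid division by zero
    trans = adj / col             # column-stochastic transition matrix
    p = np.full(adj.shape[0], 1.0 / adj.shape[0])
    for _ in range(iters):
        p = (1 - damping) * teleport + damping * trans @ p
    return p

# toy usage with a hypothetical 4-node sense graph
adj = np.array([[0, 1, 1, 0],
                [1, 0, 0, 1],
                [1, 0, 0, 1],
                [0, 1, 1, 0]])
teleport = np.array([0.5, 0.0, 0.5, 0.0])  # restart at context senses
scores = personalized_pagerank(adj, teleport)
```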


sense disambiguation

Appears in 6 sentences as: Sense Disambiguation (1) sense disambiguation (5)
In Density Maximization in Context-Sense Metric Space for All-words WSD
  1. This paper proposes a novel smoothing model with a combinatorial optimization scheme for all-words word sense disambiguation from untagged corpora.
    Page 1, “Abstract”
  2. Word Sense Disambiguation (WSD) is the task of identifying the intended sense of a word based on its context.
    Page 1, “Introduction”
  3. This means that the ranks of sense candidates for each word were frequently altered through the iterations, which further means that new information, not available earlier, was delivered successively to the sense disambiguation of each word.
    Page 8, “Discussion”
  4. From these results, we could confirm the expected sense-interdependency effect, namely that the sense disambiguation of one word affected other words.
    Page 8, “Discussion”
  5. In our method, the sense disambiguation of a word is guided by extrapolation (smoothing) from its nearby words.
    Page 8, “Discussion”
  6. We can compute, by word perplexity, the randomness of the words that affect sense disambiguation.
    Page 8, “Discussion”
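Sentence 6 measures, via word perplexity, how random the words affecting a disambiguation are. As a hedged sketch of the generic definition (the paper may condition or normalize differently), perplexity of an empirical word distribution is 2 raised to its entropy:

```python
import math
from collections import Counter

def word_perplexity(words):
    """Perplexity 2**H(p) of the empirical distribution of `words`.

    Higher perplexity means the nearby words are closer to random
    samples. Generic definition only; assumes `words` is non-empty.
    """
    counts = Counter(words)
    n = len(words)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return 2 ** entropy
```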


WordNet

Appears in 6 sentences as: WordNet (6)
In Density Maximization in Context-Sense Metric Space for All-words WSD
  1. The cluster centers are located at the means of hypotheses, including miscellaneous unintended alternatives; thus the estimated probability distribution is, roughly speaking, offset toward the center of WordNet, which is not what we want.
    Page 5, “Simultaneous Optimization of All-words WSD”
  2. To calculate sense similarities, we used the WordNet similarity package by Pedersen et al.
    Page 5, “Metric Space Implementation”
  3. Those texts were parsed using the RASP parser (Briscoe et al., 2006) version 3.1 to obtain grammatical relations for the distributional similarity, as well as lemmata and part-of-speech (POS) tags, which are required to look up the sense inventory of WordNet.
    Page 5, “Metric Space Implementation”
  4. Precision is simply omitted because its difference from recall is always the number of failures in referring to WordNet caused by mislabeled lemmata or POS tags, which is the same for all three methods.
    Page 6, “Evaluation”
  5. (2010) used WordNet pre-pruning.
    Page 9, “Related Work”
  6. Disambiguation is performed by considering only those candidate synsets that belong to the top-k largest connected components of WordNet on the domain corpus.
    Page 9, “Related Work”
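Sentence 2 refers to the WordNet similarity package by Pedersen et al.; a rough Python stand-in is NLTK's WordNet interface, sketched below. The words, POS choice, and the path-based measure are illustrative assumptions; Pedersen's package offers several measures, and the paper does not commit to one in these excerpts:

```python
# assumes the WordNet corpus is installed: nltk.download('wordnet')
from nltk.corpus import wordnet as wn

# candidate senses for a lemma/POS pair, as looked up in WordNet
senses_bank = wn.synsets("bank", pos=wn.NOUN)
senses_money = wn.synsets("money", pos=wn.NOUN)

# one common sense-similarity measure (path similarity); take the
# best-matching pair of candidate senses
score, s1, s2 = max(
    ((a.path_similarity(b) or 0.0, a, b)
     for a in senses_bank for b in senses_money),
    key=lambda t: t[0],
)
print(f"{s1.name()} ~ {s2.name()}: {score:.3f}")
```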


word sense

Appears in 5 sentences as: Word Sense (1) word sense (3) word senses (1)
In Density Maximization in Context-Sense Metric Space for All-words WSD
  1. This paper proposes a novel smoothing model with a combinatorial optimization scheme for all-words word sense disambiguation from untagged corpora.
    Page 1, “Abstract”
  2. Word Sense Disambiguation (WSD) is the task of identifying the intended sense of a word based on its context.
    Page 1, “Introduction”
  3. (2004), which determines the word sense based on sense similarity and distributional similarity to the k-nearest neighbor words of a target word, selected by distributional similarity.
    Page 6, “Evaluation”
  4. (2007), which determines the word sense by maximizing the sum of sense similarity to the k immediate neighbor words of a target word.
    Page 6, “Evaluation”
  5. Thus it was confirmed that this method is valid for finding the optimal combination of word senses with large untagged corpora.
    Page 9, “Conclusions”
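Sentence 4 describes a baseline that assigns each target word the sense maximizing the summed sense similarity to its k immediate neighbor words. A minimal sketch under assumed interfaces (`sense_sim` and the candidate-sense lists are hypothetical placeholders):

```python
def disambiguate(target_senses, neighbor_senses, sense_sim):
    """Pick the candidate sense maximizing summed similarity to neighbors.

    target_senses: candidate senses of the target word.
    neighbor_senses: one candidate-sense list per neighbor word
    (the k immediate neighbors). sense_sim: similarity between two
    senses. A sketch of the baseline in sentence 4, not the cited
    system's implementation.
    """
    def score(sense):
        # for each neighbor, take its best-matching candidate sense
        return sum(max(sense_sim(sense, ns) for ns in cands)
                   for cands in neighbor_senses if cands)
    return max(target_senses, key=score)
```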


best results

Appears in 4 sentences as: best results (4)
In Density Maximization in Context-Sense Metric Space for All-words WSD
  1. We compared our best results with the participating systems of the task.
    Page 7, “Evaluation”
  2. As shown in the table, our best results outperform all of the systems and the MFS baseline.
    Page 7, “Evaluation”
  3. It places our best results within the distribution of all 20 unsupervised/knowledge-based participating systems.
    Page 7, “Evaluation”
  4. As seen in the figure, our best results lie above the top group and outside the confidence intervals of the other participants ranked intermediate or lower.
    Page 7, “Evaluation”


probability distribution

Appears in 3 sentences as: probability distribution (3)
In Density Maximization in Context-Sense Metric Space for All-words WSD
  1. Figure 1: Proposed probability distribution model for context-to-sense mapping space.
    Page 3, “Smoothing Model”
  2. Shadow thickness and surface height represent the composite probability distribution of all twelve kernels.
    Page 4, “Simultaneous Optimization of All-words WSD”
  3. The cluster centers are located at the means of hypotheses, including miscellaneous unintended alternatives; thus the estimated probability distribution is, roughly speaking, offset toward the center of WordNet, which is not what we want.
    Page 5, “Simultaneous Optimization of All-words WSD”
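These excerpts describe a composite probability distribution obtained by summing per-hypothesis kernels over the joint context-sense mapping space (Figure 1). A generic Gaussian kernel-density sketch of that idea follows; the coordinates, bandwidth, and Gaussian form are assumptions, not the paper's exact kernels:

```python
import numpy as np

def composite_density(points, query, bandwidth=1.0):
    """Sum of Gaussian kernels centred on sense-hypothesis points.

    points: (n, d) coordinates of sense hypotheses in the joint
    context-sense metric space; query: (d,) point to evaluate.
    A generic kernel-density sketch of the idea behind Figure 1.
    """
    diffs = points - query                       # (n, d)
    sq = np.sum(diffs ** 2, axis=1)              # squared distances
    kernels = np.exp(-sq / (2 * bandwidth ** 2))
    return kernels.sum()

# toy usage with three hypothetical 2-D hypothesis points
points = np.array([[0.0, 0.0], [1.0, 0.5], [0.2, 0.1]])
print(composite_density(points, np.array([0.1, 0.0])))
```

Per the title, the paper's optimization then seeks the combination of sense hypotheses that maximizes such a density.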


random samples

Appears in 3 sentences as: random samples (2) random sampling (1)
In Density Maximization in Context-Sense Metric Space for All-words WSD
  1. Generally speaking, statistical reliability increases as the number of random samples increases.
    Page 8, “Discussion”
  2. Therefore, we can conclude that if sufficiently random samples of nearby words are provided, our smoothing model is reliable, though it is trained in an unsupervised fashion.
    Page 9, “Discussion”
  3. Moreover, our smoothing model, though unsupervised, provides reliable supervision when sufficiently random samples of words are available as nearby words.
    Page 9, “Conclusions”
