That's sick dude!: Automatic identification of word sense change across different timescales
Mitra, Sunny and Mitra, Ritwik and Riedl, Martin and Biemann, Chris and Mukherjee, Animesh and Goyal, Pawan

Article Structure

Abstract

In this paper, we propose an unsupervised method to identify noun sense changes based on rigorous analysis of time-varying text data available in the form of millions of digitized books.

Introduction

Two of the fundamental components of natural language communication are word sense discovery (Jones, 1986) and word sense disambiguation (Ide and Veronis, 1998).

Related work

Word sense disambiguation as well as word sense discovery have both remained key areas of research right from the very early initiatives in natural language processing research.

Datasets and graph construction

In this section, we briefly describe the dataset used for our experiments and the graph construction procedure.

Tracking sense changes

The basic idea of our algorithm for tracking sense changes is as follows.

Experimental framework

For our experiments, we utilized DTs created for 8 different time periods: 1520-1908, 1909-1953, 1954-1972, 1973-1986, 1987-1995, 1996-2001, 2002-2005 and 2006-2008 (Riedl et al., 2014).
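The eight slices are contiguous and sorted, so assigning a book's publication year to its DT slice is a simple sorted lookup. The Python sketch below illustrates this. `TIME_SLICES` and `slice_for_year` are hypothetical names introduced here, not part of the paper or of Riedl et al. (2014); only the year boundaries come from the text.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# The eight DT time slices listed above, as inclusive (start, end) year ranges.
# Hypothetical constant name; the boundaries are copied from the text.
TIME_SLICES: List[Tuple[int, int]] = [
    (1520, 1908), (1909, 1953), (1954, 1972), (1973, 1986),
    (1987, 1995), (1996, 2001), (2002, 2005), (2006, 2008),
]
_STARTS = [start for start, _ in TIME_SLICES]


def slice_for_year(year: int) -> Optional[Tuple[int, int]]:
    """Return the slice containing `year`, or None if no slice covers it."""
    # Index of the last slice whose start year is <= `year`.
    i = bisect_right(_STARTS, year) - 1
    if i >= 0 and year <= TIME_SLICES[i][1]:
        return TIME_SLICES[i]
    return None
```

For example, `slice_for_year(1950)` returns `(1909, 1953)`, while any year before 1520 or after 2008 returns `None`.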

Evaluation framework

During evaluation, we considered the clusters obtained using the 1909-1953 time-slice as our reference and attempted to track sense change by comparing these with the clusters obtained for 2002-2005.
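This excerpt does not spell out how clusters from the 1909-1953 and 2002-2005 slices are matched. The Python sketch below illustrates the general birth/split/join idea under stated assumptions: each sense cluster is a set of words, cluster similarity is the overlap coefficient, and the thresholds `high` and `low` are hypothetical values. It is an illustration, not the authors' split-merge algorithm, and the function names are invented here.

```python
from typing import Dict, List, Set


def overlap(a: Set[str], b: Set[str]) -> float:
    """Overlap coefficient |a & b| / min(|a|, |b|); 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def detect_changes(old: List[Set[str]], new: List[Set[str]],
                   high: float = 0.5, low: float = 0.1) -> Dict[str, list]:
    """Flag candidate birth/split/join events between two slices' sense clusters.

    `old` holds the clusters from the earlier slice, `new` those from the later
    one. `high` and `low` are hypothetical thresholds, not values from the paper.
    """
    events: Dict[str, list] = {"birth": [], "split": [], "join": []}
    # Birth: a later cluster that barely overlaps any earlier cluster.
    for j, n in enumerate(new):
        if all(overlap(o, n) < low for o in old):
            events["birth"].append(j)
    # Split: one earlier cluster strongly overlaps two or more later clusters.
    for i, o in enumerate(old):
        targets = [j for j, n in enumerate(new) if overlap(o, n) >= high]
        if len(targets) >= 2:
            events["split"].append((i, targets))
    # Join: one later cluster strongly overlaps two or more earlier clusters.
    for j, n in enumerate(new):
        sources = [i for i, o in enumerate(old) if overlap(o, n) >= high]
        if len(sources) >= 2:
            events["join"].append((sources, j))
    return events
```

With toy data for a word like "sick" (not taken from the paper's results), an earlier cluster `{"ill", "unwell", "ailing", "diseased"}` and later clusters `{"ill", "unwell", "ailing"}` and `{"awesome", "cool", "wicked", "rad"}` yield `{'birth': [1], 'split': [], 'join': []}`. The overlap coefficient is used rather than Jaccard similarity because, after a split, each new cluster is roughly a subset of the old one (and after a join, each old cluster a subset of the new one), which this measure scores highly regardless of cluster size.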

Conclusions

In this paper, we presented a completely unsupervised method to detect word sense changes by analyzing millions of archived digitized books spanning several centuries.

Topics

WordNet

Appears in 23 sentences as: WordNet (27)
In That's sick dude!: Automatic identification of word sense change across different timescales
  1. We conduct a thorough evaluation of the proposed methodology both manually and through comparison with WordNet.
    Page 1, “Abstract”
  2. Remarkably, in 44% of cases the birth of a novel sense is attested by WordNet, while in 46% and 43% of cases, split and join, respectively, are confirmed by WordNet.
    Page 1, “Abstract”
  3. Remarkably, comparison with the English WordNet indicates that in 44% of cases, as identified by our algorithm, there has been a birth of a completely novel sense, in 46% of cases a new sense has split off from an older sense, and in 43% of cases two or more older senses have merged to form a new sense.
    Page 2, “Introduction”
  4. A few approaches suggested by (Bond et al., 2009; Pääkkö and Lindén, 2012) attempt to augment WordNet synsets primarily using methods of annotation.
    Page 2, “Related work”
  5. 6.2 Automated evaluation with WordNet
    Page 6, “Evaluation framework”
  6. We chose WordNet for automated evaluation because it not only has wide coverage of word senses but is also maintained and updated regularly to incorporate new senses.
    Page 6, “Evaluation framework”
  7. For our evaluation, we developed an aligner to align the word clusters obtained with WordNet senses.
    Page 6, “Evaluation framework”
  8. The aligner constructs a WordNet dictionary for the purpose of synset alignment.
    Page 6, “Evaluation framework”
  9. The CW cluster is then aligned to WordNet synsets by comparing the cluster with the WordNet graph, and the synset with the maximum alignment score is returned as the output.
    Page 6, “Evaluation framework”
  10. In summary, the aligner tool takes as input the CW cluster and returns a WordNet synset id that corresponds to the cluster words.
    Page 6, “Evaluation framework”
  11. Birth: For a candidate word flagged as birth, we first find out the set of all WordNet synset ids for its CW clusters in the source time period (1909-1953 in this case).
    Page 6, “Evaluation framework”
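Items 7 to 10 above describe the aligner only at a high level: it builds a WordNet dictionary and, for a CW cluster, returns the synset with the maximum alignment score. The scoring function is not given in this excerpt, so the Python sketch below substitutes a simple stand-in, the fraction of cluster words found among a synset's associated words. The function name, the dictionary layout, and the toy entries are hypothetical; in practice the dictionary could be populated from WordNet, for example with NLTK's `wordnet.synsets(word, pos='n')` and `Synset.lemma_names()`.

```python
from typing import Dict, Optional, Set

# Hand-made toy entries for illustration only: ids mimic NLTK's naming style,
# but the word lists are invented, not extracted from WordNet.
TOY_SYNSETS: Dict[str, Set[str]] = {
    "bank.n.01": {"bank", "shore", "riverside", "slope", "river"},
    "bank.n.02": {"bank", "institution", "lender", "depository", "savings"},
}


def align_cluster(cluster: Set[str],
                  synset_words: Dict[str, Set[str]]) -> Optional[str]:
    """Return the id of the synset whose word set best overlaps `cluster`.

    `synset_words` is a hypothetical pre-built dictionary mapping synset ids
    to words associated with each synset. Returns None if the cluster is
    empty or no synset shares a word with it.
    """
    if not cluster:
        return None
    best_id: Optional[str] = None
    best_score = 0.0
    for synset_id, words in synset_words.items():
        # Stand-in alignment score: share of cluster words covered by the synset.
        score = len(cluster & words) / len(cluster)
        if score > best_score:  # on ties, the earlier synset is kept
            best_id, best_score = synset_id, score
    return best_id
```

For instance, the cluster `{"lender", "institution", "broker", "savings"}` scores 0.75 against `bank.n.02` and 0 against `bank.n.01`, so it aligns to `bank.n.02`.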

See all papers in Proc. ACL 2014 that mention WordNet.

word sense

Appears in 17 sentences as: Word sense (1) word sense (11) word senses (7)
  1. Our approach can be applied for lexicography, as well as for applications like word sense disambiguation or semantic search.
    Page 1, “Abstract”
  2. Two of the fundamental components of natural language communication are word sense discovery (Jones, 1986) and word sense disambiguation (Ide and Veronis, 1998).
    Page 1, “Introduction”
  3. Context plays a vital role in disambiguation of word senses as well as in the interpretation of the actual meaning of words.
    Page 1, “Introduction”
  4. For instance, the word “bank” has several distinct interpretations, including that of a “financial institution” and the “shore of a river.” Automatic discovery and disambiguation of word senses from a given text is an important and challenging problem which has been extensively studied in the literature (Jones, 1986; Ide and Veronis, 1998; Schütze, 1998; Navigli, 2009).
    Page 1, “Introduction”
  5. Note that this phenomenon of change in word senses has existed ever since the beginning of human communication (Bamman and Crane, 2011; Michel et al., 2011; Wijaya and Yeniterzi, 2011; Mihalcea and Nastase, 2012); however, with the advent of modern technology and the availability of huge volumes of time-varying data, it has now become possible to automatically track such changes and, thereby, help lexicographers in word sense discovery and design engineers in enhancing various NLP/IR applications (e.g., disambiguation, semantic search) that are naturally sensitive to change in word senses.
    Pages 1-2, “Introduction”
  6. In Section 4 we present an approach based on graph clustering to identify the time-varying sense clusters, and in Section 5 we present the split-merge based approach for tracking word sense changes.
    Page 2, “Introduction”
  7. Word sense disambiguation as well as word sense discovery have both remained key areas of research right from the very early initiatives in natural language processing research.
    Page 2, “Related work”
  8. Ide and Veronis (1998) present a very concise survey of the history of ideas used in word sense disambiguation; for a recent survey of the state of the art, one can refer to (Navigli, 2009).
    Page 2, “Related work”
  9. to automatic word sense discovery were made by Karen Sparck Jones (1986); later in lexicography, it has been extensively used as a preprocessing step for preparing mono- and multilingual dictionaries (Kilgarriff and Tugwell, 2001; Kilgarriff, 2004).
    Page 2, “Related work”

synset

Appears in 10 sentences as: synset (10) synsets (2)
  1. A few approaches suggested by (Bond et al., 2009; Pääkkö and Lindén, 2012) attempt to augment WordNet synsets primarily using methods of annotation.
    Page 2, “Related work”
  2. The aligner constructs a WordNet dictionary for the purpose of synset alignment.
    Page 6, “Evaluation framework”
  3. The CW cluster is then aligned to WordNet synsets by comparing the cluster with the WordNet graph, and the synset with the maximum alignment score is returned as the output.
    Page 6, “Evaluation framework”
  4. In summary, the aligner tool takes as input the CW cluster and returns a WordNet synset id that corresponds to the cluster words.
    Page 6, “Evaluation framework”
  5. Birth: For a candidate word flagged as birth, we first find out the set of all WordNet synset ids for its CW clusters in the source time period (1909-1953 in this case).
    Page 6, “Evaluation framework”
  6. Let $S_{init}$ denote the union of these synset ids.
    Page 6, “Evaluation framework”
  7. We then find the WordNet synset id for its birth cluster, say $s_{new}$.
    Page 6, “Evaluation framework”
  8. Join: For the join case, we find WordNet synset ids $s_1$ and $s_2$ for the clusters obtained in the source time period and $s_{new}$ for the join cluster in the target time period.
    Page 7, “Evaluation framework”
  9. Split: For the split case, we find WordNet synset id $s_{old}$ for the source cluster and synset ids $s_1$ and $s_2$ for the target split clusters.
    Page 7, “Evaluation framework”
  10. and the mapped WordNet synset matches the birth cluster to a very high degree.
    Page 8, “Evaluation framework”
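Items 5 to 9 above name the synset ids involved in each check, but this excerpt stops before stating the success conditions. The Python sketch below encodes one plausible reading, with hypothetical function names: $S_{init}$ is the set of synset ids of the source-slice clusters, $s_{new}$ the id of the birth or join cluster, $s_{old}$ the id of the split source cluster, and $s_1$, $s_2$ the ids of the two clusters on the other side of a split or join. Treat the exact conditions as assumptions to be checked against the full paper.

```python
from typing import Set


def birth_confirmed(s_init: Set[str], s_new: str) -> bool:
    """Assumed rule: the birth cluster maps to a synset unseen in the source slice."""
    return s_new not in s_init


def split_confirmed(s_old: str, s_1: str, s_2: str) -> bool:
    """Assumed rule: the split clusters map to distinct synsets, one being the source's."""
    return s_1 != s_2 and s_old in (s_1, s_2)


def join_confirmed(s_1: str, s_2: str, s_new: str) -> bool:
    """Assumed rule: the merged clusters map to distinct synsets, one kept by the join."""
    return s_1 != s_2 and s_new in (s_1, s_2)
```

Under these rules, the WordNet confirmation rates quoted in the abstract would be the fraction of flagged candidates for which the corresponding function returns `True`.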

manual evaluation

Appears in 6 sentences as: Manual evaluation (2) manual evaluation (4)
  1. Manual evaluation indicates that the algorithm correctly identified 60.4% of the birth cases in a set of 48 randomly picked samples and 57% of the split/join cases in a set of 21 randomly picked samples.
    Page 1, “Abstract”
  2. 6.1 Manual evaluation
    Page 6, “Evaluation framework”
  3. The accuracy as per manual evaluation was found to be 60.4% for the birth cases and 57% for the split/join cases.
    Page 6, “Evaluation framework”
  4. correspond to the candidate words, words obtained in the cluster of each candidate word (we will use the term ‘birth cluster’ for these words, henceforth), which indicated a new sense, the results of manual evaluation as well as the possible sense this birth cluster denotes.
    Page 6, “Evaluation framework”
  5. In addition to manual evaluation, we also performed automated evaluation for the candidate words.
    Page 6, “Evaluation framework”
  6. Through manual evaluation we found that the algorithm correctly identified 60.4% of the birth cases in a set of 48 random samples and 57% of the split/join cases in a set of 21 randomly picked samples.
    Page 9, “Conclusions”
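The reported percentages can be converted back into counts, which makes the small sample sizes explicit. The Python check below (helper names are hypothetical) finds that 60.4% of 48 corresponds to exactly 29 correct birth cases and 57% of 21 to exactly 12 correct split/join cases; these counts are derived here, not stated in the excerpt.

```python
from typing import List


def accuracy_pct(correct: int, total: int, digits: int = 1) -> float:
    """Percentage of correctly identified cases, rounded to `digits` decimals."""
    return round(100 * correct / total, digits)


def counts_matching(pct: float, total: int, digits: int = 1) -> List[int]:
    """All numbers of correct cases out of `total` that round to `pct`."""
    return [k for k in range(total + 1) if accuracy_pct(k, total, digits) == pct]


print(counts_matching(60.4, 48))          # birth cases, reported to one decimal
print(counts_matching(57, 21, digits=0))  # split/join cases, reported as a whole percent
```

With samples this small, each additional correct case moves the birth accuracy by about 2.1 percentage points and the split/join accuracy by about 4.8.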

sense disambiguation

Appears in 4 sentences as: sense disambiguation (4)
  1. Our approach can be applied for lexicography, as well as for applications like word sense disambiguation or semantic search.
    Page 1, “Abstract”
  2. Two of the fundamental components of natural language communication are word sense discovery (Jones, 1986) and word sense disambiguation (Ide and Veronis, 1998).
    Page 1, “Introduction”
  3. Word sense disambiguation as well as word sense discovery have both remained key areas of research right from the very early initiatives in natural language processing research.
    Page 2, “Related work”
  4. Ide and Veronis (1998) present a very concise survey of the history of ideas used in word sense disambiguation; for a recent survey of the state of the art, one can refer to (Navigli, 2009).
    Page 2, “Related work”

co-occurrence

Appears in 3 sentences as: co-occurrence (3)
  1. In Section 3 we briefly describe the datasets and outline the process of co-occurrence graph construction.
    Page 2, “Introduction”
  2. We use the co-occurrence based graph clustering framework introduced in (Biemann, 2006).
    Page 3, “Tracking sense changes”
  3. Firstly, a co-occurrence graph is created for every target word found in DT.
    Page 3, “Tracking sense changes”
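The framework of Biemann (2006) clusters a word's co-occurrence graph with the Chinese Whispers algorithm, presumably the origin of the "CW cluster" label used elsewhere on this page. Below is a minimal Python sketch of Chinese Whispers label propagation. It assumes the graph for a target word has already been built from its DT neighbours and is passed in as weighted edges; how those edges are derived, the iteration cap, and the tie rule (keep the current label on ties) are assumptions of this sketch, not details from the paper.

```python
import random
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def chinese_whispers(edges: List[Tuple[str, str, float]],
                     iterations: int = 20, seed: int = 0) -> List[Set[str]]:
    """Cluster an undirected weighted graph by Chinese Whispers label propagation."""
    graph: Dict[str, Dict[str, float]] = defaultdict(dict)
    for u, v, w in edges:
        graph[u][v] = graph[u].get(v, 0.0) + w
        graph[v][u] = graph[v].get(u, 0.0) + w
    label = {node: node for node in graph}  # every node starts in its own class
    nodes = list(graph)
    rng = random.Random(seed)
    for _ in range(iterations):
        rng.shuffle(nodes)  # visit nodes in random order each round
        changed = False
        for node in nodes:
            # Total edge weight per label among this node's neighbours.
            scores: Dict[str, float] = defaultdict(float)
            for nbr, w in graph[node].items():
                scores[label[nbr]] += w
            if not scores:
                continue
            top = max(scores.values())
            if scores.get(label[node], 0.0) == top:
                continue  # keep the current label on ties
            label[node] = max(scores, key=scores.get)
            changed = True
        if not changed:
            break  # converged: no label changed in a full round
    clusters: Dict[str, Set[str]] = defaultdict(set)
    for node, lab in label.items():
        clusters[lab].add(node)
    return list(clusters.values())
```

On two disconnected triangles with unit weights, for example the toy words {ill, unwell, ailing} and {awesome, cool, rad}, the sketch returns those two word sets as separate clusters, each standing for one sense.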

natural language

Appears in 3 sentences as: natural language (2) natural languages (1)
  1. Two of the fundamental components of natural language communication are word sense discovery (Jones, 1986) and word sense disambiguation (Ide and Veronis, 1998).
    Page 1, “Introduction”
  2. These two aspects are not only important from the perspective of developing computer applications for natural languages but also form key components of language evolution and change.
    Page 1, “Introduction”
  3. Word sense disambiguation as well as word sense discovery have both remained key areas of research right from the very early initiatives in natural language processing research.
    Page 2, “Related work”

random samples

Appears in 3 sentences as: random samples (4)
  1. We selected 48 random samples of candidate words for birth cases and 21 random samples for split/join cases.
    Page 6, “Evaluation framework”
  2. A further analysis of the words marked as birth cases in the random samples indicates that there are 22 technology-related words, 2 slang terms, 3 economics-related words and 2 general words.
    Page 6, “Evaluation framework”
  3. Through manual evaluation we found that the algorithm correctly identified 60.4% of the birth cases in a set of 48 random samples and 57% of the split/join cases in a set of 21 randomly picked samples.
    Page 9, “Conclusions”
