An Active Learning Approach to Finding Related Terms
Vickrey, David and Kipersztok, Oscar and Koller, Daphne

Article Structure

Abstract

We present a novel system that helps non-experts find sets of similar words.

Introduction

Set expansion is a well-studied NLP problem where a machine-learning algorithm is given a fixed set of seed words and asked to find additional members of the implied set.

Set Expansion

As input, we are provided with a small set of seed words S.
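The basic set-expansion step can be sketched as ranking candidate words by their average similarity to the seed set S. This is a minimal illustration, not the paper's actual scoring model; the similarity table and word lists below are invented.

```python
# Hypothetical word-pair similarities standing in for real data sources.
SIM = {
    ("crash", "collision"): 0.9,
    ("crash", "accident"): 0.8,
    ("wreck", "collision"): 0.7,
    ("wreck", "accident"): 0.6,
    ("banana", "collision"): 0.05,
    ("banana", "accident"): 0.0,
}

def similarity(word, seed):
    return SIM.get((word, seed), 0.0)

def expand(seeds, candidates, k=2):
    """Return the k candidates most similar, on average, to the seeds."""
    scored = [(sum(similarity(w, s) for s in seeds) / len(seeds), w)
              for w in candidates if w not in seeds]
    scored.sort(reverse=True)
    return [w for _, w in scored[:k]]

print(expand({"collision", "accident"}, ["crash", "wreck", "banana"]))
# The two car-crash words rank above "banana".
```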

Topics

WordNet

Appears in 11 sentences as: WordNet (12)
In An Active Learning Approach to Finding Related Terms
  1. Our system combines multiple preexisting sources of similarity data (a standard thesaurus, WordNet, contextual similarity), enabling it to capture many types of similarity groups (“synonyms of crash,” “types of car,” etc.).
    Page 1, “Abstract”
  2. We consider three similarity data sources: the Moby thesaurus, WordNet (Fellbaum, 1998), and distributional similarity based on a large corpus of text (Lin, 1998).
    Page 3, “Set Expansion”
  3. Unlike some other thesauri (such as WordNet and thesaurus.com), entries are not broken down by word sense.
    Page 3, “Set Expansion”
  4. Unlike some other thesauri (including WordNet and thesaurus.com), the entries are not broken down by word sense or part of speech.
    Page 3, “Set Expansion”
  5. WordNet is a well-known dictionary/thesaurus/ ontology often used in NLP applications.
    Page 3, “Set Expansion”
  6. We focused on measuring similarity in WordNet using the hypernym hierarchy.
    Page 3, “Set Expansion”
  7. A useful similarity metric we did not explore in this paper is similarity between WordNet dictionary definitions.
    Page 3, “Set Expansion”
  8. Converting this hierarchy into a similarity score, we chose to use the Jiang-Conrath distance (Jiang & Conrath, 1997) because it tends to be more robust to the exact structure of WordNet.
    Page 3, “Set Expansion”
  9. The number of types of similarity in WordNet tends to be less than that captured by Moby, because synsets in WordNet are (usually) only allowed to have a single parent.
    Page 3, “Set Expansion”
  10. Second, the data sources used: each source separately (M for Moby, W for WordNet, D for distributional similarity), and all three in combination (MWD).
    Page 4, “Set Expansion”
  11. Hughes and Ramage (2007) use random walks over WordNet , incorporating information such as meronymy and dictionary glosses.
    Page 5, “Set Expansion”
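The Jiang-Conrath distance used in items 6-8 can be illustrated on a toy hypernym hierarchy. The concepts, single-parent links, and probabilities below are invented for the example, not taken from WordNet: the distance is IC(c1) + IC(c2) − 2·IC(lcs), where IC(c) = −log p(c) and lcs is the lowest common subsumer.

```python
import math

# Invented single-parent hypernym tree (each concept has one parent).
PARENT = {"car": "vehicle", "truck": "vehicle", "vehicle": "entity",
          "dog": "animal", "animal": "entity", "entity": None}
# Invented corpus probability mass p(c) for each concept (descendants included).
P = {"entity": 1.0, "vehicle": 0.3, "animal": 0.2,
     "car": 0.15, "truck": 0.1, "dog": 0.2}

def ic(c):
    """Information content: -log p(c)."""
    return -math.log(P[c])

def ancestors(c):
    """Chain from a concept up to the root, following single-parent links."""
    chain = [c]
    while PARENT[c] is not None:
        c = PARENT[c]
        chain.append(c)
    return chain

def lcs(c1, c2):
    """Lowest common subsumer in the hierarchy."""
    a2 = set(ancestors(c2))
    return next(c for c in ancestors(c1) if c in a2)

def jc_distance(c1, c2):
    return ic(c1) + ic(c2) - 2 * ic(lcs(c1, c2))

# Siblings under "vehicle" come out much closer than "car" and "dog",
# whose only common subsumer is the root.
assert jc_distance("car", "truck") < jc_distance("car", "dog")
```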

See all papers in Proc. ACL 2010 that mention WordNet.

word sense

Appears in 7 sentences as: word sense (5), word senses (3)
In An Active Learning Approach to Finding Related Terms
  1. Unlike some other thesauri (such as WordNet and thesaurus.com), entries are not broken down by word sense.
    Page 3, “Set Expansion”
  2. Unlike some other thesauri (including WordNet and thesaurus.com), the entries are not broken down by word sense or part of speech.
    Page 3, “Set Expansion”
  3. It consists of a large number of synsets; a synset is a set of one or more similar word senses .
    Page 3, “Set Expansion”
  4. The Jiang-Conrath distance gives scores for pairs of word senses, not pairs of words.
    Page 3, “Set Expansion”
  5. We handle this by adding one row for every word sense with the right part of speech (rather than for every word); each row measures the similarity of every word to a particular word sense.
    Page 3, “Set Expansion”
  6. For the columns, we do need to collapse the word senses into words; for each word, we take a maximum across all of its senses.
    Page 3, “Set Expansion”
  7. Like Moby, similarity entries are not divided by word sense; usually, only the dominant sense of each word is represented.
    Page 3, “Set Expansion”
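The sense-collapsing step in items 5-6 can be sketched as follows: rows are indexed by individual word senses, and a word's column value is the maximum similarity across that word's senses. The sense labels and scores below are hypothetical.

```python
# Sense-level similarities: each row is a fixed word sense, and its
# entries score other word senses. (Invented example values.)
SENSE_SIM = {
    "crash#1": {"collision#1": 0.9, "bang#1": 0.3, "bang#2": 0.6},
}
# Map each word to its senses.
SENSES = {"collision": ["collision#1"], "bang": ["bang#1", "bang#2"]}

def word_sim(row_sense, word):
    """Collapse senses into words: take the max over the word's senses."""
    return max(SENSE_SIM[row_sense].get(s, 0.0) for s in SENSES[word])

# "bang" inherits the score of its most similar sense, bang#2.
assert word_sim("crash#1", "bang") == 0.6
assert word_sim("crash#1", "collision") == 0.9
```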

logistic regression

Appears in 6 sentences as: logistic regression (6)
In An Active Learning Approach to Finding Related Terms
  1. The second method uses a standard machine learning algorithm, logistic regression .
    Page 2, “Set Expansion”
  2. Thus, logistic regression using Moby and no active learning is L(M,-).
    Page 4, “Set Expansion”
  3. For logistic regression, we set the regularization penalty σ² to 1, based on qualitative analysis during development (before seeing the test data).
    Page 4, “Set Expansion”
  4. Using logistic regression instead of the untrained weights significantly improves performance.
    Page 5, “Set Expansion”
  5. The gains from logistic regression and active learning are cumulative; L(MWD,+) outscores U(MWD,-) by 38%.
    Page 5, “Set Expansion”
  6. Also, we use a logistic regression model to robustly incorporate negative information, rather than deterministically ruling out words and features.
    Page 5, “Set Expansion”
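A minimal sketch of the logistic-regression idea above: per-source similarity features are combined into one membership probability, with an L2 (Gaussian) penalty mirroring the σ² = 1 setting. The training examples and feature values are invented, and this plain gradient-descent trainer is an illustration, not the paper's implementation.

```python
import math

# Each example: ([moby_sim, wordnet_sim, dist_sim], in_set_label).
# Hypothetical data: in-set words have high similarity to the seeds.
DATA = [([0.9, 0.8, 0.7], 1), ([0.8, 0.6, 0.9], 1),
        ([0.1, 0.2, 0.1], 0), ([0.2, 0.1, 0.3], 0)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(data, steps=2000, lr=0.1, sigma2=1.0):
    """Batch gradient descent on L2-regularized logistic loss."""
    w, b = [0.0, 0.0, 0.0], 0.0
    for _ in range(steps):
        gw, gb = [0.0] * 3, 0.0
        for x, y in data:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            for i in range(3):
                gw[i] += (p - y) * x[i]
            gb += p - y
        for i in range(3):
            w[i] -= lr * (gw[i] + w[i] / sigma2)  # Gaussian prior, sigma^2 = 1
        b -= lr * gb
    return w, b

w, b = train(DATA)
# A candidate similar to the seeds under all three sources scores in-set.
score = sigmoid(sum(wi * xi for wi, xi in zip(w, [0.85, 0.7, 0.8])) + b)
assert score > 0.5
```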

distributional similarity

Appears in 3 sentences as: Distributional similarity (1), distributional similarity (2)
In An Active Learning Approach to Finding Related Terms
  1. We consider three similarity data sources: the Moby thesaurus, WordNet (Fellbaum, 1998), and distributional similarity based on a large corpus of text (Lin, 1998).
    Page 3, “Set Expansion”
  2. Distributional similarity.
    Page 3, “Set Expansion”
  3. Second, the data sources used: each source separately (M for Moby, W for WordNet, D for distributional similarity), and all three in combination (MWD).
    Page 4, “Set Expansion”
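The distributional-similarity idea can be sketched as: words that occur in similar contexts get high similarity. Cosine over context counts is used below as a simple stand-in for Lin's (1998) information-theoretic measure; the context counts are invented.

```python
import math

# Hypothetical context-word counts extracted from a corpus.
CONTEXTS = {
    "car":   {"drive": 5, "park": 3, "engine": 4},
    "truck": {"drive": 4, "park": 2, "engine": 5},
    "apple": {"eat": 6, "ripe": 2},
}

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v.get(k, 0) for k in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv)

# "car" and "truck" share contexts; "car" and "apple" share none.
assert cosine(CONTEXTS["car"], CONTEXTS["truck"]) > cosine(CONTEXTS["car"], CONTEXTS["apple"])
```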

significantly improves

Appears in 3 sentences as: significantly improve (1), significantly improves (2)
In An Active Learning Approach to Finding Related Terms
  1. Using logistic regression instead of the untrained weights significantly improves performance.
    Page 5, “Set Expansion”
  2. Using active learning also significantly improves performance: L(MWD,+) outscores L(MWD,-) by 13%.
    Page 5, “Set Expansion”
  3. We expect that adding these types of data would significantly improve our system.
    Page 5, “Set Expansion”

synsets

Appears in 3 sentences as: synset (1), synsets (3)
In An Active Learning Approach to Finding Related Terms
  1. It consists of a large number of synsets; a synset is a set of one or more similar word senses.
    Page 3, “Set Expansion”
  2. The synsets are then connected with hypernym/hyponym links, which represent ISA relationships.
    Page 3, “Set Expansion”
  3. The number of types of similarity in WordNet tends to be less than that captured by Moby, because synsets in WordNet are (usually) only allowed to have a single parent.
    Page 3, “Set Expansion”
