Inducing Domain-Specific Semantic Class Taggers from (Almost) Nothing
Huang, Ruihong and Riloff, Ellen

Article Structure

Abstract

This research explores the idea of inducing domain-specific semantic class taggers using only a domain-specific text collection and seed words.

Introduction

The goal of our research is to create semantic class taggers that can assign a semantic class label to every noun phrase in a sentence.

Related Work

Semantic class tagging is most closely related to named entity recognition (NER), mention detection, and semantic lexicon induction.

Topics

feature vectors

Appears in 6 sentences as: feature vector (2) feature vectors (4)
In Inducing Domain-Specific Semantic Class Taggers from (Almost) Nothing
  1. However, when we create feature vectors for the classifier, the seeds themselves are hidden and only contextual features are used to represent each training instance.
    Page 3, “Related Work”
  2. We use an in-house sentence segmenter and NP chunker to identify the base NPs in each sentence and create feature vectors that represent each constituent in the sentence as either an NP or an individual word.
    Page 3, “Related Work”
  3. Two training instances would be created, with feature vectors that look like this, where M represents a modifier inside the target NP:
    Page 4, “Related Work”
  4. Our bootstrapping model incrementally trains semantic class taggers, so we explored the idea of using the labels assigned by the classifiers to create enhanced feature vectors by dynamically adding semantic features.
    Page 5, “Related Work”
  5. If “Vetsulin” was labeled as a DRUG in a previous bootstrapping iteration, then the feature vector representing the context around “doxy” can be enhanced to include an additional semantic feature identifying Vetsulin as a DRUG, which would look like this:
    Page 5, “Related Work”
  6. When a feature vector is created for a target NP, we check every noun instance in its context window to see if it has been assigned a semantic tag, and if so, then we add a semantic feature.
    Page 6, “Related Work”
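
To make sentences 1, 4, and 5 above concrete, here is a minimal Python sketch (not the authors' code; the feature names, window scheme, and example tokens are illustrative assumptions) of a contextual feature vector in which the target noun itself is hidden and a dynamic semantic feature is added when a context noun was tagged in an earlier bootstrapping iteration:

# Minimal sketch, assuming a token list and a window size; not the
# authors' implementation. The target noun is hidden (sentence 1) and a
# semantic feature is added dynamically for context nouns tagged in an
# earlier bootstrapping iteration (sentences 4-5).

def make_feature_vector(tokens, target_idx, window, semantic_tags):
    """tokens: list of words; target_idx: position of the target NP head;
    semantic_tags: dict word -> semantic class from prior iterations."""
    features = {}
    lo = max(0, target_idx - window)
    hi = min(len(tokens), target_idx + window + 1)
    for i in range(lo, hi):
        if i == target_idx:
            continue  # hide the target noun itself; use context only
        word = tokens[i].lower()
        features["ctx_%d=%s" % (i - target_idx, word)] = 1.0
        tag = semantic_tags.get(word)
        if tag is not None:
            # dynamically added semantic feature (sentence 5)
            features["sem_%d=%s" % (i - target_idx, tag)] = 1.0
    return features

# Example mirroring sentence 5: "Vetsulin" was tagged DRUG earlier, so the
# vector for the context around "doxy" gains a semantic feature.
tokens = ["we", "gave", "Vetsulin", "and", "doxy", "daily"]
print(make_feature_vector(tokens, target_idx=4, window=3,
                          semantic_tags={"vetsulin": "DRUG"}))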

NER

Appears in 6 sentences as: NER (6)
In Inducing Domain-Specific Semantic Class Taggers from (Almost) Nothing
  1. Semantic class tagging has been the subject of previous research, primarily under the guises of named entity recognition ( NER ) and mention detection.
    Page 1, “Introduction”
  2. Semantic class tagging is most closely related to named entity recognition ( NER ), mention detection, and semantic lexicon induction.
    Page 2, “Related Work”
  3. NER systems (e.g., (Bikel et al., 1997; Collins and Singer, 1999; Cucerzan and Yarowsky, 1999; Fleischman and Hovy, 2002)) identify proper named entities, such as people, organizations, and locations.
    Page 2, “Related Work”
  4. Several bootstrapping methods for NER have been previously developed (e.g., (Collins and Singer, 1999; Niu et al., 2003)).
    Page 2, “Related Work”
  5. NER systems, however, do not identify nominal NP instances (e.g., “a software manufacturer” or “the beach”), or handle semantic classes that are not associated with proper named entities (e.g., symptoms).[1]
    Page 2, “Related Work”
  6. [1] Some NER systems also handle specialized constructs such as dates and monetary amounts.
    Page 2, “Related Work”

WordNet

Appears in 6 sentences as: WordNet (7) Wordnet (1)
In Inducing Domain-Specific Semantic Class Taggers from (Almost) Nothing
  1. These features are typically obtained from external resources such as Wordnet (Miller, 1990).
    Page 5, “Related Work”
  2. The first baseline searches for each head noun in WordNet and labels the noun as category C_k if it has a hypernym synset corresponding to that category.
    Page 6, “Related Work”
  3. We manually identified the WordNet synsets that, to the best of our ability, seem to most closely correspond
    Page 6, “Related Work”
  4. We do not report WordNet results for TEST because there did not seem to be an appropriate synset, or for the OTHER category because that is a catchall class.
    Page 7, “Related Work”
  5. The WordNet baseline yields low recall (21-32%) for every category except HUMAN, which confirms that many veterinary terms are not present in WordNet.
    Page 7, “Related Work”
  6. The surprisingly low precision for some categories is due to atypical word uses (e.g., patient, boy, and girl are HUMAN in WordNet but nearly always ANIMALS in our domain), and overgeneralities (e.g., WordNet lists calcium as a DRUG).
    Page 7, “Related Work”
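
Sentences 1-2 describe a hypernym-lookup baseline. Below is a minimal sketch using NLTK's WordNet interface (an assumption; the paper does not name its tools, and the category-to-synset mapping shown is a made-up example, not the authors' mapping):

# Minimal sketch of the hypernym-lookup baseline (sentences 1-2) using
# NLTK's WordNet interface. The category -> synset mapping is illustrative.
from nltk.corpus import wordnet as wn

CATEGORY_SYNSETS = {
    "ANIMAL": wn.synset("animal.n.01"),
    "HUMAN": wn.synset("person.n.01"),
    "DRUG": wn.synset("drug.n.01"),
}

def wordnet_label(head_noun):
    """Label the noun with category C_k if any noun sense has a hypernym
    synset corresponding to that category (sentence 1)."""
    for sense in wn.synsets(head_noun, pos=wn.NOUN):
        ancestors = set(sense.closure(lambda s: s.hypernyms()))
        ancestors.add(sense)
        for category, target in CATEGORY_SYNSETS.items():
            if target in ancestors:
                return category
    return None

print(wordnet_label("terrier"))  # ANIMAL
print(wordnet_label("boy"))      # HUMAN in WordNet, though nearly always
                                 # an ANIMAL in this domain (sentence 6)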

named entities

Appears in 5 sentences as: named entities (2) Named entity (1) named entity (2)
In Inducing Domain-Specific Semantic Class Taggers from (Almost) Nothing
  1. Semantic class tagging has been the subject of previous research, primarily under the guises of named entity recognition (NER) and mention detection.
    Page 1, “Introduction”
  2. Named entity recognizers perform semantic tagging on proper name noun phrases, and
    Page 1, “Introduction”
  3. Semantic class tagging is most closely related to named entity recognition (NER), mention detection, and semantic lexicon induction.
    Page 2, “Related Work”
  4. NER systems (e.g., (Bikel et al., 1997; Collins and Singer, 1999; Cucerzan and Yarowsky, 1999; Fleischman and Hovy, 2002)) identify proper named entities, such as people, organizations, and locations.
    Page 2, “Related Work”
  5. NER systems, however, do not identify nominal NP instances (e.g., “a software manufacturer” or “the beach”), or handle semantic classes that are not associated with proper named entities (e.g., symptoms).[1]
    Page 2, “Related Work”

confidence score

Appears in 4 sentences as: confidence score (4)
In Inducing Domain-Specific Semantic Class Taggers from (Almost) Nothing
  1. We use a confidence score to label only the instances that the classifiers are most certain about.
    Page 4, “Related Work”
  2. We compute a confidence score for instance i with respect to semantic class C_k by considering both the score of the C_k classifier as well as the scores of the competing classifiers.
    Page 4, “Related Work”
  3. After each training cycle, all of the classifiers are applied to the remaining unlabeled instances and each classifier labels the (positive) instances that it is most confident about (i.e., the instances that it classifies with a confidence score ≥ θ_f).
    Page 4, “Related Work”
  4. Specifically, gsc(w) = 0.1 × (w_{l/c} / w_l) + 0.9 × (w_{u/c} / w_u), where w_l and w_u are the # of labeled and unlabeled instances, respectively, w_{l/c} is the # of instances labeled as c, and w_{u/c} is the # of unlabeled instances that receive a positive confidence score for c when given to the classifier.
    Page 5, “Related Work”
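
As a rough illustration of sentences 2-4, the following Python sketch computes a margin-style confidence for a class (one plausible reading of "considering the competing classifiers"; the paper does not spell out the exact combination) and the global score gsc(w) as reconstructed in sentence 4:

# Minimal sketch, not the paper's exact scoring. confidence() uses a
# margin over the best competing class, which is one plausible reading of
# sentence 2; gsc() follows the formula in sentence 4 directly.

def confidence(scores, k):
    """scores: dict class -> raw classifier score for one instance."""
    best_competitor = max(s for c, s in scores.items() if c != k)
    return scores[k] - best_competitor

def gsc(n_labeled, n_unlabeled, n_labeled_c, n_unlabeled_pos_c):
    """gsc(w) = 0.1 * (w_{l/c} / w_l) + 0.9 * (w_{u/c} / w_u)."""
    return (0.1 * n_labeled_c / n_labeled
            + 0.9 * n_unlabeled_pos_c / n_unlabeled)

# An instance is labeled as class k only if confidence(scores, k) clears
# the threshold theta_f from sentence 3 (value chosen per category).
scores = {"DRUG": 0.8, "SYMPTOM": 0.3, "ANIMAL": -0.2}
print(confidence(scores, "DRUG"))   # 0.5
print(gsc(100, 1000, 40, 250))      # 0.1*0.4 + 0.9*0.25 = 0.265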

feature set

Appears in 4 sentences as: feature set (3) feature sets (1)
In Inducing Domain-Specific Semantic Class Taggers from (Almost) Nothing
  1. But, importantly, our classifiers all use the same feature set so they do not represent independent views of the data.
    Page 2, “Related Work”
  2. The feature set for these classifiers is exactly the same as described in Section 3.2, except that we add a new lexical feature that represents the head noun of the target NP (i.e., the NP that needs to be tagged).
    Page 5, “Related Work”
  3. [3] But technically this is not co-training because our feature sets are all the same.
    Page 5, “Related Work”
  4. The feature set and classifier settings are exactly the same as with our bootstrapped classifiers.[9] Supervised learning achieves good precision but low recall for all categories except ANIMAL and HUMAN.
    Page 7, “Related Work”
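
Sentence 2's change is small enough to show directly. Here is a standalone Python sketch (feature names are illustrative assumptions) in which the second-stage classifier keeps the contextual features but adds a single lexical feature for the target NP's head noun:

# Standalone sketch of sentence 2 (names illustrative): reuse the
# contextual features and add one lexical feature for the head noun.

def add_head_noun_feature(context_features, head_noun):
    features = dict(context_features)            # same contextual features
    features["head=" + head_noun.lower()] = 1.0  # the new lexical feature
    return features

print(add_head_noun_feature({"ctx_-1=the": 1.0}, "Vetsulin"))
# {'ctx_-1=the': 1.0, 'head=vetsulin': 1.0}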

manually annotated

Appears in 4 sentences as: manual annotations (1) manually annotated (4)
In Inducing Domain-Specific Semantic Class Taggers from (Almost) Nothing
  1. Nearly all semantic class taggers are trained using supervised learning with manually annotated data.
    Page 3, “Related Work”
  2. Each annotator then labeled an additional 35 documents, which gave us a test set containing 100 manually annotated message board posts.
    Page 6, “Related Work”
  3. With just one-fifth of the training set, the system has about 1,600 message board posts to use for training, which yields a similar F score (roughly 61%) as the supervised baseline that used 100 manually annotated posts via 10-fold cross-validation.
    Page 8, “Related Work”
  4. Supervised learning exploits manually annotated data, but must make do with a relatively small amount of training text because manual annotations are expensive.
    Page 8, “Related Work”

noun phrases

Appears in 3 sentences as: noun phrases (3)
In Inducing Domain-Specific Semantic Class Taggers from (Almost) Nothing
  1. Named entity recognizers perform semantic tagging on proper name noun phrases, and
    Page 1, “Introduction”
  2. The mention detection task was introduced in recent ACE evaluations (e.g., (ACE, 2007; ACE, 2008)) and requires systems to identify all noun phrases (proper names, nominals, and pronouns) that correspond to 5-7 semantic classes.
    Page 1, “Introduction”
  3. We defined annotation guidelines[6] for each semantic category and conducted an inter-annotator agreement study to measure the consistency of the two domain experts on 30 message board posts, which contained 1,473 noun phrases.
    Page 6, “Related Work”

synset

Appears in 3 sentences as: synset (2) synsets (1)
In Inducing Domain-Specific Semantic Class Taggers from (Almost) Nothing
  1. The first baseline searches for each head noun in WordNet and labels the noun as category C_k if it has a hypernym synset corresponding to that category.
    Page 6, “Related Work”
  2. We manually identified the WordNet synsets that, to the best of our ability, seem to most closely correspond
    Page 6, “Related Work”
  3. We do not report WordNet results for TEST because there did not seem to be an appropriate synset, or for the OTHER category because that is a catchall class.
    Page 7, “Related Work”
