Inducing Gazetteers for Named Entity Recognition by Large-Scale Clustering of Dependency Relations
Kazama, Jun'ichi and Torisawa, Kentaro

Article Structure

Abstract

We propose using large-scale clustering of dependency relations between verbs and multi-word nouns (MNs) to construct a gazetteer for named entity recognition (NER).

Introduction

Gazetteers, or entity dictionaries, are important for performing named entity recognition (NER) accurately.

Gazetteer Induction 2.1 Induction by MN Clustering

Assume we have a probabilistic model of a multi-word noun (MN) and its class: p(n, c) = p(n|c)p(c), where n ∈ N is an MN and c ∈ C is a class.
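The class assignment behind this model can be sketched with a toy EM procedure: assuming we observe counts of verb–MN dependency pairs, we fit the mixture p(n, v) = Σ_c p(c)p(n|c)p(v|c) and read off soft cluster memberships for each MN. All names below are illustrative; this is not the paper's (parallelized) implementation.

```python
import random
from collections import defaultdict

def em_cluster(pairs, num_classes=2, iters=20, seed=0):
    """EM for the mixture p(n, v) = sum_c p(c) p(n|c) p(v|c).

    `pairs` maps (noun, verb) -> dependency count. Returns
    (p_c, p_n_c, p_v_c), where p_n_c[c][n] is p(n|c).
    Illustrative sketch only, not the paper's code.
    """
    rng = random.Random(seed)
    nouns = sorted({n for n, _ in pairs})
    verbs = sorted({v for _, v in pairs})

    def rand_dist(keys):
        # random positive weights, normalized to a distribution
        w = {k: rng.random() + 0.1 for k in keys}
        z = sum(w.values())
        return {k: x / z for k, x in w.items()}

    p_c = rand_dist(range(num_classes))
    p_n_c = {c: rand_dist(nouns) for c in range(num_classes)}
    p_v_c = {c: rand_dist(verbs) for c in range(num_classes)}

    for _ in range(iters):
        cnt_c = defaultdict(float)
        cnt_nc = defaultdict(float)
        cnt_vc = defaultdict(float)
        for (n, v), cnt in pairs.items():
            # E-step: responsibility of each class for this pair
            joint = {c: p_c[c] * p_n_c[c][n] * p_v_c[c][v] for c in p_c}
            z = sum(joint.values())
            for c, j in joint.items():
                r = cnt * j / z
                cnt_c[c] += r
                cnt_nc[(n, c)] += r
                cnt_vc[(v, c)] += r
        # M-step: re-normalize expected counts
        total = sum(cnt_c.values())
        p_c = {c: cnt_c[c] / total for c in p_c}
        p_n_c = {c: {n: cnt_nc[(n, c)] / cnt_c[c] for n in nouns} for c in p_c}
        p_v_c = {c: {v: cnt_vc[(v, c)] / cnt_c[c] for v in verbs} for c in p_c}
    return p_c, p_n_c, p_v_c
```

A gazetteer label for an MN n can then be taken as argmax_c p(c)p(n|c), i.e., its most probable cluster.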

Using Gazetteers as Features of NER

Since Japanese has no spaces between words, there are several choices for the token unit used in NER.

Experiments

4.1 Data

Comparison with Previous Studies

Since many previous studies on Japanese NER used 5-fold cross validation for the IREX dataset, we also performed it for some of our models that had the best σ² found in the previous experiments.

Related Work and Discussion

There are several studies that used automatically extracted gazetteers for NER (Shinzato et al., 2006; Talukdar et al., 2006; Nadeau et al., 2006; Kazama and Torisawa, 2007).

Conclusion

We demonstrated that a gazetteer obtained by clustering verb-MN dependencies is a useful feature for a Japanese NER.

Topics

NER

Appears in 21 sentences as: NER (22)
In Inducing Gazetteers for Named Entity Recognition by Large-Scale Clustering of Dependency Relations
  1. We demonstrated with the IREX dataset for the Japanese NER that using the constructed clusters as a gazetteer (cluster gazetteer) is an effective way of improving the accuracy of NER .
    Page 1, “Abstract”
  2. Moreover, we demonstrate that the combination of the cluster gazetteer and a gazetteer extracted from Wikipedia, which is also useful for NER , can further improve the accuracy in several cases.
    Page 1, “Abstract”
  3. Gazetteers, or entity dictionaries, are important for performing named entity recognition ( NER ) accurately.
    Page 1, “Introduction”
  4. Most studies using gazetteers for NER are based on the assumption that a gazetteer is a mapping
    Page 1, “Introduction”
  5. For instance, Kazama and Torisawa (2007) used the hyponymy relations extracted from Wikipedia for the English NER , and reported improved accuracies with such a gazetteer.
    Page 1, “Introduction”
  6. Therefore, performing the clustering with a vocabulary that is large enough to cover the many named entities required to improve the accuracy of NER is difficult.
    Page 2, “Introduction”
  7. Kazama and Torisawa (2007) extracted hyponymy relations from the first sentences (i.e., defining sentences) of Wikipedia articles and then used them as a gazetteer for NER .
    Page 4, “Gazetteer Induction 2.1 Induction by MN Clustering”
  8. Although this Wikipedia gazetteer is much smaller than the English version used by Kazama and Torisawa (2007) that has over 2,000,000 entries, it is the largest gazetteer that can be freely used for Japanese NER .
    Page 4, “Gazetteer Induction 2.1 Induction by MN Clustering”
  9. Our experimental results show that this Wikipedia gazetteer can be used to improve the accuracy of Japanese NER .
    Page 4, “Gazetteer Induction 2.1 Induction by MN Clustering”
  10. Since Japanese has no spaces between words, there are several choices for the token unit used in NER .
    Page 4, “Using Gazetteers as Features of NER”
  11. The NER task is then treated as a tagging task, which assigns IOB tags to each character in a sentence.10 We use Conditional Random Fields (CRFs) (Lafferty et al., 2001) to perform this tagging.
    Page 4, “Using Gazetteers as Features of NER”
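The character-based IOB encoding mentioned above can be sketched as follows; the span representation and names are illustrative assumptions, not the paper's feature templates (those are in its Table 3).

```python
def char_iob_tags(sentence, entities):
    """Assign an IOB tag to each character of `sentence`.

    `entities` is a list of (start, end, label) character spans,
    end-exclusive. Illustrative sketch of the character-based
    encoding used for Japanese NER; not the paper's code.
    """
    tags = ["O"] * len(sentence)
    for start, end, label in entities:
        tags[start] = "B-" + label          # first character of the entity
        for i in range(start + 1, end):
            tags[i] = "I-" + label          # remaining entity characters
    return tags
```

A CRF tagger is then trained to predict one such tag per character.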

See all papers in Proc. ACL 2008 that mention NER.

See all papers in Proc. ACL that mention NER.

named entity

Appears in 9 sentences as: named entities (4) named entity (5)
In Inducing Gazetteers for Named Entity Recognition by Large-Scale Clustering of Dependency Relations
  1. We propose using large-scale clustering of dependency relations between verbs and multi-word nouns (MNs) to construct a gazetteer for named entity recognition (NER).
    Page 1, “Abstract”
  2. Gazetteers, or entity dictionaries, are important for performing named entity recognition (NER) accurately.
    Page 1, “Introduction”
  3. from a multi-word noun (MN)1 to named entity categories such as “Tokyo Stock Exchange → {ORGANIZATION}”.2 However, since the correspondence between the labels and the NE categories can be learned by tagging models, a gazetteer will be useful as long as it returns consistent labels even if those returned are not the NE categories.
    Page 1, “Introduction”
  4. Therefore, performing the clustering with a vocabulary that is large enough to cover the many named entities required to improve the accuracy of NER is difficult.
    Page 2, “Introduction”
  5. Figure 2: Clean MN clusters with named entity entries (Left: car brand names.
    Page 4, “Gazetteer Induction 2.1 Induction by MN Clustering”
  6. We obtained 11,892 sentences13 with 18,677 named entities .
    Page 5, “Experiments”
  7. We define “# e-matches” as the number of matches that also match a boundary of a named entity in the training set, and “# optimal” as the optimal number of “# e-matches” that can be achieved when we know the
    Page 6, “Experiments”
  8. These gazetteers cover 40 - 50% of the named entities , and the cluster gazetteers have relatively wider coverage than the Wikipedia gazetteer has.
    Page 6, “Experiments”
  9. To cover most of the named entities in the data, we need much larger gazetteers.
    Page 8, “Related Work and Discussion”
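The “# e-matches” statistic above can be illustrated with a toy longest-match gazetteer matcher over characters; the matching strategy and all names here are assumptions for illustration, not the paper's procedure.

```python
def gazetteer_matches(text, gazetteer):
    """Find (start, end, label) matches of gazetteer entries in `text`.

    Longest-match-first over characters; a simplified stand-in for
    the matching used to count gazetteer hits. Illustrative only.
    """
    matches = []
    entries = sorted(gazetteer, key=len, reverse=True)  # prefer longer entries
    i = 0
    while i < len(text):
        for entry in entries:
            if text.startswith(entry, i):
                matches.append((i, i + len(entry), gazetteer[entry]))
                i += len(entry)
                break
        else:
            i += 1  # no entry starts here; advance one character
    return matches

def count_e_matches(matches, entity_spans):
    """Count matches whose span coincides with a gold entity boundary."""
    gold = {(s, e) for s, e, _ in entity_spans}
    return sum(1 for s, e, _ in matches if (s, e) in gold)
```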


hypernym

Appears in 7 sentences as: hypernym (5) hypernyms (3)
In Inducing Gazetteers for Named Entity Recognition by Large-Scale Clustering of Dependency Relations
  1. The last word in the noun phrase is then extracted and becomes the hypernym of the entity described by the article.
    Page 4, “Gazetteer Induction 2.1 Induction by MN Clustering”
  2. For example, from the following defining sentence, it extracts “guitarist” as the hypernym for “Jimi Hendrix”.
    Page 4, “Gazetteer Induction 2.1 Induction by MN Clustering”
  3. Extraction statistics (# instances): page titles processed 550,832; articles found 547,779 (of which found by redirection 189,222); first sentences found 545,577; hypernyms extracted 482,599
    Page 4, “Gazetteer Induction 2.1 Induction by MN Clustering”
  4. construct a gazetteer that maps an MN (a title of a Wikipedia article) to its hypernym.8 When the hypernym extraction failed, a special hypernym symbol, e.g., “UNK”, was used.
    Page 4, “Gazetteer Induction 2.1 Induction by MN Clustering”
  5. From the Japanese Wikipedia entries of April 10, 2007, we extracted 550,832 gazetteer entries (482,599 entries have hypernyms other than UNK).
    Page 4, “Gazetteer Induction 2.1 Induction by MN Clustering”
  6. The number of distinct hypernyms in the gazetteer was 12,786.
    Page 4, “Gazetteer Induction 2.1 Induction by MN Clustering”
  7. 8They handled “redirections” as well by following redirection links and extracting a hypernym from the article reached.
    Page 4, “Using Gazetteers as Features of NER”
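The hypernym-extraction idea above can be mimicked on an English-style defining sentence. The paper actually runs MeCab on Japanese first sentences and takes the last noun after the topic postposition, so this regex-based sketch is only illustrative; the “UNK” symbol for failed extraction follows the paper.

```python
import re

def naive_hypernym(defining_sentence):
    """Toy hypernym extraction from a defining sentence.

    Mimics the idea on a simple English "X is a/an/the Y" pattern;
    the paper's actual method uses morphological analysis of
    Japanese. Returns "UNK" when extraction fails, as in the paper.
    """
    m = re.search(r"\bis (?:a|an|the)\s+(.+?)[.,]", defining_sentence)
    if m is None:
        return "UNK"
    # the last word of the predicate noun phrase becomes the hypernym
    return m.group(1).split()[-1]
```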


dependency relations

Appears in 5 sentences as: Dependency Relations (2) dependency relations (4)
In Inducing Gazetteers for Named Entity Recognition by Large-Scale Clustering of Dependency Relations
  1. We propose using large-scale clustering of dependency relations between verbs and multi-word nouns (MNs) to construct a gazetteer for named entity recognition (NER).
    Page 1, “Abstract”
  2. Since dependency relations capture the semantics of MNs well, the MN clusters constructed by using dependency relations should serve as a good gazetteer.
    Page 1, “Abstract”
  3. 2.2 EM-based Clustering using Dependency Relations
    Page 2, “Gazetteer Induction 2.1 Induction by MN Clustering”
  4. By parallelizing the clustering algorithm, we successfully constructed a cluster gazetteer with up to 500,000 entries from a large amount of dependency relations in Web documents.
    Page 8, “Related Work and Discussion”


CRFs

Appears in 4 sentences as: CRFs (4)
In Inducing Gazetteers for Named Entity Recognition by Large-Scale Clustering of Dependency Relations
  1. We expect that tagging models ( CRFs in our case) can learn an appropriate weight for each gazetteer match regardless of whether it is an NE or not.
    Page 2, “Gazetteer Induction 2.1 Induction by MN Clustering”
  2. The NER task is then treated as a tagging task, which assigns IOB tags to each character in a sentence.10 We use Conditional Random Fields ( CRFs ) (Lafferty et al., 2001) to perform this tagging.
    Page 4, “Using Gazetteers as Features of NER”
  3. We think this shows one of the strengths of machine learning methods such as CRFs .
    Page 6, “Experiments”
  4. Using models such as Semi-Markov CRFs (Sarawagi and Cohen, 2004), which handle the features on overlapping regions, is one possible direction.
    Page 8, “Related Work and Discussion”


CRF

Appears in 3 sentences as: CRF (3) CRF++ (1)
In Inducing Gazetteers for Named Entity Recognition by Large-Scale Clustering of Dependency Relations
  1. These annotated IOB tags can be used in the same way as other features in a CRF tagger.
    Page 5, “Using Gazetteers as Features of NER”
  2. We generated the node and the edge features of a CRF model as described in Table 3 using these atomic features.
    Page 5, “Experiments”
  3. To train CRF models, we used Taku Kudo’s CRF++ (ver.
    Page 5, “Experiments”


machine learning

Appears in 3 sentences as: machine learning (3)
In Inducing Gazetteers for Named Entity Recognition by Large-Scale Clustering of Dependency Relations
  1. We think this shows one of the strengths of machine learning methods such as CRFs.
    Page 6, “Experiments”
  2. Parallelization has recently regained attention in the machine learning community because of the need for learning from very large sets of data.
    Page 8, “Related Work and Discussion”
  3. (2006) presented the MapReduce framework for a wide range of machine learning algorithms, including the EM algorithm.
    Page 8, “Related Work and Discussion”


morphological analyzer

Appears in 3 sentences as: morphological analysis (1) morphological analyzer (2)
In Inducing Gazetteers for Named Entity Recognition by Large-Scale Clustering of Dependency Relations
  1. After preprocessing the first sentence of an article using a morphological analyzer, MeCab9, we extracted the last noun after the appearance of the Japanese postposition “は (wa)” (“is”).
    Page 4, “Gazetteer Induction 2.1 Induction by MN Clustering”
  2. Asahara and Matsumoto (2003) proposed using characters instead of morphemes as the unit to alleviate the effect of segmentation errors in morphological analysis, and we also used their character-based method.
    Page 4, “Using Gazetteers as Features of NER”
  3. We used MeCab as a morphological analyzer and CaboCha14 (Kudo and Matsumoto, 2002) as the dependency parser to find the boundaries of the bunsetsu.
    Page 5, “Experiments”
