Fine-grained Semantic Typing of Emerging Entities
Nakashole, Ndapandula and Tylenda, Tomasz and Weikum, Gerhard

Article Structure

Abstract

Methods for information extraction (IE) and knowledge base (KB) construction have been intensively studied.

Introduction

A large number of knowledge base (KB) construction projects have recently emerged.

Detection of New Entities

To detect noun phrases that potentially refer to entities, we apply a part-of-speech tagger to the input text.

Typing Emerging Entities

To deduce types for new entities we propose to align new entities along the type signatures of patterns they occur with.

Candidate Types for Entities

For a given entity, candidate types are types that can potentially be assigned to that entity, based on

Evaluation

To define a suitable corpus of test data, we obtained a stream of news documents by subscribing to Google News RSS feeds for a few topics over a six-month period (April 2012 to September 2012).

Related Work

Tagging mentions of named entities with lexical types has been pursued in previous work.

Conclusion

This paper addressed the problem of detecting and semantically typing newly emerging entities, to support the life-cycle of large knowledge bases.

Topics

ILP

Appears in 18 sentences as: ILP (17) ILPs (2) ILP’s (1)
In Fine-grained Semantic Typing of Emerging Entities
  1. Our solution is formalized as an Integer Linear Program (ILP).
    Page 5, “Candidate Types for Entities”
  2. In the following we develop two variants of this approach: a “hard” ILP with rigorous disjointness constraints, and a “soft” ILP which considers type correlations.
    Page 5, “Candidate Types for Entities”
  3. “Hard” ILP with Type Disjointness Constraints.
    Page 5, “Candidate Types for Entities”
  4. The ILP is defined as follows:
    Page 5, “Candidate Types for Entities”
  5. “Soft” ILP with Type Correlations.
    Page 5, “Candidate Types for Entities”
  6. Our second ILP considers such soft constraints.
    Page 5, “Candidate Types for Entities”
  7. These values $v_{ij} \in [-1, 1]$ are used as weights in the objective function of the ILP.
    Page 5, “Candidate Types for Entities”
  8. The ILP with correlations is defined as follows:
    Page 5, “Candidate Types for Entities”
  9. Note that both ILP variants need to be solved per entity, not over all entities together.
    Page 5, “Candidate Types for Entities”
  10. The “soft” ILP has a size quadratic in the number of candidate types, but this is still a tractable input for modern solvers.
    Page 5, “Candidate Types for Entities”
  11. We use the Gurobi software package to compute the solutions for the ILP’s.
    Page 5, “Candidate Types for Entities”
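
The two ILP variants quoted above can be illustrated with a small sketch. It is not the formulation from the paper, which these excerpts do not reproduce: the function name `select_types`, the example weights, and the objective (per-type weights plus pairwise correlation weights for jointly selected types) are assumptions made for illustration. The sketch enumerates all 0/1 assignments instead of calling a solver such as Gurobi, so it is only practical for a handful of candidate types.

```python
from itertools import product


def select_types(weights, disjoint_pairs=(), correlations=None):
    """Choose the subset of candidate types with the highest total score.

    weights:        {type: weight}, one 0/1 decision per candidate type
    disjoint_pairs: pairs of types that must not be selected together
                    ("hard" variant: x_i + x_j <= 1)
    correlations:   {(type_i, type_j): v_ij} with v_ij in [-1, 1], added to
                    the score when both types are selected ("soft" variant;
                    this objective is an assumption, not the paper's)
    """
    correlations = correlations or {}
    types = sorted(weights)
    best_set, best_score = frozenset(), 0.0
    # Brute-force enumeration of every 0/1 assignment (exponential in size).
    for bits in product((0, 1), repeat=len(types)):
        chosen = frozenset(t for t, bit in zip(types, bits) if bit)
        # Hard constraint: skip assignments that select a disjoint pair.
        if any(a in chosen and b in chosen for a, b in disjoint_pairs):
            continue
        score = sum(weights[t] for t in chosen)
        # Soft term: reward or penalize jointly selected type pairs.
        score += sum(v for (a, b), v in correlations.items()
                     if a in chosen and b in chosen)
        if score > best_score:
            best_set, best_score = chosen, score
    return best_set, best_score


# Hypothetical candidate types and weights for one new entity.
weights = {"singer": 0.6, "musician": 0.5, "city": 0.4}
hard_types, hard_score = select_types(
    weights, disjoint_pairs=[("singer", "city"), ("musician", "city")])
soft_types, soft_score = select_types(
    weights, correlations={("singer", "musician"): 0.8,
                           ("singer", "city"): -0.9})
```

Since both variants are solved per entity, a function like this would be called once for each new entity with that entity's own candidate types.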

See all papers in Proc. ACL 2013 that mention ILP.

See all papers in Proc. ACL that mention ILP.

fine-grained

Appears in 7 sentences as: fine-grained (7)
In Fine-grained Semantic Typing of Emerging Entities
  1. Our aim is for all recognized and newly discovered entities to be semantically interpretable by having fine-grained types that connect them to KB classes.
    Page 1, “Introduction”
  2. For informative knowledge, new entities must be typed in a fine-grained manner (e.g., guitar player, blues band, concert, as opposed to crude types like person, organization, event).
    Page 1, “Introduction”
  3. Therefore, our setting resembles the established task of fine-grained typing for noun phrases (Fleischmann 2002), with the difference being that we disregard common nouns and phrases for prominent in-KB entities and instead exclusively focus on the difficult case of phrases that likely denote new entities.
    Page 2, “Introduction”
  4. In summary, our contribution in this paper is a model for discovering and ontologically typing out-of-KB entities, using a fine-grained type system and harnessing relational paraphrases with type signatures for probabilistic weight computa-
    Page 2, “Introduction”
  5. HYENA (Hierarchical tYpe classification for Entity NAmes), the method of (Yosef 2012), based on a feature-rich classifier for fine-grained, hierarchical type tagging.
    Page 6, “Evaluation”
  6. There is fairly little work on fine-grained typing, notable results being (Fleischmann 2002; Rahman 2010; Ling 2012; Yosef 2012).
    Page 7, “Related Work”
  7. These methods consider type taxonomies similar to the one used for PEARL, consisting of several hundreds of fine-grained types.
    Page 7, “Related Work”

knowledge base

Appears in 7 sentences as: knowledge base (6) knowledge bases (1)
In Fine-grained Semantic Typing of Emerging Entities
  1. Methods for information extraction (IE) and knowledge base (KB) construction have been intensively studied.
    Page 1, “Abstract”
  2. A large number of knowledge base (KB) construction projects have recently emerged.
    Page 1, “Introduction”
  3. b) The noun phrase is a known entity that can be directly mapped to the knowledge base.
    Page 2, “Detection of New Entities”
  4. d) The noun phrase is a new entity not known to the knowledge base at all.
    Page 2, “Detection of New Entities”
  5. To decide if a noun phrase is a true entity (i.e., an individual entity that is a member of one or more lexical classes) or a nonentity (i.e., a common noun phrase that denotes a class or a general concept), we base the decision on the following hypothesis (inspired by and generalizing (Bunescu 2006)): A given noun phrase, not known to the knowledge base, is a true entity if its headword is singular and is consistently capitalized (i.e., always spelled with the first letter in upper case).
    Page 2, “Detection of New Entities”
  6. We infer type disjointness constraints from the YAGO2 knowledge base using occurrence statistics.
    Page 5, “Candidate Types for Entities”
  7. This paper addressed the problem of detecting and semantically typing newly emerging entities, to support the life-cycle of large knowledge bases.
    Page 8, “Conclusion”
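
The true-entity hypothesis quoted in item 5 can be sketched as a simple check. This is a minimal illustration, not the authors' implementation: the function name `is_true_entity`, the input shape (one headword and part-of-speech tag per observed occurrence), and the use of Penn Treebank tags `NN`/`NNP` to mean "singular" are all assumptions.

```python
# Penn Treebank tags for singular common and singular proper nouns.
SINGULAR_TAGS = {"NN", "NNP"}


def is_true_entity(occurrences):
    """Apply the capitalization heuristic to one out-of-KB noun phrase.

    occurrences: list of (headword, pos_tag) pairs, one per place the
    noun phrase was seen in the corpus (hypothetical input format).
    Returns True only if the headword is always tagged as singular and
    always starts with an upper-case letter.
    """
    if not occurrences:
        return False  # no evidence, so make no claim
    always_singular = all(tag in SINGULAR_TAGS for _, tag in occurrences)
    always_capitalized = all(word[:1].isupper() for word, _ in occurrences)
    return always_singular and always_capitalized


# Hypothetical observations for three noun phrases.
consistent = is_true_entity([("Sandy", "NNP"), ("Sandy", "NNP")])
mixed_case = is_true_entity([("Storm", "NNP"), ("storm", "NN")])
plural_head = is_true_entity([("Giants", "NNPS")])
```

Requiring every occurrence to be capitalized, rather than a majority, mirrors the "always spelled with the first letter in upper case" wording of the hypothesis.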

noun phrases

Appears in 7 sentences as: Noun phrases (1) noun phrases (6)
In Fine-grained Semantic Typing of Emerging Entities
  1. However, state-of-the-art open IE methods extract all noun phrases that are likely to denote entities.
    Page 1, “Introduction”
  2. ture are typed noun phrases.
    Page 2, “Introduction”
  3. Therefore, our setting resembles the established task of fine-grained typing for noun phrases (Fleischmann 2002), with the difference being that we disregard common nouns and phrases for prominent in-KB entities and instead exclusively focus on the difficult case of phrases that likely denote new entities.
    Page 2, “Introduction”
  4. When extracting noun phrases, PEARL also collects the co-occurring PATTY phrases.
    Page 2, “Introduction”
  5. To detect noun phrases that potentially refer to entities, we apply a part-of-speech tagger to the input text.
    Page 2, “Detection of New Entities”
  6. Most well-known is the Stanford named entity recognition (NER) tagger (Finkel 2005) which assigns coarse-grained types like person, organization, location, and other to noun phrases that are likely to denote entities.
    Page 7, “Related Work”
  7. Noun phrases in the subject role in a large collection of fact triples are heuristically linked to Freebase entities.
    Page 8, “Related Work”

turkers

Appears in 7 sentences as: turkers (7)
In Fine-grained Semantic Typing of Emerging Entities
  1. To evaluate the quality of types assigned to emerging entities, we presented turkers with sentences from the news tagged with out-of-KB entities and the types inferred by the methods under test.
    Page 6, “Evaluation”
  2. The turkers' task was to assess the correctness of types assigned to an entity mention.
    Page 6, “Evaluation”
  3. To make it easy to understand the task for the turkers, we combined the extracted entity and type into a sentence.
    Page 6, “Evaluation”
  4. Per method, turkers evaluated 105 entity-type pair test samples.
    Page 6, “Evaluation”
  5. Each test sample was given to 3 different turkers for assessment.
    Page 6, “Evaluation”
  6. Since the turkers did not always agree if the type for a sample is good or not, we aggregate their answers.
    Page 6, “Evaluation”
  7. Nevertheless, the observed Fleiss κ values show that the task was fairly clear to the turkers; values > 0.2 are generally considered as acceptable (Landis 1977).
    Page 7, “Evaluation”
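
The Fleiss κ agreement measure mentioned in item 7 can be computed with the standard textbook formula, sketched below. The function name `fleiss_kappa` and the toy rating table are hypothetical; the data are not from the paper's evaluation.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table of rating counts.

    counts[i][j] is the number of raters who put item i in category j.
    Every item must be rated by the same number of raters (at least 2).
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_cats = len(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number (>= 2) of ratings")
    # Share of all ratings that fall into each category.
    p_cat = [sum(row[j] for row in counts) / (n_items * n_raters)
             for j in range(n_cats)]
    # Observed agreement per item, then averaged over items.
    p_item = [(sum(c * c for c in row) - n_raters)
              / (n_raters * (n_raters - 1)) for row in counts]
    p_observed = sum(p_item) / n_items
    # Agreement expected by chance.
    p_expected = sum(p * p for p in p_cat)
    if p_expected == 1.0:
        return 1.0  # all ratings in one category; kappa is undefined, 1.0 by convention here
    return (p_observed - p_expected) / (1.0 - p_expected)


# Toy table: 4 samples, 3 raters each, categories (good, not good).
toy_counts = [[3, 0], [0, 3], [2, 1], [1, 2]]
toy_kappa = fleiss_kappa(toy_counts)
```

With three raters per sample, as in the setup described above, each row of the table sums to 3.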

Linear Program

Appears in 4 sentences as: Linear Program (2) linear program (1) linear programs (1)
In Fine-grained Semantic Typing of Emerging Entities
  1. Our method is based on a probabilistic model that feeds weights into integer linear programs that leverage type signatures of relational phrases and type correlation or disjointness constraints.
    Page 1, “Abstract”
  2. For cleaning out false hypotheses among the type candidates for a new entity, we devised probabilistic models and an integer linear program that considers incompatibilities and correlations among entity types.
    Page 2, “Introduction”
  3. 4.3 Integer Linear Program Formulation
    Page 5, “Candidate Types for Entities”
  4. Our solution is formalized as an Integer Linear Program (ILP).
    Page 5, “Candidate Types for Entities”

named entity

Appears in 4 sentences as: named entities (1) named entity (3)
In Fine-grained Semantic Typing of Emerging Entities
  1. Known entities are recognized and mapped to the KB using a recent tool for named entity disambiguation (Hoffart 2011).
    Page 2, “Introduction”
  2. The problem in this example is incorrect segmentation of a named entity.
    Page 6, “Evaluation”
  3. Tagging mentions of named entities with lexical types has been pursued in previous work.
    Page 7, “Related Work”
  4. Most well-known is the Stanford named entity recognition (NER) tagger (Finkel 2005) which assigns coarse-grained types like person, organization, location, and other to noun phrases that are likely to denote entities.
    Page 7, “Related Work”

part-of-speech

Appears in 3 sentences as: part-of-speech (3)
In Fine-grained Semantic Typing of Emerging Entities
  1. To detect noun phrases that potentially refer to entities, we apply a part-of-speech tagger to the input text.
    Page 2, “Detection of New Entities”
  2. HYENA’s relatively poor performance can be attributed to the fact that its features are mainly syntactic such as bi-grams and part-of-speech tags.
    Page 6, “Evaluation”
  3. All methods use trained classifiers over a variety of linguistic features, most importantly, words and bigrams with part-of-speech tags in a mention and in the textual context preceding and following the mention.
    Page 7, “Related Work”

part-of-speech tags

Appears in 3 sentences as: part-of-speech tagger (1) part-of-speech tags (2)
In Fine-grained Semantic Typing of Emerging Entities
  1. To detect noun phrases that potentially refer to entities, we apply a part-of-speech tagger to the input text.
    Page 2, “Detection of New Entities”
  2. HYENA’s relatively poor performance can be attributed to the fact that its features are mainly syntactic such as bi-grams and part-of-speech tags.
    Page 6, “Evaluation”
  3. All methods use trained classifiers over a variety of linguistic features, most importantly, words and bigrams with part-of-speech tags in a mention and in the textual context preceding and following the mention.
    Page 7, “Related Work”

probabilistic model

Appears in 3 sentences as: probabilistic model (2) probabilistic models (1)
In Fine-grained Semantic Typing of Emerging Entities
  1. Our method is based on a probabilistic model that feeds weights into integer linear programs that leverage type signatures of relational phrases and type correlation or disjointness constraints.
    Page 1, “Abstract”
  2. For cleaning out false hypotheses among the type candidates for a new entity, we devised probabilistic models and an integer linear program that considers incompatibilities and correlations among entity types.
    Page 2, “Introduction”
  3. The second method is PEARL with no ILP (denoted No ILP), only using the probabilistic model.
    Page 7, “Evaluation”
