Pattern Learning for Relation Extraction with a Hierarchical Topic Model
Alfonseca, Enrique and Filippova, Katja and Delort, Jean-Yves and Garrido, Guillermo

Article Structure

Abstract

We describe the use of a hierarchical topic model for automatically identifying syntactic and lexical patterns that explicitly state ontological relations.

Introduction

The detection of relations between entities for the automatic population of knowledge bases is very useful for solving tasks such as Entity Disambiguation, Information Retrieval and Question Answering.

Unsupervised relational pattern learning

Similar to other distant supervision methods, our approach takes as input an existing knowledge base containing entities and relations, and a textual corpus.

Experiments and results

Settings We use Freebase as our knowledge base.

Conclusions

We have described a new distant supervision model with which to learn patterns for relation extraction with no manual intervention.

Topics

knowledge base

Appears in 12 sentences as: knowledge base (11) knowledge bases (2)
In Pattern Learning for Relation Extraction with a Hierarchical Topic Model
  1. We leverage distant supervision using relations from the knowledge base Freebase, but do not require any manual heuristics or manually selected seed lists.
    Page 1, “Abstract”
  2. The detection of relations between entities for the automatic population of knowledge bases is very useful for solving tasks such as Entity Disambiguation, Information Retrieval and Question Answering.
    Page 1, “Introduction”
  3. The availability of high-coverage, general-purpose knowledge bases enables the automatic identification and disambiguation of entities in text, and its applications (Bunescu and Pasca, 2006; Cucerzan, 2007; McNamee and Dang, 2009; Kwok et al., 2001; Pasca et al., 2006; Weld et al., 2008; Pereira et al., 2009; Kasneci et al., 2009).
    Page 1, “Introduction”
  4. These systems do not need any manual data or rules, but the relational facts they extract are not immediately disambiguated to entities and relations from a knowledge base.
    Page 1, “Introduction”
  5. Similar to other distant supervision methods, our approach takes as input an existing knowledge base containing entities and relations, and a textual corpus.
    Page 2, “Unsupervised relational pattern learning”
  6. In this work it is not necessary for the corpus to be related to the knowledge base.
    Page 2, “Unsupervised relational pattern learning”
  7. In what follows we assume that all the relations studied are binary and hold between exactly two entities in the knowledge base.
    Page 2, “Unsupervised relational pattern learning”
  8. We also assume that a dependency parser is available, and that the entities have been automatically disambiguated using the knowledge base as the sense inventory.
    Page 2, “Unsupervised relational pattern learning”
  9. Specifically, if a sentence contains two entities, e_i and e_j, connected through a pattern w, our model computes the probability P(r|w) that the pattern is expressing relation r, for every relation r defined in the knowledge base.
    Page 2, “Unsupervised relational pattern learning”
  10. (a) […] entities are disambiguated; (b) for each relation r in the knowledge base, a new (initially empty) document collection C_r is created; (c) for each entity pair (e_i, e_j) which are related in the knowledge base, a new (initially empty) document D_ij is created; (d) for each sentence in the input corpus containing one mention of e_i and one mention of e_j, a new term is added to D_ij consisting of the context in which the two entities were seen in the document.
    Page 2, “Unsupervised relational pattern learning”
  11. Each relation r has an associated document collection, which contains one document for each entity pair from the knowledge base that is in relation r.
    Page 3, “Unsupervised relational pattern learning”
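Steps (b) through (e) above amount to a simple grouping procedure. The sketch below shows one way to implement it in Python; all names (build_collections, kb_pairs, and so on) are illustrative, not the authors' code:

```python
from collections import defaultdict

def build_collections(kb_pairs, corpus_sentences):
    """Group pattern terms into per-relation document collections.

    kb_pairs: dict mapping an entity pair (e_i, e_j) to the set of KB
    relations holding between them.
    corpus_sentences: iterable of (e_i, e_j, pattern) triples, where
    pattern is the context (e.g. a dependency path) joining the mentions.
    """
    # (c)-(d): one document D_ij per related pair, filled with context terms.
    documents = defaultdict(list)
    for e_i, e_j, pattern in corpus_sentences:
        if (e_i, e_j) in kb_pairs:  # only pairs related in the KB
            documents[(e_i, e_j)].append(pattern)
    # (b), (e): one collection C_r per relation; a pair holding several
    # relations contributes its document to every one of them.
    collections = defaultdict(dict)
    for pair, doc in documents.items():
        for r in kb_pairs[pair]:
            collections[r][pair] = doc
    return collections
```

Note that a pair related under several relations (say, nationality and president-of) places the same document in both collections; disentangling which patterns belong to which relation is exactly what the topic model is for.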

See all papers in Proc. ACL 2012 that mention knowledge base.

See all papers in Proc. ACL that mention knowledge base.

topic model

Appears in 9 sentences as: topic model (5) topic models (4)
In Pattern Learning for Relation Extraction with a Hierarchical Topic Model
  1. We describe the use of a hierarchical topic model for automatically identifying syntactic and lexical patterns that explicitly state ontological relations.
    Page 1, “Abstract”
  2. Instead, we use topic models to discriminate between the patterns that are expressing the relation and those that are ambiguous and can be applied across relations.
    Page 2, “Introduction”
  3. Note that we refer to patterns with the symbol w, as they are the words in our topic models.
    Page 2, “Unsupervised relational pattern learning”
  4. Documents contain dependency patterns, which are the words in the topic model.
    Page 2, “Unsupervised relational pattern learning”
  5. The topic model φ_G captures general patterns that appear for all relations.
    Page 3, “Unsupervised relational pattern learning”
  6. φ_A is the topic model of interest for us.
    Page 3, “Unsupervised relational pattern learning”
  7. A random sample of 3M of them is used for building the document collections on which to train the topic models, and the remaining 30M are used for testing.
    Page 3, “Experiments and results”
  8. In both cases, a topic model has been trained to learn the probability of a relation given a pattern w: p(r|w).
    Page 3, “Experiments and results”
  9. As can be seen, the MLE baselines (in red with syntactic patterns and green with intertext) perform consistently worse than the models learned using the topic models (in pink and blue).
    Page 4, “Experiments and results”
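Once the relation-specific topics φ_A are trained, scoring a pattern reduces to a Bayes step. The snippet below is a simplified illustration, not the paper's model (which also involves the shared topic φ_G); the function and variable names are assumptions. It computes p(r|w) ∝ p(w|φ_A^r) · p(r), so a pattern probable under only one relation's topic gets a sharp posterior, while an ambiguous pattern is spread across relations:

```python
def relation_posterior(w, phi_a, prior, smoothing=1e-9):
    """p(r|w) via Bayes' rule over the relation-specific topics.

    phi_a: dict relation -> {pattern: probability}, a trained phi_A^r.
    prior: dict relation -> p(r).
    """
    # Unseen patterns get a tiny smoothed probability instead of zero.
    scores = {r: phi_a[r].get(w, smoothing) * prior[r] for r in phi_a}
    z = sum(scores.values())
    return {r: s / z for r, s in scores.items()}
```

With a pattern like "X was born in Y" assigned high probability only under born-in, the posterior concentrates on that relation; a generic "X of Y" with equal probability under every topic yields a near-uniform posterior.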

relation extraction

Appears in 6 sentences as: relation extraction (5) relations extracted (1)
In Pattern Learning for Relation Extraction with a Hierarchical Topic Model
  1. Open Information Extraction (Sekine, 2006; Banko et al., 2007; Bollegala et al., 2010) started as an effort to approach relation extraction in
    Page 1, “Introduction”
  2. A different family of unsupervised methods for relation extraction is unsupervised semantic parsing, which aims at clustering entity mentions and relation surface forms, thus generating a semantic representation of the texts on which inference may be used.
    Page 1, “Introduction”
  3. The main contribution of this work is presenting a variant of distance supervision for relation extraction where we do not use heuristics in the selection of the training data.
    Page 2, “Introduction”
  4. Figure 1: Example of a generated set of document collections from a news corpus for relation extraction.
    Page 2, “Unsupervised relational pattern learning”
  5. In the case of nationality, however, even though the extracted sentences do not support the relation (P@50 = 0.34 for intertext), the new relations extracted are mostly correct (P@50 = 0.86) as most presidents and ministers in the real world have the nationality of the country where they govern.
    Page 4, “Experiments and results”
  6. We have described a new distant supervision model with which to learn patterns for relation extraction with no manual intervention.
    Page 4, “Conclusions”
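The P@50 figures quoted above are precision over the top 50 ranked extractions. A minimal helper (illustrative, not the authors' evaluation code) makes the metric concrete:

```python
def precision_at_k(ranked_judgments, k):
    """Fraction of the top-k ranked extractions judged correct.

    ranked_judgments: booleans in ranked order (True = judged correct).
    """
    top = ranked_judgments[:k]
    return sum(top) / len(top)
```

For example, 17 correct extractions among the top 50 gives P@50 = 0.34, the figure reported for nationality with intertext patterns.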

distant supervision

Appears in 5 sentences as: distance supervision (1) distant supervision (4)
In Pattern Learning for Relation Extraction with a Hierarchical Topic Model
  1. We leverage distant supervision using relations from the knowledge base Freebase, but do not require any manual heuristics or manually selected seed lists.
    Page 1, “Abstract”
  2. The main contribution of this work is presenting a variant of distance supervision for relation extraction where we do not use heuristics in the selection of the training data.
    Page 2, “Introduction”
  3. Similar to other distant supervision methods, our approach takes as input an existing knowledge base containing entities and relations, and a textual corpus.
    Page 2, “Unsupervised relational pattern learning”
  4. One of the most important problems to solve in distant supervision approaches is to be able to distinguish which of the textual examples that include two related entities, e_i and e_j, are supporting the relation.
    Page 2, “Unsupervised relational pattern learning”
  5. We have described a new distant supervision model with which to learn patterns for relation extraction with no manual intervention.
    Page 4, “Conclusions”
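Distant supervision labels every sentence containing a KB-related pair with that pair's relation, even when the sentence does not actually support it. The MLE baseline mentioned in the experiments can then be sketched as a simple count ratio, p(r|w) = count(w, r) / count(w); names here are illustrative:

```python
from collections import Counter, defaultdict

def mle_pattern_relation(labeled_matches):
    """MLE estimate of p(r|w) from distantly labeled sentences.

    labeled_matches: iterable of (pattern, relation) pairs, one per
    sentence whose entity pair holds that relation in the KB.
    """
    counts = defaultdict(Counter)  # pattern -> Counter over relations
    for w, r in labeled_matches:
        counts[w][r] += 1
    return {w: {r: c / sum(cnt.values()) for r, c in cnt.items()}
            for w, cnt in counts.items()}
```

This baseline credits ambiguous patterns such as "X of Y" with every relation they co-occur with, which is the failure mode the hierarchical topic model is designed to correct.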

dependency path

Appears in 4 sentences as: dependency path (3) dependency paths (1)
In Pattern Learning for Relation Extraction with a Hierarchical Topic Model
  1. This context may be a complex structure, such as the dependency path joining the two entities, but it is considered for our purposes as a single term; (e) for each relation r relating e_i with e_j, document D_ij is added to collection C_r.
    Page 2, “Unsupervised relational pattern learning”
  2. The words in each document can be, for example, all the dependency paths that have been observed in the input textual corpus between the two related entities.
    Page 3, “Unsupervised relational pattern learning”
  3. Generative model Once these collections are built, we use the generative model from Figure 2 to learn the probability that a dependency path is conveying some relation between the entities it connects.
    Page 3, “Unsupervised relational pattern learning”
  4. Two ways of extracting patterns have been used: (a) Syntactic, taking the dependency path between the two entities, and (b) Intertext, taking the text between the two.
    Page 3, “Experiments and results”
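The two pattern types can be sketched as follows, assuming tokens with entity positions i and j and a dependency tree given as a head array (all names are illustrative): intertext keeps the surface tokens between the mentions, while the syntactic pattern follows the tree path through their lowest common ancestor.

```python
def intertext_pattern(tokens, i, j):
    """Intertext pattern: surface tokens strictly between the two
    mentions, with the entities replaced by placeholders."""
    lo, hi = min(i, j), max(i, j)
    return " ".join(["X"] + tokens[lo + 1:hi] + ["Y"])

def dependency_path(heads, i, j):
    """Token indices on the tree path from i to j, through their lowest
    common ancestor. heads[k] is the parent of token k (-1 for the root)."""
    def ancestors(k):
        chain = [k]
        while heads[k] != -1:
            k = heads[k]
            chain.append(k)
        return chain
    a, b = ancestors(i), ancestors(j)
    lca = next(x for x in a if x in b)       # lowest common ancestor
    down = list(reversed(b[:b.index(lca)]))  # descend from the LCA to j
    return a[:a.index(lca) + 1] + down
```

For "Obama was born in Hawaii", intertext yields "X was born in Y", while the dependency path visits Obama, born, in, Hawaii regardless of any intervening material.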

entity mentions

Appears in 3 sentences as: entity mentions (3)
In Pattern Learning for Relation Extraction with a Hierarchical Topic Model
  1. A different family of unsupervised methods for relation extraction is unsupervised semantic parsing, which aims at clustering entity mentions and relation surface forms, thus generating a semantic representation of the texts on which inference may be used.
    Page 1, “Introduction”
  2. 2009; Hoffmann et al., 2011; Wang et al., 2011), or syntactic restrictions on the sentences and the entity mentions (Wu and Weld, 2010).
    Page 2, “Introduction”
  3. The corpus is preprocessed by identifying Freebase entity mentions, using an approach similar to (Milne and Witten, 2008), and parsing it with an inductive dependency parser (Nivre, 2006).
    Page 3, “Experiments and results”

generative model

Appears in 3 sentences as: Generative model (1) generative model (2) generative models (1)
In Pattern Learning for Relation Extraction with a Hierarchical Topic Model
  1. Some techniques that have been used are Markov Random Fields (Poon and Domingos, 2009) and Bayesian generative models (Titov and Klementiev, 2011).
    Page 1, “Introduction”
  2. Figure 2: Plate diagram of the generative model used.
    Page 3, “Unsupervised relational pattern learning”
  3. Generative model Once these collections are built, we use the generative model from Figure 2 to learn the probability that a dependency path is conveying some relation between the entities it connects.
    Page 3, “Unsupervised relational pattern learning”
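As a rough intuition for the plate diagram: each pattern in a relation's document is drawn either from the shared background topic φ_G or from the relation-specific topic φ_A^r. The sketch below is a deliberately simplified two-topic mixture, not the paper's full hierarchical model; the mixing weight pi and all names are assumptions:

```python
import random

def generate_document(phi_g, phi_a_r, pi, n_terms, seed=0):
    """Draw n_terms patterns for one document of relation r: each term
    comes from the relation topic phi_a_r with probability pi, otherwise
    from the shared background topic phi_g."""
    rng = random.Random(seed)

    def draw(dist):
        # Sample one pattern from a {pattern: probability} distribution.
        x, acc = rng.random(), 0.0
        for w, p in dist.items():
            acc += p
            if x < acc:
                return w
        return w  # guard against floating-point round-off

    return [draw(phi_a_r) if rng.random() < pi else draw(phi_g)
            for _ in range(n_terms)]
```

Inference runs this story in reverse: given the observed documents, it recovers which patterns belong to φ_A^r (relation-bearing) and which to φ_G (generic).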
