Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations
Hoffmann, Raphael and Zhang, Congle and Ling, Xiao and Zettlemoyer, Luke and Weld, Daniel S.

Article Structure

Abstract

Information extraction (IE) holds the promise of generating a large-scale knowledge base from the Web’s natural language text.

Introduction

Information-extraction (IE), the process of generating relational data from natural-language text, continues to gain attention.

Weak Supervision from a Database

Given a corpus of text, we seek to extract facts about entities, such as the company Apple or the city Boston.

Modeling Overlapping Relations

We define an undirected graphical model that allows joint reasoning about aggregate (corpus-level) and sentence-level extraction decisions.
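To make the structure concrete, here is a minimal sketch in Python (illustrative names, not the authors' code): one latent variable Z_i per sentence mentioning an entity pair, one boolean Y^r per relation name, and a deterministic OR tying them together.

    # Sketch of the MULTIR variable structure for one entity pair.
    # Each Z_i ranges over the relation names plus a special "none"
    # label; Y^r is true iff at least one Z_i takes the value r.

    NONE = "none"

    def aggregate(z, relations):
        """Deterministic-OR factor: Y^r = 1 iff some Z_i == r."""
        return {r: any(zi == r for zi in z) for r in relations}

    # Overlapping relations fall out naturally: different sentences
    # about the same pair can assert different relations.
    z = ["founded", "ceo_of", NONE]
    y = aggregate(z, ["founded", "ceo_of"])
    assert y == {"founded": True, "ceo_of": True}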

Learning

We now present a multi-instance learning algorithm for our weak-supervision model that treats the sentence-level extraction random variables Z_i as latent, and uses facts from a database (e.g., Freebase) as supervision for the aggregate-level variables Y^r.
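The update has the shape of a perceptron-style step with latent sentence labels. A hedged sketch (joint_map and conditional_map stand for the two inference problems described in the next section; phi and all names are illustrative):

    from collections import Counter

    def phi(sentences, z):
        """Feature counts phi(x, z), summed over the sentences."""
        feats = Counter()
        for x, zi in zip(sentences, z):
            for f in x:
                feats[(f, zi)] += 1
        return feats

    def update(theta, sentences, y_gold, joint_map, conditional_map):
        """One online step: if the predicted facts disagree with the
        database facts y_gold, move theta toward the best sentence
        labeling consistent with y_gold, away from the prediction."""
        y_pred, z_pred = joint_map(theta, sentences)
        if y_pred != y_gold:
            z_star = conditional_map(theta, sentences, y_gold)
            for f, v in phi(sentences, z_star).items():
                theta[f] = theta.get(f, 0.0) + v
            for f, v in phi(sentences, z_pred).items():
                theta[f] = theta.get(f, 0.0) - v
        return theta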

Inference

To support learning, as described above, we need to compute the assignments arg max_z p(z|x, y; θ) and arg max_{y,z} p(y, z|x; θ).
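The second problem decomposes: because each Y^r is a deterministic OR of the Z_i, maximizing p(y, z|x) reduces to labeling each sentence independently and reading y off the result. A sketch under the linear-score assumption used above (illustrative code, not the paper's):

    # Joint MAP inference: pick the best relation (or "none") for each
    # sentence independently, then set Y^r by the deterministic OR.
    # score(theta, x, r) is an assumed linear score theta . phi(x, r).

    def score(theta, sentence_features, relation):
        return sum(theta.get((f, relation), 0.0) for f in sentence_features)

    def joint_map(theta, sentences, relations, none="none"):
        labels = relations + [none]
        z = [max(labels, key=lambda r: score(theta, x, r)) for x in sentences]
        y = {r: any(zi == r for zi in z) for r in relations}
        return y, z

The conditional problem arg max_z p(z|x, y; θ), where y is clamped to the database facts, does not factor the same way; the paper solves it exactly by casting it as a weighted edge-cover problem.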

Experimental Setup

We follow the approach of Riedel et al.

Experiments

To evaluate our algorithm, we first compare it to an existing approach for using multi-instance learning with weak supervision (Riedel et al., 2010), using the same data and features.

Related Work

Supervised-learning approaches to IE were introduced in (Soderland et al., 1995) and are too numerous to summarize here.

Conclusion

We argue that weak supervision is a promising method for scaling information extraction to the level where it can handle the myriad different relations on the Web.

Topics

sentence-level

Appears in 17 sentences as: sentence-level (16), plus one variant (1)
In Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations
  1. This paper presents a novel approach for multi-instance learning with overlapping relations that combines a sentence-level extraction model with a simple, corpus-level component for aggregating the individual facts.
    Page 1, “Abstract”
  2. MULTIR also produces accurate sentence-level predictions, decoding individual sentences as well as making corpus-level extractions.
    Page 2, “Introduction”
  3. In contrast, sentence-level extraction must justify each extraction with every sentence which expresses the fact.
    Page 2, “Weak Supervision from a Database”
  4. We define an undirected graphical model that allows joint reasoning about aggregate (corpus-level) and sentence-level extraction decisions.
    Page 2, “Modeling Overlapping Relations”
  5. Z_i should be assigned a value r ∈ R only when x_i expresses the ground fact r(e), thereby modeling sentence-level extraction.
    Page 3, “Modeling Overlapping Relations”
  6. (2009) sentence-level features in the experiments, as described in Section 7.
    Page 3, “Modeling Overlapping Relations”
  7. This model was designed to provide a joint approach where extraction decisions are almost entirely driven by sentence-level reasoning.
    Page 3, “Modeling Overlapping Relations”
  8. However, defining the Y^r random variables and tying them to the sentence-level variables Z_i provides a direct method for modeling weak supervision.
    Page 3, “Modeling Overlapping Relations”
  9. (2010), in that both models include sentence-level and aggregate random variables.
    Page 3, “Modeling Overlapping Relations”
  10. We now present a multi-instance learning algorithm for our weak-supervision model that treats the sentence-level extraction random variables Z_i as latent, and uses facts from a database (e.g., Freebase) as supervision for the aggregate-level variables Y^r.
    Page 4, “Learning”
  11. It is thus sufficient to independently compute an assignment for each sentence-level extraction variable Z_i, ignoring the deterministic dependencies.
    Page 5, “Inference”


entity mentions

Appears in 10 sentences as: entities mentioned (3) entity mention (1) entity mentions (6)
In Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations
  1. An entity mention is a contiguous sequence of textual tokens denoting an entity.
    Page 2, “Weak Supervision from a Database”
  2. In this paper we assume that there is an oracle which can identify all entity mentions in a corpus, but the oracle doesn’t normalize or disambiguate these mentions.
    Page 2, “Weak Supervision from a Database”
  3. A relation mention is a sequence of text (including one or more entity mentions) which states that some ground fact r(e) is true.
    Page 2, “Weak Supervision from a Database”
  4. at CES contains three entity mentions as well as a relation mention for CEO-of(Steve Ballmer, Microsoft).
    Page 2, “Weak Supervision from a Database”
  5. Furthermore, we assume that both entity mentions appear as noun phrases in a single sentence.
    Page 2, “Weak Supervision from a Database”
  6. The knowledge-based weakly supervised learning problem takes as input (1) Σ, a training corpus, (2) E, a set of entities mentioned in that corpus, (3) R, a set of relation names, and (4) Δ, a set of ground facts of relations in R. As output the learner produces an extraction model.
    Page 2, “Weak Supervision from a Database”
  7. (2) E, a set of entities mentioned in the sentences, (3) R, a set of relation names, and
    Page 4, “Modeling Overlapping Relations”
  8. As input we have (1) Σ, a set of sentences, (2) E, a set of entities mentioned in the sentences, (3) R, a set of relation names, and (4) Δ, a database of atomic facts of the form r(e1, e2) for r ∈ R and e_i ∈ E. Since we are using weak learning, the Y^r variables in Y are not directly observed, but can be approximated from the database Δ.
    Page 4, “Learning”
  9. The data was first tagged with the Stanford NER system (Finkel et al., 2005) and then entity mentions were found by collecting each continuous phrase where words were tagged identically (i.e., as a person, location, or organization); see the sketch after this list.
    Page 5, “Experimental Setup”
  10. These include indicators for various lexical, part of speech, named entity, and dependency tree path properties of entity mentions in specific sentences, as computed with the Malt dependency parser (Nivre and Nilsson, 2004) and OpenNLP POS tagger.
    Page 6, “Experimental Setup”
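
Sentence 9 describes a simple chunking heuristic; here is a minimal sketch (illustrative tag names; the paper uses Stanford NER output):

    # Group each maximal run of tokens sharing the same non-O NER tag
    # into one entity mention, per sentence 9 above.

    def collect_mentions(tokens, tags, outside="O"):
        mentions, start = [], None
        for i, tag in enumerate(tags + [outside]):  # sentinel flushes the last run
            if start is not None and (i == len(tags) or tag != tags[start]):
                mentions.append((start, i, tags[start], " ".join(tokens[start:i])))
                start = None
            if start is None and i < len(tags) and tag != outside:
                start = i
        return mentions

    tokens = ["Steve", "Ballmer", "is", "CEO", "of", "Microsoft"]
    tags = ["PERSON", "PERSON", "O", "O", "O", "ORGANIZATION"]
    print(collect_mentions(tokens, tags))
    # [(0, 2, 'PERSON', 'Steve Ballmer'), (5, 6, 'ORGANIZATION', 'Microsoft')]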


precision and recall

Appears in 6 sentences as: precision and recall (6)
In Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations
  1. use supervised learning of relation-specific examples, which can achieve high precision and recall.
    Page 1, “Introduction”
  2. Aggregate Extraction Let Δ_e be the set of extracted relations for any of the systems; we compute aggregate precision and recall by comparing Δ_e with Δ (see the sketch after this list).
    Page 6, “Experimental Setup”
  3. We then report precision and recall for each system on this set of sampled sentences.
    Page 6, “Experimental Setup”
  4. Since the data contains an unbalanced number of instances of each relation, we also report precision and recall for each of the ten most frequent relations.
    Page 7, “Experiments”
  5. Table 1 presents this approximate precision and recall for MULTIR on each of the relations, along with statistics we computed to measure the quality of the weak supervision.
    Page 7, “Experiments”
  6. While they offer high precision and recall, these methods are unlikely to scale to the thousands of relations found in text on the Web.
    Page 8, “Related Work”
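
Sentence 2's aggregate metric compares two fact sets; a minimal sketch (the fact triples and names are illustrative):

    # Aggregate precision/recall: compare extracted facts (Delta_e)
    # against the database facts (Delta), both as sets of triples.

    def aggregate_precision_recall(extracted, database):
        tp = len(extracted & database)
        precision = tp / len(extracted) if extracted else 0.0
        recall = tp / len(database) if database else 0.0
        return precision, recall

    extracted = {("ceo_of", "Ballmer", "Microsoft"), ("founded", "Jobs", "IBM")}
    database = {("ceo_of", "Ballmer", "Microsoft"), ("founded", "Jobs", "Apple")}
    print(aggregate_precision_recall(extracted, database))  # (0.5, 0.5)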


learning algorithms

Appears in 4 sentences as: Learning Algorithm (1) learning algorithm (1) learning algorithms (2)
In Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations
  1. Recently, researchers have developed multi-instance learning algorithms to combat the noisy training data that can come from heuristic labeling, but their models assume relations are disjoint; for example, they cannot extract the pair Founded(Jobs, Apple) and CEO-of(Jobs, Apple).
    Page 1, “Abstract”
  2. Figure 2: The MULTIR Learning Algorithm
    Page 4, “Modeling Overlapping Relations”
  3. We now present a multi-instance learning algorithm for our weak-supervision model that treats the sentence-level extraction random variables Z_i as latent, and uses facts from a database (e.g., Freebase) as supervision for the aggregate-level variables Y^r.
    Page 4, “Learning”
  4. Since the process of matching database tuples to sentences is inherently heuristic, researchers have proposed multi-instance learning algorithms as a means for coping with the resulting noisy data.
    Page 9, “Conclusion”


relational extractors

Appears in 4 sentences as: relation extractors (1) relational extraction (1) relational extractors (2)
In Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations
  1. Knowledge-based weak supervision, using structured data to heuristically label a training corpus, works towards this goal by enabling the automated learning of a potentially unbounded number of relation extractors.
    Page 1, “Abstract”
  2. (2009) used Freebase facts to train 100 relational extractors on Wikipedia.
    Page 9, “Related Work”
  3. Bunescu and Mooney (2007) connect weak supervision with multi-instance learning and extend their relational extraction kernel to this context.
    Page 9, “Related Work”
  4. automatically learn a nearly unbounded number of relational extractors.
    Page 9, “Conclusion”


F1 score

Appears in 3 sentences as: F1 score (3)
In Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations
  1. At the highest recall point, MULTIR reaches 72.4% precision and 51.9% recall, for an F1 score of 60.5% (a quick arithmetic check follows this list).
    Page 7, “Experiments”
  2. On average across relations, precision increases 12 points but recall drops 26 points, for an overall reduction in F1 score from 60.5% to 40.3%.
    Page 7, “Experiments”
  3. (2010) describe a system similar to KYLIN, but which dynamically generates lexicons in order to handle sparse data, learning over 5000 Infobox relations with an average F1 score of 61%.
    Page 9, “Related Work”
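
As a check on sentence 1, F1 is the harmonic mean of precision and recall: F1 = 2PR / (P + R) = 2 · 0.724 · 0.519 / (0.724 + 0.519) ≈ 0.605, which matches the reported 60.5%.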


graphical model

Appears in 3 sentences as: graphical model (3)
In Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations
  1. MULTIR introduces a probabilistic, graphical model of multi-instance learning which handles overlapping relations.
    Page 2, “Introduction”
  2. We define an undirected graphical model that allows joint reasoning about aggregate (corpus-level) and sentence-level extraction decisions.
    Page 2, “Modeling Overlapping Relations”
  3. (2010), combine weak supervision and multi-instance learning in a more sophisticated manner, training a graphical model, which assumes only that at least one of the matches between the arguments of a Freebase fact and sentences in the corpus is a true relational mention.
    Page 9, “Related Work”
