Distant supervision for relation extraction without labeled data
Mintz, Mike and Bills, Steven and Snow, Rion and Jurafsky, Daniel

Article Structure

Abstract

Modern models of relation extraction for tasks like ACE are based on supervised learning of relations from small hand-labeled corpora.

Introduction

At least three learning paradigms have been applied to the task of extracting relational facts from text (for example, learning that a person is employed by a particular organization, or that a geographic entity is located in a particular region).

Previous work

Except for the unsupervised algorithms discussed above, previous supervised or bootstrapping approaches to relation extraction have typically relied on relatively small datasets, or on only a small number of distinct relations.

Freebase

Following the literature, we use the term ‘relation’ to refer to an ordered, binary relation between entities.

Architecture

The intuition of our distant supervision approach is to use Freebase to give us a training set of relations and entity pairs that participate in those relations.

Features

Our features are based on standard lexical and syntactic features from the literature.

Implementation

6.1 Text

Evaluation

We evaluate labels in two ways: automatically, by holding out part of the Freebase relation data during training and comparing newly discovered relation instances against this held-out data, and manually, by having human evaluators label each positively extracted relation instance as true or false.

Discussion

Our results show that the distant supervision algorithm is able to extract high-precision patterns for a reasonably large number of relations.

Topics

relation instances

Appears in 13 sentences as: relation instance (2) relation instances (10) ‘relation instances’ (1)
In Distant supervision for relation extraction without labeled data
  1. The NIST Automatic Content Extraction (ACE) RDC 2003 and 2004 corpora, for example, include over 1,000 documents in which pairs of entities have been labeled with 5 to 7 major relation types and 23 to 24 subrelations, totaling 16,771 relation instances.
    Page 1, “Introduction”
  2. Thus whereas the supervised training paradigm uses a small labeled corpus of only 17,000 relation instances as training data, our algorithm can use much larger amounts of data: more text, more relations, and more instances.
    Page 2, “Introduction”
  3. Table 1 shows examples of relation instances extracted by our system.
    Page 2, “Introduction”
  4. Approaches based on WordNet have often only looked at the hypernym (isa) or meronym (part-of) relation (Girju et al., 2003; Snow et al., 2005), while those based on the ACE program (Doddington et al., 2004) have been restricted in their evaluation to a small number of relation instances and corpora of less than a million words.
    Page 2, “Previous work”
  5. We refer to individual ordered pairs in this relation as ‘relation instances’.
    Page 3, “Freebase”
  6. We use relations and relation instances from Freebase, a freely available online database of structured semantic data.
    Page 3, “Freebase”
  7. This time, every pair of entities appearing together in a sentence is considered a potential relation instance, and whenever those entities appear together, features are extracted on the sentence and added to a feature vector for that entity pair.
    Page 4, “Architecture”
  8. This means that 900,000 Freebase relation instances are used in training, and 900,000 are held out.
    Page 6, “Implementation”
  9. For human evaluation experiments, all 1.8 million relation instances are used in training.
    Page 6, “Implementation”
  10. For all our experiments, we only extract relation instances that do not appear in our training data, i.e., instances that are not already in Freebase.
    Page 6, “Implementation”
  11. Once all of the entity pairs discovered during testing have been classified, they can be ranked by confidence score and used to generate a list of the n most likely new relation instances.
    Page 7, “Implementation”
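
The held-out protocol in items 8-11 can be sketched as follows. This is a minimal Python illustration, not the authors' code; the data layout and the classify_pair function are assumptions.

    import random

    def held_out_split(freebase_triples, seed=0):
        """Split the (relation, entity1, entity2) triples in half: one half
        supervises training, the other is held out for automatic evaluation."""
        triples = list(freebase_triples)
        random.Random(seed).shuffle(triples)
        mid = len(triples) // 2
        return triples[:mid], triples[mid:]      # e.g. 900,000 / 900,000

    def rank_new_instances(candidate_vectors, classify_pair, known_triples, n):
        """Classify every candidate entity pair, drop anything already present
        in Freebase, and return the n most confident new relation instances."""
        known_pairs = {(e1, e2) for _, e1, e2 in known_triples}
        ranked = []
        for (e1, e2), feature_vector in candidate_vectors.items():
            if (e1, e2) in known_pairs:          # keep only pairs not already in Freebase
                continue
            relation, confidence = classify_pair(feature_vector)
            ranked.append((confidence, relation, e1, e2))
        ranked.sort(reverse=True)                # rank by confidence score
        return ranked[:n]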

distant supervision

Appears in 8 sentences as: Distant supervision (1) distant supervision (7)
In Distant supervision for relation extraction without labeled data
  1. Our experiments use Freebase, a large semantic database of several thousand relations, to provide distant supervision.
    Page 1, “Abstract”
  2. Distant supervision is an extension of the paradigm used by Snow et al. (2005) for exploiting WordNet to extract hypernym (isa) relations between entities.
    Page 1, “Introduction”
  3. Our algorithm uses Freebase (Bollacker et al., 2008), a large semantic database, to provide distant supervision for relation extraction.
    Page 2, “Introduction”
  4. The intuition of distant supervision is that any sentence that contains a pair of entities that participate in a known Freebase relation is likely to express that relation in some way.
    Page 2, “Introduction”
  5. Perhaps most similar to our distant supervision algorithm is the effective method of Wu and Weld (2007) who extract relations from a Wikipedia page by using supervision from the page’s infobox.
    Page 3, “Previous work”
  6. The intuition of our distant supervision approach is to use Freebase to give us a training set of relations and entity pairs that participate in those relations.
    Page 3, “Architecture”
  7. The distant supervision assumption is that if two entities participate in a relation, any sentence that contains those two entities might express that relation.
    Page 4, “Architecture”
  8. Our results show that the distant supervision algorithm is able to extract high-precision patterns for a reasonably large number of relations.
    Page 8, “Discussion”
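
The assumption in items 4 and 7 translates directly into a sentence-matching step. Below is a minimal Python sketch, not the paper's implementation: it assumes sentences arrive already entity-tagged, that freebase maps entity pairs to relation names, and that extract_features is some feature extractor.

    from collections import defaultdict

    def distantly_label(sentences, freebase, extract_features):
        """For every pair of entities co-occurring in a sentence, look the pair
        up in Freebase; if it participates in a known relation, treat the
        sentence as (noisy) positive evidence for that relation and pool its
        features under the entity pair."""
        training = defaultdict(list)    # (relation, entity1, entity2) -> features
        for sentence in sentences:
            entities = sentence["entities"]     # assumed named entity tagger output
            for e1 in entities:
                for e2 in entities:
                    if e1 == e2:
                        continue
                    relation = freebase.get((e1, e2))
                    if relation is not None:
                        training[(relation, e1, e2)].extend(
                            extract_features(sentence, e1, e2))
        return training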

named entity

Appears in 8 sentences as: Named entity (1) named entity (8)
In Distant supervision for relation extraction without labeled data
  1. For example, the DIPRE algorithm by Brin (1998) used string-based regular expressions in order to recognize relations such as author-book, while the SNOWBALL algorithm by Agichtein and Gravano (2000) learned similar regular expression patterns over words and named entity tags.
    Page 2, “Previous work”
  2. …in sentences using a named entity tagger that labels persons, organizations and locations.
    Page 4, “Architecture”
  3. In the testing step, entities are again identified using the named entity tagger.
    Page 4, “Architecture”
  4. 5.3 Named entity tag features
    Page 5, “Features”
  5. Every feature contains, in addition to the content described above, named entity tags for the two entities.
    Page 5, “Features”
  6. We perform named entity tagging using the Stanford four-class named entity tagger (Finkel et al., 2005).
    Page 5, “Features”
  7. Each feature consists of the conjunction of several attributes of the sentence, plus the named entity tags.
    Page 5, “Features”
  8. In preprocessing, consecutive words with the same named entity tag are ‘chunked’, so that Edwin/PERSON Hubble/PERSON becomes [Edwin Hubble]/PERSON.
    Page 6, “Implementation”
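
The chunking step in item 8 amounts to merging consecutive tokens that share a named entity tag. A small sketch follows; the example sentence and the 'O' tag for untagged tokens are illustrative assumptions, and the paper additionally requires chunks to be contiguous in the dependency parse, which this sketch does not check.

    def chunk_named_entities(tokens, tags):
        """Merge consecutive tokens with the same named entity tag, so that
        Edwin/PERSON Hubble/PERSON becomes ('Edwin Hubble', 'PERSON')."""
        chunks = []
        for token, tag in zip(tokens, tags):
            if chunks and tag != "O" and chunks[-1][1] == tag:
                chunks[-1] = (chunks[-1][0] + " " + token, tag)  # extend current chunk
            else:
                chunks.append((token, tag))
        return chunks

    # chunk_named_entities(["Edwin", "Hubble", "was", "born", "in", "Marshfield"],
    #                      ["PERSON", "PERSON", "O", "O", "O", "LOCATION"])
    # -> [('Edwin Hubble', 'PERSON'), ('was', 'O'), ('born', 'O'),
    #     ('in', 'O'), ('Marshfield', 'LOCATION')]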

dependency path

Appears in 7 sentences as: dependency path (6) dependency paths (1)
In Distant supervision for relation extraction without labeled data
  1. For each sentence we extract a dependency path between each pair of entities.
    Page 5, “Features”
  2. A dependency path consists of a series of dependencies, directions and words/chunks representing a traversal of the parse.
    Page 5, “Features”
  3. Part-of-speech tags are not included in the dependency path.
    Page 5, “Features”
  4. • A dependency path between the two entities
    Page 5, “Features”
  5. • For each entity, one ‘window’ node that is not part of the dependency path
    Page 5, “Features”
  6. A window node is a node connected to one of the two entities and not part of the dependency path.
    Page 5, “Features”
  7. Sentences like this have very long (and thus rare) lexical features, but relatively short dependency paths.
    Page 8, “Discussion”
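
The items above define a syntactic feature as a dependency path (dependency labels, traversal directions, and the intervening words or chunks) plus one off-path 'window' node per entity. A rough sketch under an assumed graph representation: graph maps each node to (neighbor, dependency_label, direction) edges, which is not MINIPAR's actual output format.

    from collections import deque

    def dependency_path(graph, e1, e2):
        """Breadth-first search for the shortest path from e1 to e2, rendered
        as one feature string of dependency labels, directions, and the words
        or chunks along the way (part-of-speech tags are left out)."""
        queue = deque([(e1, [])])
        seen = {e1}
        while queue:
            node, path = queue.popleft()
            if node == e2:
                return " ".join(f"{'->' if d == 'down' else '<-'}{label} {word}"
                                for word, label, d in path)
            for neighbor, label, direction in graph.get(node, []):
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append((neighbor, path + [(neighbor, label, direction)]))
        return None    # the two entities are not connected in this parse

    def window_nodes(graph, entity, path_words):
        """Neighbors of an entity that are not on the dependency path; the paper
        attaches one such 'window' node per entity to each syntactic feature."""
        return [w for w, _, _ in graph.get(entity, []) if w not in path_words]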

feature vector

Appears in 6 sentences as: feature vector (6)
In Distant supervision for relation extraction without labeled data
  1. For each pair of entities, we aggregate the features from the many different sentences in which that pair appeared into a single feature vector, allowing us to provide our classifier with more information, resulting in more accurate labels.
    Page 2, “Introduction”
  2. If a sentence contains two entities and those entities are an instance of one of our Freebase relations, features are extracted from that sentence and are added to the feature vector for the relation.
    Page 4, “Architecture”
  3. In training, the features for identical tuples (relation, entity1, entity2) from different sentences are combined, creating a richer feature vector.
    Page 4, “Architecture”
  4. This time, every pair of entities appearing together in a sentence is considered a potential relation instance, and whenever those entities appear together, features are extracted on the sentence and added to a feature vector for that entity pair.
    Page 4, “Architecture”
  5. Towards this end, we build a feature vector in the training phase for an ‘unrelated’ relation by randomly selecting entity pairs that do not appear in any Freebase relation and extracting features for them.
    Page 6, “Implementation”
  6. Our classifier takes as input an entity pair and a feature vector, and returns a relation name and a confidence score based on the probability of the entity pair belonging to that relation.
    Page 7, “Implementation”
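
Items 3 and 5 describe two training-time steps: pooling the features for one (relation, entity1, entity2) tuple across all of its sentences, and building an 'unrelated' class from random entity pairs outside Freebase. A minimal sketch of both is below, with the data shapes as assumptions; in the paper the classifier itself is a multi-class logistic regression, and the confidence score comes from the predicted class probability (item 6).

    from collections import Counter
    import random

    def aggregate_feature_vectors(per_sentence_features):
        """Combine the features extracted for one (relation, entity1, entity2)
        tuple from all of its sentences into a single, richer count vector."""
        return {key: Counter(f for feats in sentence_lists for f in feats)
                for key, sentence_lists in per_sentence_features.items()}

    def sample_unrelated_pairs(entity_pairs, freebase_pairs, k, seed=0):
        """Randomly pick entity pairs that appear in no Freebase relation;
        their feature vectors supply training data for the 'unrelated' class."""
        candidates = [p for p in entity_pairs if p not in freebase_pairs]
        return random.Random(seed).sample(candidates, min(k, len(candidates)))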

relation extraction

Appears in 6 sentences as: relation extraction (6)
In Distant supervision for relation extraction without labeled data
  1. Modern models of relation extraction for tasks like ACE are based on supervised learning of relations from small hand-labeled corpora.
    Page 1, “Abstract”
  2. Supervised relation extraction suffers from a number of problems, however.
    Page 1, “Introduction”
  3. Our algorithm uses Freebase (Bollacker et al., 2008), a large semantic database, to provide distant supervision for relation extraction.
    Page 2, “Introduction”
  4. …lexical (word sequence) features in relation extraction.
    Page 2, “Introduction”
  5. Except for the unsupervised algorithms discussed above, previous supervised or bootstrapping approaches to relation extraction have typically relied on relatively small datasets, or on only a small number of distinct relations.
    Page 2, “Previous work”
  6. Many early algorithms for relation extraction used little or no syntactic information.
    Page 2, “Previous work”

dependency parse

Appears in 4 sentences as: dependency parse (2) dependency parsed (1) dependency parser (1)
In Distant supervision for relation extraction without labeled data
  1. In order to generate these features we parse each sentence with the broad-coverage dependency parser MINIPAR (Lin, 1998).
    Page 5, “Features”
  2. A dependency parse consists of a set of words and chunks (e.g. …), linked by directional dependencies.
    Page 5, “Features”
  3. Each sentence of this unstructured text is dependency parsed by MINIPAR to produce a dependency graph.
    Page 6, “Implementation”
  4. This chunking is restricted by the dependency parse of the sentence, however, in that chunks must be contiguous in the parse (i.e., no chunks across subtrees).
    Page 6, “Implementation”

Part-of-speech

Appears in 4 sentences as: Part-of-speech (2) part-of-speech (2)
In Distant supervision for relation extraction without labeled data
  1. Hearst (1992) used a small number of regular expressions over words and part-of-speech tags to find examples of the hypernym relation.
    Page 2, “Previous work”
  2. …such as Ravichandran and Hovy (2002) and Pantel and Pennacchiotti (2006) use the same formalism of learning regular expressions over words and part-of-speech tags to discover patterns indicating a variety of relations.
    Page 3, “Previous work”
  3. Part-of-speech tags were assigned by a maximum entropy tagger trained on the Penn Treebank, and then simplified into seven categories: nouns, verbs, adverbs, adjectives, numbers, foreign words, and everything else.
    Page 4, “Features”
  4. Part-of-speech tags are not included in the dependency path.
    Page 5, “Features”
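
Item 3 collapses Penn Treebank part-of-speech tags into seven coarse categories. A possible mapping is sketched below; the exact tag-to-category assignment is an assumption, since the excerpt only names the categories.

    def simplify_pos(tag):
        """Collapse a Penn Treebank POS tag into one of seven coarse classes:
        noun, verb, adverb, adjective, number, foreign word, everything else."""
        if tag.startswith("NN"):
            return "NOUN"
        if tag.startswith("VB"):
            return "VERB"
        if tag.startswith("RB"):
            return "ADVERB"
        if tag.startswith("JJ"):
            return "ADJECTIVE"
        if tag == "CD":
            return "NUMBER"
        if tag == "FW":
            return "FOREIGN"
        return "OTHER"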

feature set

Appears in 3 sentences as: feature set (2) feature sets (1)
In Distant supervision for relation extraction without labeled data
  1. At most recall levels, the combination of syntactic and lexical features offers a substantial improvement in precision over either of these feature sets on its own.
    Page 7, “Evaluation”
  2. No feature set strongly outperforms any of the others across all relations.
    Page 8, “Evaluation”
  3. The held-out results in Figure 2 suggest that the combination of syntactic and lexical features provides better performance than either feature set on its own.
    Page 8, “Discussion”

hypernym

Appears in 3 sentences as: hypernym (3)
In Distant supervision for relation extraction without labeled data
  1. …(2005) for exploiting WordNet to extract hypernym (isa) relations between entities, and is similar to the use of weakly labeled data in bioinformatics (Craven and Kumlien, 1999; Morgan et al., 2004).
    Page 1, “Introduction”
  2. Approaches based on WordNet have often only looked at the hypernym (isa) or meronym (part-of) relation (Girju et al., 2003; Snow et al., 2005), while those based on the ACE program (Doddington et al., 2004) have been restricted in their evaluation to a small number of relation instances and corpora of less than a million words.
    Page 2, “Previous work”
  3. Hearst (1992) used a small number of regular expressions over words and part-of-speech tags to find examples of the hypernym relation.
    Page 2, “Previous work”

Mechanical Turk

Appears in 3 sentences as: Mechanical Turk (3)
In Distant supervision for relation extraction without labeled data
  1. Human evaluation was performed by evaluators on Amazon’s Mechanical Turk service, shown to be effective for natural language annotation in Snow et al. (2008).
    Page 7, “Evaluation”
  2. For each of the 10 relations that appeared most frequently in our test data (according to our classifier), we took samples from the first 100 and 1000 instances of this relation generated in each experiment, and sent these to Mechanical Turk for human evaluation.
    Page 7, “Evaluation”
  3. Each predicted relation instance was labeled as true or false by between 1 and 3 labelers on Mechanical Turk.
    Page 8, “Evaluation”
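
With 1 to 3 true/false judgments per predicted instance (item 3), a precision estimate over a sample of top-ranked predictions can be computed roughly as below. How the paper combines multiple judgments is not stated in these excerpts, so the majority vote here is an assumption.

    def majority_true(judgments):
        """Majority vote over the 1-3 boolean judgments for one instance
        (a 1-1 tie counts as false in this sketch)."""
        return sum(judgments) > len(judgments) / 2

    def sampled_precision(judgments_per_instance):
        """Fraction of sampled instances judged correct: a rough precision
        estimate for, e.g., the first 100 or 1000 predictions of a relation."""
        votes = [majority_true(j) for j in judgments_per_instance]
        return sum(votes) / len(votes) if votes else 0.0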

regular expressions

Appears in 3 sentences as: regular expression (1) regular expressions (3)
In Distant supervision for relation extraction without labeled data
  1. For example, the DIPRE algorithm by Brin (1998) used string-based regular expressions in order to recognize relations such as author-book, while the SNOWBALL algorithm by Agichtein and Gravano (2000) learned similar regular expression patterns over words and named entity tags.
    Page 2, “Previous work”
  2. Hearst (1992) used a small number of regular expressions over words and part-of-speech tags to find examples of the hypernym relation.
    Page 2, “Previous work”
  3. …such as Ravichandran and Hovy (2002) and Pantel and Pennacchiotti (2006) use the same formalism of learning regular expressions over words and part-of-speech tags to discover patterns indicating a variety of relations.
    Page 3, “Previous work”
