Template-Based Information Extraction without the Templates
Chambers, Nathanael and Jurafsky, Dan

Article Structure

Abstract

Standard algorithms for template-based information extraction (IE) require predefined template schemas, and often labeled data, to learn to extract their slot fillers (e.g., an embassy is the Target of a Bombing template).

Introduction

A template defines a specific type of event (e.g., a bombing) with a set of semantic roles (or slots) for the typical entities involved in such an event (e.g., perpetrator, target, instrument).

Previous Work

Many template extraction algorithms require full knowledge of the templates and labeled corpora, such as in rule-based systems (Chinchor et al., 1993; Rau et al., 1992) and modern supervised classifiers (Freitag, 1998; Chieu et al., 2003; Bunescu and Mooney, 2004; Patwardhan and Riloff, 2009).

The Domain and its Templates

Our goal is to learn the general event structure of a domain, and then extract the instances of each learned event.

Learning Templates from Raw Text

Our goal is to learn templates that characterize a domain as described in unclustered, unlabeled documents.

Information Extraction: Slot Filling

We now present how to apply our learned templates to information extraction.

Standard Evaluation

We trained on the 1300 documents in the MUC-4 corpus and tested on the 200 document TST3 and TST4 test set.

Specific Evaluation

In order to more precisely evaluate each learned template, we also evaluated per-template performance.

Discussion

Template-based IE systems typically assume knowledge of the domain and its templates.

Topics

semantic roles

Appears in 19 sentences as: semantic role (4) Semantic Roles (1) semantic roles (14)
In Template-Based Information Extraction without the Templates
  1. Our algorithm instead learns the template structure automatically from raw text, inducing template schemas as sets of linked events (e.g., bombings include detonate, set off, and destroy events) associated with semantic roles.
    Page 1, “Abstract”
  2. A template defines a specific type of event (e.g., a bombing) with a set of semantic roles (or slots) for the typical entities involved in such an event (e.g., perpetrator, target, instrument).
    Page 1, “Introduction”
  3. Our approach learns narrative-like knowledge in the form of IE templates; we learn sets of related events and semantic roles, as shown in this sample output from our system:
    Page 1, “Introduction”
  4. A semantic role, such as target, is a cluster of syntactic functions of the template’s event words (e.g., the objects of detonate and explode).
    Page 1, “Introduction”
  5. We describe how to first expand the small corpus’ size, how to cluster its events, and finally how to induce semantic roles.
    Page 2, “Introduction”
  6. (2006) integrate named entities into pattern learning (PERSON won) to approximate unknown semantic roles.
    Page 2, “Previous Work”
  7. Our approach draws on this idea of using unlabeled documents to discover relations in text, and of defining semantic roles by sets of entities.
    Page 2, “Previous Work”
  8. Scripts are sets of related event words and semantic roles learned by linking syntactic functions with coreferring arguments.
    Page 2, “Previous Work”
  9. The previous section clustered events from the MUC-4 corpus, but its 1300 documents do not provide enough examples of verbs and argument counts to further learn the semantic roles in each cluster.
    Page 4, “Learning Templates from Raw Text”
  10. 4.3 Inducing Semantic Roles (Slots)
    Page 4, “Learning Templates from Raw Text”
  11. Having successfully clustered event words and retrieved an IR-corpus for each cluster, we now address the problem of inducing semantic roles.
    Page 4, “Learning Templates from Raw Text”
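As the excerpts above describe, a semantic role is induced as a cluster of syntactic functions of the template's event words (e.g., the objects of detonate and explode). A minimal sketch of that representation; the clusters and function names below are hypothetical, invented for illustration:

```python
# Hypothetical induced role clusters: each role is a set of
# (verb, syntactic function) pairs, as described in the excerpts.
role_clusters = {
    "Target": {("detonate", "object"), ("explode", "object"), ("destroy", "object")},
    "Perpetrator": {("detonate", "subject"), ("plant", "subject")},
}

def role_of(verb, syntactic_function):
    """Map an observed (verb, function) pair to its induced role, if any."""
    for role, functions in role_clusters.items():
        if (verb, syntactic_function) in functions:
            return role
    return None

print(role_of("explode", "object"))  # Target
```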

F1 score

Appears in 6 sentences as: F1 Score (1) F1 score (5)
  1. We evaluate on the MUC-4 terrorism dataset and show that we induce template structure very similar to hand-created gold structure, and we extract role fillers with an F1 score of .40, approaching the performance of algorithms that require full knowledge of the templates.
    Page 1, “Abstract”
  2. The bombing template performs best with an F1 score of .72.
    Page 7, “Information Extraction: Slot Filling”
  3. The standard evaluation for this corpus is to report the F1 score for slot type accuracy, ignoring the template type.
    Page 8, “Standard Evaluation”
  4. F1 Score by template — Kidnap: .53, Bomb: .43, Arson: .42, Attack: .16 / .25
    Page 8, “Standard Evaluation”
  5. Kidnap improves most significantly in F1 score (7 F1 points absolute), but the others only change slightly.
    Page 9, “Specific Evaluation”
  6. We achieved results with comparable precision, and an F1 score of .40 that approaches prior algorithms that rely on handcrafted knowledge.
    Page 9, “Discussion”
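For reference, the F1 score cited throughout these excerpts is the harmonic mean of precision and recall. A minimal sketch; the precision/recall pair is illustrative, chosen only because it yields roughly the reported .40:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Illustrative values: precision .48 and recall .34 give F1 of about .40.
print(round(f1_score(0.48, 0.34), 2))  # 0.4
```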

coreference

Appears in 5 sentences as: coref (1) corefer (1) Coreference (1) coreference (2)
  1. This paper extends this intuition by introducing a new vector-based approach to coreference similarity.
    Page 5, “Learning Templates from Raw Text”
  2. In the sentence, he ran and then he fell, the subjects of run and fall corefer, and so they likely belong to the same scenario-specific semantic role.
    Page 5, “Learning Templates from Raw Text”
  3. For instance, arguments of the relation go_off:s were seen coreferring with mentions in plant:o, set_off:o, and injure:s. We represent go_off:s as a vector of these relation counts, calling this its coref vector representation.
    Page 5, “Learning Templates from Raw Text”
  4. Coreference and SPs measure different types of similarity.
    Page 5, “Learning Templates from Raw Text”
  5. Coreference is a looser narrative similarity (bombings cause injuries), while SPs capture synonymy (plant and place have similar arguments).
    Page 5, “Learning Templates from Raw Text”
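The coref vector representation described in excerpt 3 can be sketched as follows; the relation counts below are hypothetical, invented for illustration:

```python
import math
from collections import Counter

# Hypothetical coreference counts: how often arguments of each relation
# (verb plus syntactic function, e.g. go_off:s = subject of go_off)
# were seen coreferring with arguments of other relations.
coref = {
    "go_off:s":  Counter({"plant:o": 12, "set_off:o": 9, "injure:s": 7}),
    "explode:s": Counter({"plant:o": 10, "set_off:o": 8, "damage:o": 4}),
    "eat:s":     Counter({"cook:o": 6, "serve:o": 5}),
}

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# go_off:s is far closer to explode:s than to eat:s under this similarity.
print(cosine(coref["go_off:s"], coref["explode:s"]) > cosine(coref["go_off:s"], coref["eat:s"]))  # True
```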

LDA

Appears in 5 sentences as: LDA (5)
  1. We consider two unsupervised algorithms: Latent Dirichlet Allocation (LDA) (Blei et al., 2003), and agglomerative clustering based on word distance.
    Page 3, “Learning Templates from Raw Text”
  2. 4.1.1 LDA for Unknown Data
    Page 3, “Learning Templates from Raw Text”
  3. LDA is a probabilistic model that treats documents as mixtures of topics.
    Page 3, “Learning Templates from Raw Text”
  4. Our best performing LDA model used 200 topics.
    Page 3, “Learning Templates from Raw Text”
  5. We had mixed success with LDA though, and ultimately found our next approach performed slightly better on the document classification evaluation.
    Page 3, “Learning Templates from Raw Text”
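A toy version of the LDA document-modeling step, using scikit-learn's LatentDirichletAllocation rather than whatever implementation the authors used; the corpus and 2-topic setting are illustrative (the paper's best model used 200 topics):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# A tiny stand-in corpus; the paper used MUC-4 documents.
docs = [
    "bomb exploded near the embassy",
    "guerrillas kidnapped the mayor",
    "a bomb attack destroyed the office",
    "the kidnappers released their hostage",
]
X = CountVectorizer().fit_transform(docs)

# LDA treats each document as a mixture of topics.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(X)  # each row is a topic mixture summing to 1
print(doc_topics.shape)  # (4, 2)
```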

named entities

Appears in 5 sentences as: named entities (3) named entity (2)
  1. Classifiers rely on the labeled examples’ surrounding context for features such as nearby tokens, document position, syntax, named entities, semantic classes, and discourse relations (Maslennikov and Chua, 2007).
    Page 2, “Previous Work”
  2. (2006) integrate named entities into pattern learning (PERSON won) to approximate unknown semantic roles.
    Page 2, “Previous Work”
  3. However, the limitations to their approach are that (1) redundant documents about specific events are required, (2) relations are binary, and (3) only slots with named entities are learned.
    Page 2, “Previous Work”
  4. We also tag the corpus with an NER system and allow patterns to include named entity types, e.g., ‘kidnap:PERSON’.
    Page 3, “Learning Templates from Raw Text”
  5. Finally, we filter extractions whose WordNet or named entity label does not match the learned slot’s type (e.g., a Location does not match a Person).
    Page 8, “Information Extraction: Slot Filling”
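The type filter in excerpt 5 can be sketched as a simple lookup; the slot-to-type table below is hypothetical, not the paper's actual mapping:

```python
# Hypothetical mapping from learned slots to compatible NE/WordNet labels.
SLOT_TYPES = {
    "Perpetrator": {"PERSON", "ORGANIZATION"},
    "Target": {"LOCATION", "ORGANIZATION"},
}

def type_matches(slot, ne_label):
    """Keep an extraction only if its label matches the slot's type
    (e.g., a LOCATION does not fill a Perpetrator slot)."""
    return ne_label in SLOT_TYPES.get(slot, set())

print(type_matches("Perpetrator", "PERSON"))  # True
print(type_matches("Perpetrator", "LOCATION"))  # False
```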

labeled data

Appears in 4 sentences as: labeled data (4)
  1. Standard algorithms for template-based information extraction (IE) require predefined template schemas, and often labeled data, to learn to extract their slot fillers (e.g., an embassy is the Target of a Bombing template).
    Page 1, “Abstract”
  2. Weakly supervised approaches remove some of the need for fully labeled data.
    Page 2, “Previous Work”
  3. Shinyama and Sekine (2006) describe an approach to template learning without labeled data.
    Page 2, “Previous Work”
  4. Our precision is as good as (and our F1 score near) two algorithms that require knowledge of the templates and/or labeled data.
    Page 8, “Standard Evaluation”

conditional probability

Appears in 3 sentences as: conditional probability (3)
  1. A document is labeled for a template if two different conditions are met: (1) it contains at least one trigger phrase, and (2) its average per-token conditional probability meets a strict threshold.
    Page 7, “Information Extraction: Slot Filling”
  2. Both conditions require a definition of the conditional probability of a template given a token.
    Page 7, “Information Extraction: Slot Filling”
  3. This is not the usual conditional probability definition as IR-corpora are different sizes.
    Page 7, “Information Extraction: Slot Filling”
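A sketch of the two-condition document labeling described in these excerpts; the trigger set, per-token probabilities, and threshold below are illustrative, not the paper's values:

```python
def label_document(tokens, trigger_phrases, token_prob, threshold=0.1):
    """Label a document with a template if (1) it contains at least one
    trigger phrase and (2) its average per-token conditional probability
    P(template | token) meets a strict threshold."""
    has_trigger = any(t in trigger_phrases for t in tokens)
    avg_prob = sum(token_prob.get(t, 0.0) for t in tokens) / len(tokens)
    return has_trigger and avg_prob >= threshold

# Hypothetical P(bombing-template | token) estimates.
probs = {"bomb": 0.6, "exploded": 0.5, "embassy": 0.3}
doc = "the bomb exploded near the embassy".split()
print(label_document(doc, {"bomb"}, probs))  # True
```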

cosine similarity

Appears in 3 sentences as: cosine similarity (3)
  1. Vector-based approaches are often adopted to represent words as feature vectors and compute their distance with cosine similarity.
    Page 3, “Learning Templates from Raw Text”
  2. Distance is the cosine similarity between bag-of-words vector representations.
    Page 4, “Learning Templates from Raw Text”
  3. We measure similarity using cosine similarity between the vectors in both approaches.
    Page 5, “Learning Templates from Raw Text”
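Cosine similarity over bag-of-words vectors, as used in these excerpts, reduces to a few lines; the two example sentences are invented:

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

doc1 = Counter("the bomb exploded near the embassy".split())
doc2 = Counter("a bomb destroyed the embassy gate".split())
print(round(cosine(doc1, doc2), 3))  # 0.577
```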