Refining Event Extraction through Cross-Document Inference
Ji, Heng and Grishman, Ralph

Article Structure

Abstract

We apply the hypothesis of “One Sense Per Discourse” (Yarowsky, 1995) to information extraction (IE), and extend the scope of “discourse” from one single document to a cluster of topically-related documents.

Introduction

Identifying events of a particular type within individual documents — ‘classical’ information extraction — remains a difficult task.

Task and Baseline System

2.1 ACE Event Extraction Task

Motivations

In this section we shall present our motivations based on error analysis for the baseline event tagger.

System Approach Overview

Based on the above motivations we propose to incorporate global evidence from a cluster of related documents to refine local decisions.

Global Inference

The central idea of inference is to obtain document-wide and cluster-wide statistics about the frequency with which triggers and arguments are associated with particular types of events, and then use this information to correct event and argument identification and classification.
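
As a rough illustration of this idea (not the authors' actual inference rules), the sketch below collects cluster-wide trigger statistics and relabels low-confidence local decisions with the dominant event type; the input format, function name, and thresholds are hypothetical.

# Minimal sketch of cluster-wide consistency propagation for triggers.
# A "cluster" is assumed to be a list of documents, each a list of locally
# tagged mentions: (trigger_string, event_type, confidence).
from collections import Counter, defaultdict

def refine_triggers(cluster, low_conf=0.5, min_margin=0.8):
    # 1. Cluster-wide statistics: how often each trigger string was tagged
    #    with each event type by the sentence-level baseline.
    stats = defaultdict(Counter)
    for doc in cluster:
        for trigger, event_type, conf in doc:
            stats[trigger][event_type] += 1

    # 2. Correction: if a trigger string is overwhelmingly associated with
    #    one event type across the cluster, propagate that type to its
    #    low-confidence mentions ("one sense per cluster").
    refined = []
    for doc in cluster:
        new_doc = []
        for trigger, event_type, conf in doc:
            majority_type, majority_n = stats[trigger].most_common(1)[0]
            if conf < low_conf and majority_n / sum(stats[trigger].values()) >= min_margin:
                event_type = majority_type
            new_doc.append((trigger, event_type, conf))
        refined.append(new_doc)
    return refined

Analogous counts over argument strings and roles would support the argument-labeling corrections described later in the paper.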

Experimental Results and Analysis

In this section we present the results of applying this inference method to improve ACE event extraction.

Related Work

The trigger labeling task described in this paper is in part a task of word sense disambiguation (WSD), so we have used the idea of sense consistency introduced in (Yarowsky, 1995), extending it to operate across related documents.

Conclusion and Future Work

One of the initial goals for IE was to create a database of relations and events from the entire input corpus, and allow further logical reasoning on the database.

Topics

coreference

Appears in 6 sentences as: corefer (1) coreference (3) coreferential (3)
In Refining Event Extraction through Cross-Document Inference
  1. In this paper we don’t consider event mention coreference resolution and so don’t distinguish event mentions and events.
    Page 1, “Task and Baseline System”
  2. For each argument we also add other names coreferential with or bearing some ACE relation to the argument.
    Page 5, “System Approach Overview”
  3. for each event argument string and the names coreferential with or related to the argument, the frequency of the event type;
    Page 5, “Global Inference”
  4. for each event argument string and the names coreferential with or related to the argument, the frequency of the event type and role.
    Page 5, “Global Inference”
  5. Almost all the current event extraction systems focus on processing single documents and, except for coreference resolution, operate a sentence at a time (Grishman et al., 2005; Ahn, 2006; Hardy et al., 2006).
    Page 7, “Related Work”
  6. The aggregation approach described here can be easily extended to improve relation detection and coreference resolution (two argument mentions referring to the same role of related events are likely to corefer).
    Page 7, “Conclusion and Future Work”


human annotators

Appears in 6 sentences as: human annotation (1) human annotator (2) human annotators (3)
In Refining Event Extraction through Cross-Document Inference
  1. In addition, we also measured the performance of two human annotators who prepared the ACE 2005 training data on 28 newswire texts (a subset of the blind test set).
    Page 6, “Experimental Results and Analysis”
  2. The improved trigger labeling is better than one human annotator and only 4.7% worse than another.
    Page 6, “Experimental Results and Analysis”
  3. This matches the situation of human annotation as well: we may decide whether a mention is involved in some particular event or not by reading and analyzing the target sentence itself; but in order to decide the argument’s role we may need to frequently refer to wider discourse in order to infer and confirm our decision.
    Page 6, “Experimental Results and Analysis”
  4. We should also note that human annotators label arguments based on perfect entity mentions, but our system used the output from the IE system.
    Page 6, “Experimental Results and Analysis”
  5. These are difficult tasks even for human annotators.
    Page 6, “Experimental Results and Analysis”
  6. In fact, compared to a statistical tagger trained on the corpus after expert adjudication, a human annotator tends to make more mistakes in trigger classification.
    Page 7, “Experimental Results and Analysis”


development set

Appears in 5 sentences as: development set (5)
In Refining Event Extraction through Cross-Document Inference
  1. We used 10 newswire texts from ACE 2005 training corpora (from March to May of 2003) as our development set, and then conduct blind test on a separate set of 40 ACE 2005 newswire texts.
    Page 5, “Experimental Results and Analysis”
  2. We select the thresholds (δk with k=1~13) for various confidence metrics by optimizing the F-measure score of each rule on the development set, as shown in Figure 2 and 3 as follows.
    Page 5, “Experimental Results and Analysis”
  3. The labeled point on each curve shows the best F-measure that can be obtained on the development set by adjusting the threshold for that rule.
    Page 6, “Experimental Results and Analysis”
  4. This precision loss was much larger than that for the development set (0.3%).
    Page 6, “Experimental Results and Analysis”
  5. This indicates that the trigger propagation thresholds optimized on the development set were too low for the blind test set and thus more spurious triggers got propagated.
    Page 6, “Experimental Results and Analysis”
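
The threshold tuning mentioned in the second and third sentences above amounts to a sweep over candidate values on the development set; a minimal sketch follows, where the rule-application and scoring callables and the candidate grid are hypothetical stand-ins rather than the authors' code.

def select_threshold(apply_rule, score_f, dev_docs, gold, candidates):
    # Sweep candidate thresholds for one inference rule and keep the one
    # that maximizes F-measure on the development set.
    best_threshold, best_f = None, -1.0
    for threshold in candidates:          # e.g. [0.05, 0.10, ..., 0.95]
        predictions = apply_rule(dev_docs, threshold)
        f = score_f(predictions, gold)
        if f > best_f:
            best_threshold, best_f = threshold, f
    return best_threshold, best_f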


F-Measure

Appears in 5 sentences as: F-Measure (4) F-measure (2)
In Refining Event Extraction through Cross-Document Inference
  1. Without using any additional labeled data this new approach obtained 7.6% higher F-Measure in trigger labeling and 6% higher F-Measure in argument labeling over a state-of-the-art IE system which extracts events independently for each sentence.
    Page 1, “Abstract”
  2. We select the thresholds (δk with k=1~13) for various confidence metrics by optimizing the F-measure score of each rule on the development set, as shown in Figure 2 and 3 as follows.
    Page 5, “Experimental Results and Analysis”
  3. The labeled point on each curve shows the best F-measure that can be obtained on the development set by adjusting the threshold for that rule.
    Page 6, “Experimental Results and Analysis”
  4. Table 5 shows the overall Precision (P), Recall (R) and F-Measure (F) scores for the blind test set.
    Page 6, “Experimental Results and Analysis”
  5. For argument labeling we can see that cross-sentence inference improved both identification (3.7% higher F-Measure) and classification (6.1% higher accuracy); and cross-document inference mainly provided further gains (1.9%) in classification.
    Page 6, “Experimental Results and Analysis”
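
For reference, the F-Measure cited in these sentences is the balanced harmonic mean of Precision (P) and Recall (R):

F = \frac{2 \cdot P \cdot R}{P + R}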


coreference resolution

Appears in 3 sentences as: coreference resolution (3)
In Refining Event Extraction through Cross-Document Inference
  1. In this paper we don’t consider event mention coreference resolution and so don’t distinguish event mentions and events.
    Page 1, “Task and Baseline System”
  2. Almost all the current event extraction systems focus on processing single documents and, except for coreference resolution, operate a sentence at a time (Grishman et al., 2005; Ahn, 2006; Hardy et al., 2006).
    Page 7, “Related Work”
  3. The aggregation approach described here can be easily extended to improve relation detection and coreference resolution (two argument mentions referring to the same role of related events are likely to corefer).
    Page 7, “Conclusion and Future Work”
