Extracting Narrative Timelines as Temporal Dependency Structures
Kolomiyets, Oleksandr and Bethard, Steven and Moens, Marie-Francine

Article Structure

Abstract

We propose a new approach to characterizing the timeline of a text: temporal dependency structures, where all the events of a narrative are linked via partial ordering relations like BEFORE, AFTER, OVERLAP and IDENTITY.

Introduction

There has been much recent interest in identifying events, times and their relations along the timeline, from event and time ordering problems in the TempEval shared tasks (Verhagen et al., 2007; Verhagen et al., 2010), to identifying time arguments of event structures in the Automated Content Extraction program (Linguistic Data Consortium, 2005; Gupta and Ji, 2009), to timestamping event intervals in the Knowledge Base Population shared task (Artiles et al., 2011; Amigo et al., 2011).

Related Work

Much prior work on the annotation of temporal information has constructed corpora with incomplete timelines.

Corpus Annotation

The corpus of stories for children was drawn from the fables collection of McIntyre and Lapata (2009) and annotated as described in Bethard et al. (2012).

Parsing Models

We consider two different approaches to learning a temporal dependency parser: a shift-reduce model (Nivre, 2008) and a graph-based model (McDonald et al., 2005).

Feature Design

The proposed parsing algorithms both rely on machine learning methods.

Evaluations

Evaluations were performed using 10-fold cross-validation on the fables annotated in Section 3.

Discussion and Conclusions

In this article, we have presented an approach to temporal information extraction that represents the timeline of a story as a temporal dependency structure.

Topics

dependency parsing

Appears in 25 sentences as: dependency parser (5) dependency parsers (1) dependency parsing (19)
In Extracting Narrative Timelines as Temporal Dependency Structures
  1. We compare two parsing models for temporal dependency structures, and show that a deterministic non-projective dependency parser outperforms a graph-based maximum spanning tree parser, achieving labeled attachment accuracy of 0.647 and labeled tree edit distance of 0.596.
    Page 1, “Abstract”
  2. Our analysis of the dependency parser errors gives some insights into future research directions.
    Page 1, “Abstract”
  3. We then demonstrate how to train a timeline extraction system based on dependency parsing techniques instead of the pairwise classification approaches typical of prior work.
    Page 1, “Introduction”
  4. • We design a non-projective dependency parser for inferring timelines from text.
    Page 1, “Introduction”
  5. The following sections first review some relevant prior work, then describe the corpus annotation and the dependency parsing algorithm, and finally present our evaluation results.
    Page 1, “Introduction”
  6. improving the annotation approach to generate the fully connected timeline of a story, and improving the models for timeline extraction using dependency parsing techniques.
    Page 2, “Related Work”
  7. We employ methods from syntactic dependency parsing, adapting them to our task by including features typical of temporal relation labeling models.
    Page 2, “Related Work”
  8. train a temporal dependency parsing model.
    Page 3, “Corpus Annotation”
  9. We consider two different approaches to learning a temporal dependency parser: a shift-reduce model (Nivre, 2008) and a graph-based model (McDonald et al., 2005).
    Page 3, “Parsing Models”
  10. Shift-reduce dependency parsers start with an input queue of unlinked words, and link them into a tree by repeatedly choosing and performing actions like shifting a node to a stack, or popping two nodes from the stack and linking them.
    Page 4, “Parsing Models”
  11. Formally, a deterministic shift-reduce dependency parser is defined as (C, T, CF, INIT, TREE) where:
    Page 4, “Parsing Models”

shift-reduce

Appears in 18 sentences as: Shift-Reduce (3) Shift-reduce (2) shift-reduce (13)
In Extracting Narrative Timelines as Temporal Dependency Structures
  1. We consider two different approaches to learning a temporal dependency parser: a shift-reduce model (Nivre, 2008) and a graph-based model (McDonald et al., 2005).
    Page 3, “Parsing Models”
  2. 4.1 Shift-Reduce Parsing Model
    Page 4, “Parsing Models”
  3. Shift-reduce dependency parsers start with an input queue of unlinked words, and link them into a tree by repeatedly choosing and performing actions like shifting a node to a stack, or popping two nodes from the stack and linking them.
    Page 4, “Parsing Models”
  4. Shift-reduce parsers are typically defined in terms of configurations and a transition system, where the configurations describe the current internal state of the parser, and the transition system describes how to get from one state to another.
    Page 4, “Parsing Models”
  5. Formally, a deterministic shift-reduce dependency parser is defined as (C, T, CF, INIT, TREE) where:
    Page 4, “Parsing Models”
  6. One shortcoming of the shift-reduce dependency parsing approach is that each transition decision
    Page 4, “Parsing Models”
  7. The shift-reduce parser (SRP) trains a machine learning classifier as the oracle o ∈ (C → T) to predict a transition t from a parser configuration c = (L1, L2, Q, E), using node features such as the heads of L1, L2 and Q, and edge features from the already predicted temporal relations in E. The graph-based maximum spanning tree (MST) parser trains a machine learning model to predict SCORE(e) for an edge e = (wi, rj, wk), using features of the nodes wi and wk.
    Page 5, “Feature Design”
  8. Note that both models share features that look at the nodes, while only the shift-reduce parser has features for previously classified edges.
    Page 5, “Feature Design”
  9. Table 2: Features for the shift-reduce parser (SRP) and the graph-based maximum spanning tree (MST) parser.
    Page 6, “Evaluations”
  10. The Shift-Reduce parser (SRP; Section 4.1) and the graph-based, maximum spanning tree parser (MST; Section 4.2) are compared to these baselines.
    Page 6, “Evaluations”
  11. In terms of labeled attachment score, both dependency parsing models outperformed the baseline models — the maximum spanning tree parser achieved 0.614 LAS, and the shift-reduce parser achieved 0.647 LAS.
    Page 7, “Evaluations”
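
To make the transition-system description in items 3–5 concrete, here is a minimal sketch of the deterministic parse loop in the spirit of Nivre (2008). The configuration fields and the transition names (SHIFT, ARC-<relation>) are illustrative assumptions; the actual transition inventory and the L2 bookkeeping of the paper's non-projective parser are not reproduced here.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

@dataclass
class Configuration:
    """c = (L1, L2, Q, E): two temporary lists, the queue of input
    (event) words, and the set of identified edges."""
    l1: List[str]
    l2: List[str]
    queue: List[str]
    edges: Set[Tuple[str, str, str]] = field(default_factory=set)

def shift_reduce_parse(events, predict_transition):
    """Deterministic parse loop: start from INIT (all events on the queue)
    and repeatedly apply the transition chosen by the oracle until a final
    configuration is reached; TREE then just reads off the edge set."""
    c = Configuration(l1=[], l2=[], queue=list(events))
    while c.queue or len(c.l1) > 1:
        t = predict_transition(c)                    # oracle o ∈ (C → T)
        if t == "SHIFT" and c.queue:                 # move the next event onto L1
            c.l1.append(c.queue.pop(0))
        elif t.startswith("ARC-") and len(c.l1) >= 2:
            label = t.split("-", 1)[1]               # e.g. ARC-BEFORE, ARC-OVERLAP
            dep = c.l1.pop()                         # pop the dependent and attach it
            c.edges.add((c.l1[-1], label, dep))      # under the new top of L1
        elif c.l1:                                   # otherwise discard the top of L1
            c.l1.pop()
        else:                                        # nothing else applies: shift
            c.l1.append(c.queue.pop(0))
    # In the full parser any remaining node is attached to the artificial Root;
    # that detail is omitted here.
    return c.edges                                   # TREE(c) extracts the predicted tree
```

The `predict_transition` argument is the learned oracle; one possible way to train it is sketched under the machine learning entry below.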

parsing models

Appears in 14 sentences as: Parsing Model (2) parsing model (5) parsing models (7)
In Extracting Narrative Timelines as Temporal Dependency Structures
  1. We compare two parsing models for temporal dependency structures, and show that a deterministic non-projective dependency parser outperforms a graph-based maximum spanning tree parser, achieving labeled attachment accuracy of 0.647 and labeled tree edit distance of 0.596.
    Page 1, “Abstract”
  2. train a temporal dependency parsing model.
    Page 3, “Corpus Annotation”
  3. Formally, a parsing model is a function (W → Π) where W = w1w2 … wn
    Page 3, “Parsing Models”
  4. 4.1 Shift-Reduce Parsing Model
    Page 4, “Parsing Models”
  5. 4.2 Graph-Based Parsing Model
    Page 4, “Parsing Models”
  6. Graph-based models are an alternative dependency parsing model, which assembles a graph with weighted edges between all pairs of words, and selects the tree-shaped subset of this graph that gives the highest total score (Fig.
    Page 5, “Parsing Models”
  7. The full set of features proposed for both parsing models, derived from the state-of-the-art systems for temporal relation labeling, is presented in Table 2.
    Page 5, “Feature Design”
  8. To evaluate the parsing models (SRP and MST) we proposed two baselines.
    Page 6, “Evaluations”
  9. In terms of labeled attachment score, both dependency parsing models outperformed the baseline models — the maximum spanning tree parser achieved 0.614 LAS, and the shift-reduce parser achieved 0.647 LAS.
    Page 7, “Evaluations”
  10. These results indicate that dependency parsing models are a good fit to our whole-story timeline extraction task.
    Page 7, “Evaluations”
  11. Finally, in comparing the two different dependency parsing models, we observe that the shift-reduce parser outperforms the maximum spanning tree parser.
    Page 7, “Evaluations”
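
Item 3 above defines a parsing model abstractly as a function (W → Π). In code this is just a callable from an event sequence to a tree; a minimal typed sketch, with illustrative names not taken from the paper:

```python
from typing import Protocol, Sequence, Set, Tuple

Edge = Tuple[str, str, str]                 # (head, relation, dependent)

class ParsingModel(Protocol):
    """A parsing model as a function (W → Π): it maps the event words
    w1…wn of a story to a temporal dependency tree, represented here
    by the tree's edge set."""
    def __call__(self, events: Sequence[str]) -> Set[Edge]: ...

# Both Section 4 models fit this signature: the shift-reduce parser (SRP)
# and the graph-based maximum spanning tree parser (MST).
```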

dependency tree

Appears in 12 sentences as: dependency tree (6) dependency trees (6)
In Extracting Narrative Timelines as Temporal Dependency Structures
  1. We annotate a corpus of children’s stories with temporal dependency trees, achieving agreement (Krippendorff’s Alpha) of 0.856 on the event words, 0.822 on the links between events, and of 0.700 on the ordering relation labels.
    Page 1, “Abstract”
  2. The temporal language in a text often fails to specify a total ordering over all the events, so we annotate the timelines as temporal dependency structures, where each event is a node in the dependency tree, and each edge between nodes represents a temporal ordering relation such as BEFORE, AFTER, OVERLAP or IDENTITY.
    Page 1, “Introduction”
  3. We construct an evaluation corpus by annotating such temporal dependency trees over a set of children’s stories.
    Page 1, “Introduction”
  4. • We propose a new approach to characterizing temporal structure via dependency trees.
    Page 1, “Introduction”
  5. • We produce an annotated corpus of temporal dependency trees in children’s stories.
    Page 1, “Introduction”
  6. …wn is a sequence of event words, and π ∈ Π is a dependency tree π = (V, E) where:
    Page 3, “Parsing Models”
  7. • TREE ∈ (CF → Π) is a function that extracts a dependency tree π from a final parser state cF
    Page 4, “Parsing Models”
  8. • c = (L1, L2, Q, E) is a parser configuration, where L1 and L2 are lists for temporary storage, Q is the queue of input words, and E is the set of identified edges of the dependency tree.
    Page 4, “Parsing Models”
  9. • TREE((L1, L2, Q, E)) = (W ∪ {Root}, E) extracts the final dependency tree.
    Page 4, “Parsing Models”
  10. For temporal dependency trees, we assume each operation costs 1.0.
    Page 7, “Evaluations”
  11. It has been argued that graph-based models like the maximum spanning tree parser should be able to produce more globally consistent and correct dependency trees, yet we do not observe that here.
    Page 7, “Evaluations”
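
The π = (V, E) definition quoted in item 6 translates directly into a small data structure. Below is a minimal sketch with a validity check enforcing that every event gets exactly one head and that following heads always reaches the artificial Root; the class and field names are illustrative, and the relation label placed on the Root edge in the example is a placeholder assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple

RELATIONS = {"BEFORE", "AFTER", "OVERLAP", "IDENTITY"}

@dataclass
class TemporalDependencyTree:
    events: List[str]                      # V, minus the artificial Root node
    edges: List[Tuple[str, str, str]]      # E: (head, relation, dependent) triples

    def is_valid_tree(self) -> bool:
        """Each event needs exactly one head (another event or Root), every
        label must be one of the four ordering relations, and walking up the
        heads from any event must reach Root without looping."""
        heads = {}
        for head, rel, dep in self.edges:
            if rel not in RELATIONS or dep in heads:
                return False               # unknown label, or two heads for one event
            if head != "Root" and head not in self.events:
                return False               # a head must be another event or the Root
            heads[dep] = head
        if set(heads) != set(self.events):
            return False                   # some event was left unattached
        for event in self.events:
            node, seen = event, set()
            while node in heads:           # follow the head chain upwards
                if node in seen:
                    return False           # cycle: Root is never reached
                seen.add(node)
                node = heads[node]
        return True

# Tiny example; the relation label on the Root edge is a placeholder assumption.
tree = TemporalDependencyTree(
    events=["fell", "cried", "ran"],
    edges=[("Root", "OVERLAP", "fell"),
           ("fell", "BEFORE", "cried"),
           ("cried", "OVERLAP", "ran")],
)
assert tree.is_valid_tree()
```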

graph-based

Appears in 11 sentences as: Graph-Based (1) Graph-based (2) graph-based (8)
In Extracting Narrative Timelines as Temporal Dependency Structures
  1. We compare two parsing models for temporal dependency structures, and show that a deterministic non-projective dependency parser outperforms a graph-based maximum spanning tree parser, achieving labeled attachment accuracy of 0.647 and labeled tree edit distance of 0.596.
    Page 1, “Abstract”
  2. We consider two different approaches to learning a temporal dependency parser: a shift-reduce model (Nivre, 2008) and a graph-based model (McDonald et al., 2005).
    Page 3, “Parsing Models”
  3. 4.2 Graph-Based Parsing Model
    Page 4, “Parsing Models”
  4. Graph-based models are an alternative dependency parsing model, which assembles a graph with weighted edges between all pairs of words, and selects the tree-shaped subset of this graph that gives the highest total score (Fig.
    Page 5, “Parsing Models”
  5. Formally, a graph-based parser follows Algorithm 2, where:
    Page 5, “Parsing Models”
  6. Algorithm 2 Graph-based dependency parsing:
     E ← {(e, SCORE(e)) : e ∈ (W′ × R × W)}
     G ← (W′, E)
     return SPANNINGTREE(G)
    Page 5, “Parsing Models”
  7. The shift-reduce parser (SRP) trains a machine learning classifier as the oracle o ∈ (C → T) to predict a transition t from a parser configuration c = (L1, L2, Q, E), using node features such as the heads of L1, L2 and Q, and edge features from the already predicted temporal relations in E. The graph-based maximum spanning tree (MST) parser trains a machine learning model to predict SCORE(e) for an edge e = (wi, rj, wk), using features of the nodes wi and wk.
    Page 5, “Feature Design”
  8. Table 2: Features for the shift-reduce parser (SRP) and the graph-based maximum spanning tree (MST) parser.
    Page 6, “Evaluations”
  9. The Shift-Reduce parser (SRP; Section 4.1) and the graph-based, maximum spanning tree parser (MST; Section 4.2) are compared to these baselines.
    Page 6, “Evaluations”
  10. It has been argued that graph-based models like the maximum spanning tree parser should be able to produce more globally consistent and correct dependency trees, yet we do not observe that here.
    Page 7, “Evaluations”
  11. Comparing the two dependency parsing models, we have found that a shift-reduce parser, which more closely mirrors the incremental processing of our human annotators, outperforms a graph-based maximum spanning tree parser.
    Page 8, “Discussion and Conclusions”
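
As a counterpart to Algorithm 2 (item 6 above), here is a minimal sketch of a graph-based parse: score every candidate head–dependent pair, keep the best relation label per pair, and extract the highest-scoring spanning arborescence rooted at an artificial Root node. The `score` function is a stand-in for the learned SCORE model, and the use of networkx's Chu-Liu/Edmonds implementation is an assumption for illustration, not the authors' setup.

```python
import itertools
import networkx as nx                       # assumed available; not the authors' toolkit

RELATIONS = ("BEFORE", "AFTER", "OVERLAP", "IDENTITY")

def mst_parse(events, score):
    """Graph-based parse in the shape of Algorithm 2:
    E ← {(e, SCORE(e))}, G ← (W′, E), return SPANNINGTREE(G).
    `score(head, relation, dep)` stands in for the learned SCORE function."""
    nodes = ["Root"] + list(events)         # W′ = W ∪ {Root}
    G = nx.DiGraph()
    G.add_nodes_from(nodes)
    best_label = {}
    for head, dep in itertools.permutations(nodes, 2):
        if dep == "Root":
            continue                        # no edges into Root, so it stays the root
        rel = max(RELATIONS, key=lambda r: score(head, r, dep))
        best_label[(head, dep)] = rel       # remember the best relation for this pair
        G.add_edge(head, dep, weight=score(head, rel, dep))
    tree = nx.maximum_spanning_arborescence(G)   # global Chu-Liu/Edmonds-style search
    return [(h, best_label[(h, d)], d) for h, d in tree.edges()]
```

Because the tree is chosen jointly, this model has no access to previously predicted edges, which is the contrast the paper draws when explaining why the shift-reduce parser performs better.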

maximum spanning

Appears in 11 sentences as: maximum spanning (11)
In Extracting Narrative Timelines as Temporal Dependency Structures
  1. We compare two parsing models for temporal dependency structures, and show that a deterministic non-projective dependency parser outperforms a graph-based maximum spanning tree parser, achieving labeled attachment accuracy of 0.647 and labeled tree edit distance of 0.596.
    Page 1, “Abstract”
  2. The SPANNINGTREE function is usually defined using one of the efficient search techniques for finding a maximum spanning tree.
    Page 5, “Parsing Models”
  3. The result is the globally optimal maximum spanning tree for the graph (Georgiadis, 2003).
    Page 5, “Parsing Models”
  4. The shift-reduce parser (SRP) trains a machine learning classifier as the oracle o ∈ (C → T) to predict a transition t from a parser configuration c = (L1, L2, Q, E), using node features such as the heads of L1, L2 and Q, and edge features from the already predicted temporal relations in E. The graph-based maximum spanning tree (MST) parser trains a machine learning model to predict SCORE(e) for an edge e = (wi, rj, wk), using features of the nodes wi and wk.
    Page 5, “Feature Design”
  5. Table 2: Features for the shift-reduce parser (SRP) and the graph-based maximum spanning tree (MST) parser.
    Page 6, “Evaluations”
  6. The Shift-Reduce parser (SRP; Section 4.1) and the graph-based, maximum spanning tree parser (MST; Section 4.2) are compared to these baselines.
    Page 6, “Evaluations”
  7. In terms of labeled attachment score, both dependency parsing models outperformed the baseline models — the maximum spanning tree parser achieved 0.614 LAS, and the shift-reduce parser achieved 0.647 LAS.
    Page 7, “Evaluations”
  8. Finally, in comparing the two different dependency parsing models, we observe that the shift-reduce parser outperforms the maximum spanning tree parser.
    Page 7, “Evaluations”
  9. It has been argued that graph-based models like the maximum spanning tree parser should be able to produce more globally consistent and correct dependency trees, yet we do not observe that here.
    Page 7, “Evaluations”
  10. A likely explanation for this phenomenon is that the shift-reduce parsing model allows for features describing previous parse decisions (similar to the incremental nature of human parse decisions), while the joint nature of the maximum spanning tree parser does not.
    Page 7, “Evaluations”
  11. Comparing the two dependency parsing models, we have found that a shift-reduce parser, which more closely mirrors the incremental processing of our human annotators, outperforms a graph-based maximum spanning tree parser.
    Page 8, “Discussion and Conclusions”

spanning tree

Appears in 10 sentences as: spanning tree (10)
In Extracting Narrative Timelines as Temporal Dependency Structures
  1. We compare two parsing models for temporal dependency structures, and show that a deterministic non-projective dependency parser outperforms a graph-based maximum spanning tree parser, achieving labeled attachment accuracy of 0.647 and labeled tree edit distance of 0.596.
    Page 1, “Abstract”
  2. The SPANNINGTREE function is usually defined using one of the efficient search techniques for finding a maximum spanning tree.
    Page 5, “Parsing Models”
  3. The result is the globally optimal maximum spanning tree for the graph (Georgiadis, 2003).
    Page 5, “Parsing Models”
  4. The shift-reduce parser (SRP) trains a machine learning classifier as the oracle o ∈ (C → T) to predict a transition t from a parser configuration c = (L1, L2, Q, E), using node features such as the heads of L1, L2 and Q, and edge features from the already predicted temporal relations in E. The graph-based maximum spanning tree (MST) parser trains a machine learning model to predict SCORE(e) for an edge e = (wi, rj, wk), using features of the nodes wi and wk.
    Page 5, “Feature Design”
  5. Table 2: Features for the shift-reduce parser (SRP) and the graph-based maximum spanning tree (MST) parser.
    Page 6, “Evaluations”
  6. The Shift-Reduce parser (SRP; Section 4.1) and the graph-based, maximum spanning tree parser (MST; Section 4.2) are compared to these baselines.
    Page 6, “Evaluations”
  7. In terms of labeled attachment score, both dependency parsing models outperformed the baseline models — the maximum spanning tree parser achieved 0.614 LAS, and the shift-reduce parser achieved 0.647 LAS.
    Page 7, “Evaluations”
  8. It has been argued that graph-based models like the maximum spanning tree parser should be able to produce more globally consistent and correct dependency trees, yet we do not observe that here.
    Page 7, “Evaluations”
  9. A likely explanation for this phenomenon is that the shift-reduce parsing model allows for features describing previous parse decisions (similar to the incremental nature of human parse decisions), while the joint nature of the maximum spanning tree parser does not.
    Page 7, “Evaluations”
  10. Comparing the two dependency parsing models, we have found that a shift-reduce parser, which more closely mirrors the incremental processing of our human annotators, outperforms a graph-based maximum spanning tree parser.
    Page 8, “Discussion and Conclusions”

edit distance

Appears in 7 sentences as: Edit Distance (1) edit distance (8)
In Extracting Narrative Timelines as Temporal Dependency Structures
  1. We compare two parsing models for temporal dependency structures, and show that a deterministic non-projective dependency parser outperforms a graph-based maximum spanning tree parser, achieving labeled attachment accuracy of 0.647 and labeled tree edit distance of 0.596.
    Page 1, “Abstract”
  2. Tree Edit Distance: In addition to the UAS and LAS, the tree edit distance score has been recently introduced for evaluating dependency structures (Tsarfaty et al., 2011).
    Page 6, “Evaluations”
  3. The tree edit distance score for a tree π is based on the following operations λ ∈ Λ : Λ = {DELETE, INSERT, RELABEL}:
    Page 6, “Evaluations”
  4. Taking the shortest such sequence, the tree edit distance is calculated as the sum of the edit operation costs divided by the size of the tree (i.e.
    Page 7, “Evaluations”
  5. The final score subtracts the edit distance from 1 so that a perfect tree has score 1.0.
    Page 7, “Evaluations”
  6. The labeled tree edit distance score (LTEDS) calculates sequences over the tree with all its labeled temporal relations, while the unlabeled tree edit distance score (UTEDS) treats all edges as if they had the same label.
    Page 7, “Evaluations”
  7. The shift-reduce parser also outperformed the baseline models in terms of labeled tree edit distance, achieving 0.596 LTEDS vs. the baseline 0.549 LTEDS.
    Page 7, “Evaluations”
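
The edit-distance description in items 3–5 can be turned into a scoring function. The sketch below is a deliberately simplified approximation, assuming gold and predicted trees cover the same events: a wrong head is counted as a DELETE plus an INSERT, a wrong relation as a RELABEL, each at cost 1.0, and the normalized cost is subtracted from 1. The paper's metric follows Tsarfaty et al. (2011) and is more involved; the unlabeled variant (UTEDS) would simply skip the label check.

```python
def labeled_tree_edit_score(gold_edges, pred_edges):
    """Rough tree-edit-style score, not the full Tsarfaty et al. (2011)
    algorithm: per event, a wrong head counts as DELETE + INSERT (2 x 1.0),
    a wrong relation label as RELABEL (1.0); the normalized cost is then
    subtracted from 1 so that a perfect tree scores 1.0.
    Edges are (head, relation, dependent) triples over the same events."""
    gold = {dep: (head, rel) for head, rel, dep in gold_edges}
    pred = {dep: (head, rel) for head, rel, dep in pred_edges}
    cost = 0.0
    for dep, (g_head, g_rel) in gold.items():
        p_head, p_rel = pred.get(dep, (None, None))
        if p_head != g_head:
            cost += 2.0                    # DELETE the wrong attachment, INSERT the right one
        elif p_rel != g_rel:
            cost += 1.0                    # RELABEL the ordering relation
    size = len(gold) + 1                   # tree size: the events plus the Root node
    return max(0.0, 1.0 - cost / size)     # clamp, since this crude cost can exceed the size
```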

machine learning

Appears in 4 sentences as: machine learning (5)
In Extracting Narrative Timelines as Temporal Dependency Structures
  1. The oracle o is typically defined as a machine learning classifier, which characterizes a parser configuration c in terms of a set of features.
    Page 4, “Parsing Models”
  2. The SCORE function is typically defined as a machine learning model that scores an edge based on a set of features.
    Page 5, “Parsing Models”
  3. The proposed parsing algorithms both rely on machine learning methods.
    Page 5, “Feature Design”
  4. The shift-reduce parser (SRP) trains a machine learning classifier as the oracle o ∈ (C → T) to predict a transition t from a parser configuration c = (L1, L2, Q, E), using node features such as the heads of L1, L2 and Q, and edge features from the already predicted temporal relations in E. The graph-based maximum spanning tree (MST) parser trains a machine learning model to predict SCORE(e) for an edge e = (wi, rj, wk), using features of the nodes wi and wk.
    Page 5, “Feature Design”
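
Both learned components described in item 4 are standard classifiers over feature vectors. Below is a minimal sketch of training the oracle o ∈ (C → T) with scikit-learn, assuming the Configuration class from the shift-reduce sketch earlier; the library choice and the toy features are assumptions, the real feature set being the one in Table 2 of the paper.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def configuration_features(config):
    """Toy node and edge features standing in for the Table 2 feature set;
    `config` is a Configuration as in the shift-reduce sketch above."""
    feats = {
        "l1_head": config.l1[-1] if config.l1 else "<empty>",
        "q_head": config.queue[0] if config.queue else "<empty>",
        "n_edges": len(config.edges),
    }
    for _head, rel, _dep in config.edges:      # edge features from already predicted relations
        feats["seen_rel=" + rel] = 1
    return feats

def train_oracle(configs, gold_transitions):
    """Fit a classifier on (configuration, gold transition) pairs obtained by
    replaying the annotated trees (that derivation step is not shown here)."""
    model = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
    model.fit([configuration_features(c) for c in configs], gold_transitions)
    return lambda config: model.predict([configuration_features(config)])[0]
```

The returned function can be passed as `predict_transition` to the shift-reduce sketch; an analogous model over node-pair features would play the role of SCORE(e) for the MST parser.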

UAS

Appears in 3 sentences as: UAS (3)
In Extracting Narrative Timelines as Temporal Dependency Structures
  1. Unlabeled Attachment Score (UAS): The fraction of events whose head events were correctly predicted.
    Page 6, “Evaluations”
  2. Tree Edit Distance: In addition to the UAS and LAS, the tree edit distance score has been recently introduced for evaluating dependency structures (Tsarfaty et al., 2011).
    Page 6, “Evaluations”
  3.              UAS    LAS     UTEDS  LTEDS
     LinearSeq    0.830  0.581   0.689  0.549
     ClassifySeq  0.830  0.581   0.689  0.549
     MST          0.837  0.614*  0.710  0.571
     SRP          0.830  0.647*† 0.712  0.596*
    Page 7, “Evaluations”
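
Both attachment scores quoted above are straightforward to compute once gold and predicted trees are available as (head, relation, dependent) triples over the same events. A minimal sketch; the function name and edge representation are assumptions, not the paper's code:

```python
def attachment_scores(gold_edges, pred_edges):
    """UAS: fraction of events whose head event was predicted correctly.
    LAS: fraction whose head and ordering relation were both correct.
    Edges are (head, relation, dependent) triples over the same events."""
    gold = {dep: (head, rel) for head, rel, dep in gold_edges}
    pred = {dep: (head, rel) for head, rel, dep in pred_edges}
    uas = sum(dep in pred and pred[dep][0] == head_rel[0]
              for dep, head_rel in gold.items())
    las = sum(dep in pred and pred[dep] == head_rel
              for dep, head_rel in gold.items())
    return uas / len(gold), las / len(gold)
```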
