PARMA: A Predicate Argument Aligner
Travis Wolfe, Benjamin Van Durme, Mark Dredze, Nicholas Andrews, Charley Beller, Chris Callison-Burch, Jay DeYoung, Justin Snyder, Jonathan Weese, Tan Xu, and Xuchen Yao

Article Structure

Abstract

We introduce PARMA, a system for cross-document, semantic predicate and argument alignment.

Introduction

A key step of the information extraction pipeline is entity disambiguation, in which discovered entities across many sentences and documents must be organized to represent real world entities.

PARMA

PARMA (Predicate ARguMent Aligner) is a pipelined system with a wide variety of features used to align predicates and arguments in two documents.

Evaluation

We consider three datasets for evaluating PARMA.

Results

On every dataset PARMA significantly improves over the lemma baselines (Table 1).
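
The "lemma baselines" are not spelled out in this summary; on the usual reading of the term (an assumption on my part, not the paper's stated implementation), such a baseline aligns two predicates exactly when their lemmas match. A minimal Python sketch under that assumption:

    # Hypothetical lemma-match baseline: align two predicates iff their lemmas
    # match. The (span, lemma) input format is assumed for illustration only.
    def lemma_baseline(preds_a, preds_b):
        """preds_a, preds_b: lists of (span, lemma) pairs from two documents."""
        return [(span_a, span_b)
                for span_a, lemma_a in preds_a
                for span_b, lemma_b in preds_b
                if lemma_a == lemma_b]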

Conclusion

PARMA achieves state of the art performance on three datasets for predicate argument alignment.

Topics

coreference

Appears in 9 sentences as: coref (2), corefer (1), Coreference (3), coreference (6), coreferent (1)
In PARMA: A Predicate Argument Aligner
  1. Similar to entity coreference resolution, almost all of this work assumes unanchored mentions: predicate argument tuples are grouped together based on coreferent events.
    Page 1, “Introduction”
  2. The first work on event coreference dates back to Bagga and Baldwin (1999).
    Page 1, “Introduction”
  3. Predicates are represented as mention spans and arguments are represented as coreference chains (sets of mention spans) provided by in-document coreference resolution systems such as the one included in the Stanford NLP toolkit.
    Page 1, “PARMA”
  4. For argument coref chains we heuristically choose a canonical mention to represent each chain, and some features only look at this canonical mention.
    Page 2, “PARMA”
  5. The canonical mention is chosen based on length, information about the head word, and position in the document. In most cases, coref chains that are longer than one are proper nouns and the canonical mention is the first and longest mention (outranking pronominal references and other name shortenings). (A code sketch of this heuristic follows this list.)
    Page 2, “PARMA”
  6. For richer annotations that include lemmatizations, part of speech, NER, and in-doc coreference, we preprocessed each of the datasets using tools similar to those used to create the Annotated Gigaword corpus (Napoles et al., 2012).
    Page 3, “Evaluation”
  7. Extended Event Coreference Bank: Based on the dataset of Bejan and Harabagiu (2010), Lee et al. (2012) introduced the Extended Event Coreference Bank (EECB) to evaluate cross-document event coreference.
    Page 3, “Evaluation”
  8. EECB provides document clusters, within which entities and events may corefer.
    Page 3, “Evaluation”
  9. Table 1: PARMA outperforms the baseline lemma matching system on the three test sets, drawn from the Extended Event Coreference Bank, Roth and Frank’s data, and our set created from the Multiple Translation Corpora.
    Page 5, “Evaluation”
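
Items 4 and 5 above describe a concrete selection rule, so a small illustration may help. The following Python sketch is my hypothetical rendering of that rule (the Mention fields and the exact tie-breaking order are assumptions, not PARMA's code): prefer proper-noun heads, then longer mentions, then earlier positions in the document.

    # Hypothetical sketch of the canonical-mention heuristic described in item 5.
    # Field names and ranking order are assumptions, not PARMA's implementation.
    from dataclasses import dataclass

    @dataclass
    class Mention:
        text: str          # surface string of the mention span
        head_pos: str      # part-of-speech tag of the head word, e.g. "NNP"
        position: int      # token offset of the span in the document

    def canonical_mention(chain):
        """Pick one representative mention from a coreference chain."""
        def rank(m):
            is_proper = m.head_pos.startswith("NNP")   # favor proper-noun heads
            return (is_proper, len(m.text), -m.position)
        return max(chain, key=rank)

Under this ranking, a proper-noun chain's first and longest mention wins out over pronouns and shortened name forms, matching the observation in item 5.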

cosine similarity

Appears in 4 sentences as: Cosine Similarity (1), cosine similarity (3)
In PARMA: A Predicate Argument Aligner
  1. The data was generated by clustering similar news stories from Gigaword using TF-IDF cosine similarity of their headlines (a short code sketch follows this list).
    Page 4, “Evaluation”
  2. Doc-pair Cosine Similarity (a figure label)
    Page 4, “Evaluation”
  3. The x-axis represents the cosine similarity between the document pairs.
    Page 4, “Evaluation”
  4. Additionally, the MTC dataset contains more data with low cosine similarity than RF does.
    Page 5, “Results”
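
Item 1 mentions clustering by TF-IDF cosine similarity of headlines. For illustration only, the pairwise similarities could be computed as below; scikit-learn and the toy headlines are my assumptions, not details from the paper.

    # Illustrative TF-IDF cosine similarity between headlines (not the paper's code).
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    headlines = [
        "Quake strikes off the coast of Japan",
        "Earthquake hits Japanese coast",
        "Parliament passes new budget bill",
    ]
    tfidf = TfidfVectorizer().fit_transform(headlines)   # rows = documents
    sims = cosine_similarity(tfidf)                      # pairwise doc-pair scores
    print(sims[0, 1], sims[0, 2])   # related stories score higher than unrelated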

state of the art

Appears in 4 sentences as: state of the art (4)
In PARMA: A Predicate Argument Aligner
  1. PARMA achieves state of the art results on an existing and a new dataset.
    Page 1, “Abstract”
  2. We demonstrate the effectiveness of this approach by achieving state of the art performance on the data of Roth and Frank despite having little relevant training data.
    Page 1, “Introduction”
  3. On RF we also improve over Roth and Frank, the best published method for this task, making PARMA the state of the art system.
    Page 5, “Results”
  4. PARMA achieves state of the art performance on three datasets for predicate argument alignment.
    Page 5, “Conclusion”

lexical semantic

Appears in 3 sentences as: lexical semantic (3)
In PARMA: A Predicate Argument Aligner
  1. As opposed to Roth and Frank, PARMA is designed as a trainable platform for the incorporation of the sort of lexical semantic resources used in the related areas of Recognizing Textual Entailment (RTE) and Question Answering (QA).
    Page 1, “Introduction”
  2. The focus of PARMA is the integration of a diverse range of features based on existing lexical semantic resources.
    Page 2, “PARMA”
  3. It builds on the development of lexical semantic resources and provides a platform for learning to utilize these resources.
    Page 5, “Conclusion”

WordNet

Appears in 3 sentences as: WordNet (4)
In PARMA: A Predicate Argument Aligner
  1. WordNet: WordNet (Miller, 1995) is a database of information (synonyms, hypernyms, etc.)
    Page 3, “PARMA”
  2. For each entry, WordNet provides a set of synonyms, hypernyms, etc.
    Page 3, “PARMA”
  3. Given two spans, we use WordNet to determine semantic similarity by measuring how many synonym (or other) edges are needed to link two words (a minimal code sketch follows this list).
    Page 3, “PARMA”
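
Items 1-3 describe a path-length measure over WordNet edges. Below is a minimal sketch using NLTK's WordNet interface; NLTK and path_similarity are my choice of tooling, since the paper does not name an implementation.

    # Sketch of WordNet edge-distance similarity via NLTK (assumed tooling).
    # Requires the WordNet data: nltk.download('wordnet')
    from nltk.corpus import wordnet as wn

    def wordnet_similarity(word_a, word_b):
        """Best path-based similarity over all synset pairs; 1.0 means identical."""
        scores = [s1.path_similarity(s2)
                  for s1 in wn.synsets(word_a)
                  for s2 in wn.synsets(word_b)]
        scores = [s for s in scores if s is not None]
        return max(scores, default=0.0)

    print(wordnet_similarity("buy", "purchase") > wordnet_similarity("buy", "walk"))

path_similarity is inversely related to the number of edges separating two senses, which mirrors the edge-counting idea in item 3.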
