Detecting Errors in Automatically-Parsed Dependency Relations
Dickinson, Markus

Article Structure

Abstract

We outline different methods to detect errors in automatically-parsed dependency corpora, by comparing so-called dependency rules to their representation in the training data and flagging anomalous ones.

Introduction and Motivation

Given the need for high-quality dependency parses in applications such as statistical machine translation (Xu et al., 2009), natural language generation (Wan et al., 2009), and text summarization evaluation (Owczarzak, 2009), there is a corresponding need for high-quality dependency annotation, for the training and evaluation of dependency parsers (Buchholz and Marsi, 2006).

Approach

We take as a starting point two methods for detecting ad hoc rules in constituency annotation (Dickinson, 2008).

Ad hoc rule detection

3.1 An appropriate representation

Additional information

The methods presented so far have limited definitions of comparability.

Evaluation

In evaluating the methods, our main question is: how accurate are the dependencies, in terms of both attachment and labeling?

Summary and Outlook

We have proposed different methods for flagging the errors in automatically-parsed corpora, by treating the problem as one of looking for anomalous rules with respect to a treebank grammar.

Topics

bigram

Appears in 19 sentences as: Bigram (1) bigram (16) bigrams (5)
In Detecting Errors in Automatically-Parsed Dependency Relations
  1. We propose to flag erroneous parse rules, using information which reflects different grammatical properties: POS lookup, bigram information, and full rule comparisons.
    Page 2, “Introduction and Motivation”
  2. First, the bigram method abstracts a rule to its bigrams.
    Page 2, “Approach”
  3. 3.4 Bigram anomalies 3.4.1 Motivation
    Page 4, “Ad hoc rule detection”
  4. The bigram method examines relationships between adjacent sisters, complementing the whole rule method by focusing on local properties.
    Page 4, “Ad hoc rule detection”
  5. But only the final elements have anomalous bigrams: HD:ID IR:IR, IR:IR AN:RO, and AN:RO JR:IR all never occur.
    Page 4, “Ad hoc rule detection”
  6. To obtain a bigram score for an element, we simply add together the bigrams which contain the element in question, as in (7).
    Page 4, “Ad hoc rule detection” (see the sketch after this list)
  7. For the TA rule in question, the bigram HD:ID IR:IR never occurs, so both HD:ID and IR:IR get 0 added to their score.
    Page 4, “Ad hoc rule detection”
  8. HD:ID HD:ID, however, is a frequent bigram, so it adds weight to HD:ID, i.e., positive evidence comes from the bigram on the left.
    Page 4, “Ad hoc rule detection”
  9. This rule is entirely correct, yet the XX:XX position has low whole rule and bigram scores.
    Page 4, “Additional information”
  10. For example, the bigram method with a threshold of 39 leads to finding 283 errors (455 x .622).
    Page 6, “Evaluation”
  11. The whole rule and bigram methods reveal greater precision in identifying problematic dependencies, isolating elements with lower UAS and LAS scores than with frequency, along with corresponding greater pre-
    Page 6, “Evaluation”
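
A minimal sketch of the bigram scoring idea described in items 6-8 above, assuming a rule body is given as a sequence of deprel:POS elements and that bigram counts come from the training treebank grammar; the boundary markers and helper names are illustrative assumptions, not taken from the paper:

```python
from collections import Counter

START, END = "<s>", "</s>"  # hypothetical rule-boundary markers

def bigram_counts(training_bodies):
    """Count adjacent-sister bigrams over all rule bodies in the training grammar."""
    counts = Counter()
    for body in training_bodies:               # body: list of "deprel:POS" elements
        padded = [START] + list(body) + [END]
        counts.update(zip(padded, padded[1:]))
    return counts

def bigram_scores(body, counts):
    """Score each element of a parsed rule by summing the counts of the bigrams
    containing it: an element flanked only by unseen bigrams scores 0 (anomalous),
    while a frequent bigram on either side contributes positive evidence."""
    padded = [START] + list(body) + [END]
    return [counts[(padded[i - 1], padded[i])] + counts[(padded[i], padded[i + 1])]
            for i in range(1, len(padded) - 1)]

# Toy illustration (labels are placeholders, not real training data):
training = [["HD:ID", "HD:ID", "IR:IR"], ["HD:ID", "HD:ID"]]
counts = bigram_counts(training)
print(bigram_scores(["HD:ID", "HD:ID", "IR:IR", "AN:RO"], counts))  # final element scores 0
```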


dependency relation

Appears in 8 sentences as: dependency relation (6) dependency relations (2)
In Detecting Errors in Automatically-Parsed Dependency Relations
  1. On a par with constituency rules, we define a grammar rule as a dependency relation rewriting as a head with its sequence of POS/dependent pairs (cf.
    Page 2, “Ad hoc rule detection”
  2. Units of comparison: To determine similarity, one can compare dependency relations, POS tags, or both.
    Page 3, “Ad hoc rule detection”
  3. Thus, we use the pairs of dependency relations and POS tags as the units of comparison.
    Page 3, “Ad hoc rule detection”
  4. Comparability could be defined in terms of a rule’s dependency relation (LHS) or in terms of its head.
    Page 3, “Ad hoc rule detection”
  5. Our approach is thus to take the greater value of scores when comparing to rules either with the same dependency relation or with the same head.
    Page 3, “Ad hoc rule detection” (see the sketch after this list)
  6. We extract POS pairs, note their dependency relation, and add L or R to the label to indicate which is the head (Boyd et al., 2008).
    Page 5, “Additional information”
  7. We can measure this by scoring each testing data position below the threshold as a 1 if it has the correct head and dependency relation and a 0 otherwise.
    Page 5, “Evaluation”
  8. For example, the parsed rule TA → IG:IG RO has a correct dependency relation (IG) between the POS tags IG and its head RO, yet is assigned a whole rule score of 2 and a bigram score of 20.
    Page 6, “Evaluation”
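
A small sketch of the rule representation in item 1 and the comparability choice in items 4-5: comparable rules share either the LHS dependency relation or the head, and a rule keeps the greater of the two scores. The class and field names are assumptions, and the scoring function itself is left abstract:

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

@dataclass(frozen=True)
class DepRule:
    lhs: str                 # dependency relation being rewritten, e.g. "TA"
    head: str                # POS tag of the head, e.g. "RO"
    body: Tuple[str, ...]    # sequence of "deprel:POS" dependent pairs

def comparability_score(rule: DepRule,
                        grammar: Iterable[DepRule],
                        score: Callable[[DepRule, List[DepRule]], float]) -> float:
    """Compare a parsed rule against the training grammar twice -- once against
    rules with the same LHS dependency relation, once against rules with the
    same head POS -- and keep the greater value (more support = less anomalous)."""
    grammar = list(grammar)
    same_lhs = [r for r in grammar if r.lhs == rule.lhs]
    same_head = [r for r in grammar if r.head == rule.head]
    return max(score(rule, same_lhs), score(rule, same_head))
```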


POS tags

Appears in 7 sentences as: POS tag (2) POS tags (5)
In Detecting Errors in Automatically-Parsed Dependency Relations
  1. Units of comparison: To determine similarity, one can compare dependency relations, POS tags, or both.
    Page 3, “Ad hoc rule detection”
  2. Thus, we use the pairs of dependency relations and POS tags as the units of comparison.
    Page 3, “Ad hoc rule detection”
  3. One method which does not have this problem of overflagging uses a “lexicon” of POS tag pairs, examining relations between POS, irrespective of position.
    Page 5, “Additional information” (see the sketch after this list)
  4. We use the gold standard POS tags for all experiments.
    Page 6, “Evaluation”
  5. For example, the parsed rule TA → IG:IG RO has a correct dependency relation (IG) between the POS tags IG and its head RO, yet is assigned a whole rule score of 2 and a bigram score of 20.
    Page 6, “Evaluation”
  6. This is likely due to the fact that Alpino has the smallest label set of any of the corpora, with only 24 dependency labels and 12 POS tags (cf.
    Page 7, “Evaluation”
  7. Likewise, with fewer possible POS tag pairs, Alpino has lower precision for the low-threshold POS scores than the other corpora.
    Page 7, “Evaluation”
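
A rough sketch of the POS-pair lexicon from item 3, with the relation label marked for which side is the head (see also item 6 under dependency relation). The arc tuple layout and the L/R suffix convention are assumptions for illustration:

```python
from collections import defaultdict

def build_pos_lexicon(training_arcs):
    """Lexicon mapping POS tag pairs (in surface order) to the direction-marked
    relation labels seen in training; the L/R suffix records which side is the head."""
    lexicon = defaultdict(set)
    for left_pos, right_pos, label, head_side in training_arcs:   # head_side: "L" or "R"
        lexicon[(left_pos, right_pos)].add(f"{label}-{head_side}")
    return lexicon

def flag_unseen(parsed_arcs, lexicon):
    """Flag parsed dependencies whose labelled POS pair never occurs in training."""
    return [(lp, rp, lab, side) for lp, rp, lab, side in parsed_arcs
            if f"{lab}-{side}" not in lexicon.get((lp, rp), set())]

# Toy illustration with made-up tags and labels:
train = [("PN", "VV", "SS", "R"), ("VV", "PN", "OO", "L")]
lex = build_pos_lexicon(train)
print(flag_unseen([("PN", "VV", "OO", "R")], lex))   # flagged: this pair/label is unseen
```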


treebank

Appears in 7 sentences as: treebank (6) treebanks (1)
In Detecting Errors in Automatically-Parsed Dependency Relations
  1. Furthermore, parsing accuracy degrades unless sufficient amounts of labeled training data from the same domain are available (e.g., Gildea, 2001; Sekine, 1997), and thus we need larger and more varied annotated treebanks, covering a wide range of domains.
    Page 1, “Introduction and Motivation”
  2. However, there is a bottleneck in obtaining annotation, due to the need for manual intervention in annotating a treebank.
    Page 1, “Introduction and Motivation”
  3. Ad hoc rules are CFG productions extracted from a treebank which are “used for specific constructions and unlikely to be used again,” indicating annotation errors and rules for ungrammaticalities (see also Dickinson and Foster, 2009).
    Page 2, “Approach”
  4. Each method compares a given CFG rule to all the rules in a treebank grammar.
    Page 2, “Approach”
  5. This procedure is applicable whether the rules in question are from a new data set—as in this paper, where parses are compared to a training data grammar—or drawn from the treebank grammar itself (i.e., an internal consistency check).
    Page 2, “Approach”
  6. The methods work because we expect there to be regularities in valency structure in a treebank grammar; nonconformity to such regularities indicates a potential problem.
    Page 2, “Approach”
  7. We have proposed different methods for flagging the errors in automatically-parsed corpora, by treating the problem as one of looking for anomalous rules with respect to a treebank grammar.
    Page 8, “Summary and Outlook”


UAS

Appears in 3 sentences as: UAS (3)
In Detecting Errors in Automatically-Parsed Dependency Relations
  1. For development, we also report unlabeled attachment scores (UAS).
    Page 5, “Evaluation” (see the sketch after this list)
  2. In the rest of table 1, we report the best-performing results for each of the methods, providing the number of rules below and above a particular threshold, along with corresponding UAS and LAS values.
    Page 6, “Evaluation”
  3. The whole rule and bigram methods reveal greater precision in identifying problematic dependencies, isolating elements with lower UAS and LAS scores than with frequency, along with corresponding greater pre-
    Page 6, “Evaluation”
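
A minimal sketch of the evaluation described here and in item 7 under dependency relation: a token position counts 1 for LAS only if both head and dependency relation are correct, and 1 for UAS if the head alone is correct. The tuple layout is an assumption:

```python
def attachment_scores(positions):
    """Compute (UAS, LAS) over a list of token positions, each given as
    (pred_head, pred_label, gold_head, gold_label)."""
    if not positions:
        return 0.0, 0.0
    uas = sum(1 for ph, _, gh, _ in positions if ph == gh)
    las = sum(1 for ph, pl, gh, gl in positions if ph == gh and pl == gl)
    n = len(positions)
    return uas / n, las / n

# e.g. evaluate only the positions whose rule score fell below the threshold:
flagged = [(2, "SS", 2, "SS"), (0, "OO", 3, "OO"), (1, "DT", 1, "AT")]
print(attachment_scores(flagged))   # (0.667, 0.333): correct head in 2 of 3, head+label in 1 of 3
```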
