Comparing the Accuracy of CCG and Penn Treebank Parsers
Clark, Stephen and Curran, James R.

Article Structure

Abstract

We compare the CCG parser of Clark and Curran (2007) with a state-of-the-art Penn Treebank (PTB) parser.

Introduction

There are a number of approaches emerging in statistical parsing.

The CCG to PTB Conversion

There has been much recent work in attempting to convert native parser output into alternative representations for evaluation purposes, e.g.

Evaluation

The Berkeley parser (Petrov and Klein, 2007) provides performance close to the state-of-the-art for the PTB parsing task, with reported F-scores of around 90%.

Conclusion

One question that is often asked of the CCG parsing work is “Why not convert back into the PTB representation and perform a Parseval evaluation?” By showing how difficult the conversion is, we believe that we have finally answered this question, as well as demonstrating comparable performance with the Berkeley parser.

Topics

CCG

Appears in 30 sentences as: CCG (35)
In Comparing the Accuracy of CCG and Penn Treebank Parsers
  1. We compare the CCG parser of Clark and Curran (2007) with a state-of-the-art Penn Treebank (PTB) parser.
    Page 1, “Abstract”
  2. An accuracy comparison is performed by converting the CCG derivations into PTB trees.
    Page 1, “Abstract”
  3. We show that the conversion is extremely difficult to perform, but are able to fairly compare the parsers on a representative subset of the PTB test section, obtaining results for the CCG parser that are statistically no different to those for the Berkeley parser.
    Page 1, “Abstract”
  4. The second approach is to apply statistical methods to parsers based on linguistic formalisms, such as HPSG, LFG, TAG, and CCG, with the grammar being defined manually or extracted from a formalism-specific treebank.
    Page 1, “Introduction”
  5. The formalism-based parser we use is the CCG parser of Clark and Curran (2007), which is based on CCGbank (Hockenmaier and Steedman, 2007), a CCG version of the Penn Treebank.
    Page 1, “Introduction”
  6. The comparison focuses on accuracy and is performed by converting CCG derivations into PTB phrase-structure trees.
    Page 1, “Introduction”
  7. A second contribution is to provide the first accuracy comparison of the CCG parser with a PTB parser, obtaining competitive scores for the CCG parser on a representative subset of the PTB test sections.
    Page 1, “Introduction”
  8. It is important to note that the purpose of this evaluation is comparison with a PTB parser, rather than evaluation of the CCG parser per se.
    Page 1, “Introduction”
  9. The CCG parser has been extensively evaluated elsewhere (Clark and Curran, 2007), and arguably GRs or predicate-argument structures provide a more suitable test set for the CCG parser than PTB phrase-structure trees.
    Page 1, “Introduction”
  10. shows that converting gold-standard CCG derivations into the GRs in DepBank resulted in an F-score of only 85%; hence the upper bound on the performance of the CCG parser, using this evaluation scheme, was only 85%.
    Page 2, “The CCG to PTB Conversion”
  11. First, the corresponding derivations in the treebanks are not isomorphic: a CCG derivation is not simply a relabelling of the nodes in the PTB tree; there are many constructions, such as coordination and control structures, where the trees are a different shape, as well as having different labels.
    Page 2, “The CCG to PTB Conversion”

See all papers in Proc. ACL 2009 that mention CCG.

Berkeley parser

Appears in 9 sentences as: Berkeley parser (9) Berkeley parsers (1)
In Comparing the Accuracy of CCG and Penn Treebank Parsers
  1. We show that the conversion is extremely difficult to perform, but are able to fairly compare the parsers on a representative subset of the PTB test section, obtaining results for the CCG parser that are statistically no different to those for the Berkeley parser.
    Page 1, “Abstract”
  2. The PTB parser we use for comparison is the publicly available Berkeley parser (Petrov and Klein, 2007).
    Page 1, “Introduction”
  3. The Berkeley parser (Petrov and Klein, 2007) provides performance close to the state-of-the-art for the PTB parsing task, with reported F-scores of around 90%.
    Page 3, “Evaluation”
  4. As can be seen from the scores, these sentences form a slightly easier subset than the full section 00, but this is a subset which can be used for a fair comparison against the Berkeley parser, since the conversion process is not lossy for this subset.
    Page 3, “Evaluation”
  5. We compare the CCG parser to the Berkeley parser using the accurate mode of the Berkeley parser, together with the model supplied with the publicly available version.
    Page 3, “Evaluation”
  6. Table 3 gives the results for Section 23, comparing the CCG and Berkeley parsers.
    Page 3, “Evaluation”
  7. The purpose of this column is to obtain an approximation of the CCG parser score for a perfect conversion process. The results in bold are those which we consider to be a fair comparison against the Berkeley parser.
    Page 3, “Evaluation”
  8. ison is likely to be an easy subset consisting of shorter sentences, and so the most that can be said is that the CCG parser performs as well as the Berkeley parser on short sentences.
    Page 4, “Evaluation”
  9. One question that is often asked of the CCG parsing work is “Why not convert back into the PTB representation and perform a Parseval evaluation?” By showing how difficult the conversion is, we believe that we have finally answered this question, as well as demonstrating comparable performance with the Berkeley parser.
    Page 4, “Conclusion”

Treebank

Appears in 7 sentences as: Treebank (3) treebank (2) treebanks (2)
In Comparing the Accuracy of CCG and Penn Treebank Parsers
  1. We compare the CCG parser of Clark and Curran (2007) with a state-of-the-art Penn Treebank (PTB) parser.
    Page 1, “Abstract”
  2. The first approach, which began in the mid-90s and now has an extensive literature, is based on the Penn Treebank (PTB) parsing task: inferring skeletal phrase-structure trees for unseen sentences of the WSJ, and evaluating accuracy according to the Parseval metrics.
    Page 1, “Introduction”
  3. The second approach is to apply statistical methods to parsers based on linguistic formalisms, such as HPSG, LFG, TAG, and CCG, with the grammar being defined manually or extracted from a formalism-specific treebank.
    Page 1, “Introduction”
  4. Evaluation is typically performed by comparing against predicate-argument structures extracted from the treebank, or against a test set of manually annotated grammatical relations (GRs).
    Page 1, “Introduction”
  5. The formalism-based parser we use is the CCG parser of Clark and Curran (2007), which is based on CCGbank (Hockenmaier and Steedman, 2007), a CCG version of the Penn Treebank.
    Page 1, “Introduction”
  6. However, there are a number of differences between the two treebanks which make the conversion back far from trivial.
    Page 2, “The CCG to PTB Conversion”
  7. First, the corresponding derivations in the treebanks are not isomorphic: a CCG derivation is not simply a relabelling of the nodes in the PTB tree; there are many constructions, such as coordination and control structures, where the trees are a different shape, as well as having different labels.
    Page 2, “The CCG to PTB Conversion”

gold-standard

Appears in 4 sentences as: gold-standard (4)
In Comparing the Accuracy of CCG and Penn Treebank Parsers
  1. shows that converting gold-standard CCG derivations into the GRs in DepBank resulted in an F-score of only 85%; hence the upper bound on the performance of the CCG parser, using this evaluation scheme, was only 85%.
    Page 2, “The CCG to PTB Conversion”
  2. The schemas were developed by manual inspection using section 00 of CCGbank and the PTB as a development set, following the oracle methodology of Clark and Curran (2007), in which gold-standard derivations from CCGbank are converted to the new representation and compared with the gold standard for that representation.
    Page 2, “The CCG to PTB Conversion”
  3. The first row shows the results on only those sentences which the conversion process can convert successfully (as measured by converting gold-standard CCGbank derivations and comparing with PTB trees; although, to be clear, the scores are for the CCG parser on those sentences).
    Page 3, “Evaluation”
  4. The second row shows the scores on those sentences for which the conversion process was somewhat lossy, but when the gold-standard CCGbank derivations are converted, the oracle F-measure is greater than 95%.
    Page 3, “Evaluation”
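
The row definitions quoted above amount to filtering the test sentences by how well their gold-standard CCGbank derivations survive conversion. The following is a minimal sketch of that selection step, not the authors' code. The function names, the sentence ids and the example scores are all hypothetical; it also assumes that "converted successfully" means an oracle F-measure of exactly 100%, and that the thresholded rows include the lossless sentences, neither of which the excerpts state.

```python
def lossless_subset(oracle_f):
    """Ids of sentences whose gold derivation converts with oracle F = 1.0.

    Assumption: "converted successfully" is read as a perfect oracle score.
    """
    return sorted(sid for sid, f in oracle_f.items() if f == 1.0)


def subset_above(oracle_f, threshold):
    """Ids of sentences whose oracle F-measure is strictly greater than threshold."""
    return sorted(sid for sid, f in oracle_f.items() if f > threshold)


# Hypothetical oracle F-measures per sentence id (illustrative values only).
oracle_f = {1: 1.0, 2: 0.97, 3: 0.93, 4: 0.80}

row1 = lossless_subset(oracle_f)      # sentences with a lossless conversion
row2 = subset_above(oracle_f, 0.95)   # oracle F-measure greater than 95%
row3 = subset_above(oracle_f, 0.92)   # oracle F-measure greater than 92%
```

The parser is then scored only on the selected sentences, so the oracle scores decide which sentences are evaluated rather than contributing to the reported numbers.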

F-score

Appears in 3 sentences as: F-score (3)
In Comparing the Accuracy of CCG and Penn Treebank Parsers
  1. shows that converting gold-standard CCG derivations into the GRs in DepBank resulted in an F-score of only 85%; hence the upper bound on the performance of the CCG parser, using this evaluation scheme, was only 85%.
    Page 2, “The CCG to PTB Conversion”
  2. The numbers are bracketing precision, recall, F-score and complete sentence matches, using the EVALB evaluation script.
    Page 3, “The CCG to PTB Conversion”
  3. The third row is similar, but for sentences for which the oracle F-score is greater than 92%.
    Page 3, “Evaluation”
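
For readers unfamiliar with the bracketing measures mentioned above, here is a simplified sketch of labelled bracketing precision, recall and F-score for one sentence. It is not the EVALB script and omits EVALB's handling of punctuation and label equivalences. The representation of a tree as a list of `(label, start, end)` spans, the function name `bracket_scores` and the example brackets are assumptions made for illustration.

```python
from collections import Counter


def bracket_scores(gold, predicted):
    """Return (precision, recall, f_score) over labelled brackets.

    Each argument is a list of (label, start, end) tuples for one sentence.
    Brackets are matched as a multiset, so duplicates are counted once each.
    """
    gold_counts = Counter(gold)
    pred_counts = Counter(predicted)
    matched = sum((gold_counts & pred_counts).values())
    precision = matched / len(predicted) if predicted else 0.0
    recall = matched / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Hypothetical brackets for a three-word sentence; one NP span differs.
gold = [("S", 0, 3), ("NP", 0, 1), ("VP", 1, 3), ("NP", 2, 3)]
predicted = [("S", 0, 3), ("NP", 0, 1), ("VP", 1, 3), ("NP", 1, 3)]

p, r, f = bracket_scores(gold, predicted)
complete_match = Counter(gold) == Counter(predicted)
```

Here three of the four brackets match on each side, so precision, recall and F-score are all 0.75, and the sentence does not count as a complete match.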

gold standard

Appears in 3 sentences as: gold standard (3)
In Comparing the Accuracy of CCG and Penn Treebank Parsers
  1. The schemas were developed by manual inspection using section 00 of CCGbank and the PTB as a development set, following the oracle methodology of Clark and Curran (2007), in which gold-standard derivations from CCGbank are converted to the new representation and compared with the gold standard for that representation.
    Page 2, “The CCG to PTB Conversion”
  2. The oracle conversion results from the gold standard CCGbank to the PTB for section 00 and 23 are shown in Table 2.
    Page 3, “The CCG to PTB Conversion”
  3. Results are calculated using both gold standard and automatically assigned POS tags; # is the number of sentences in the sample, and the % column gives the sample size as a percentage of the whole section.
    Page 3, “Evaluation”

Penn Treebank

Appears in 3 sentences as: Penn Treebank (3)
In Comparing the Accuracy of CCG and Penn Treebank Parsers
  1. We compare the CCG parser of Clark and Curran (2007) with a state-of-the-art Penn Treebank (PTB) parser.
    Page 1, “Abstract”
  2. The first approach, which began in the mid-90s and now has an extensive literature, is based on the Penn Treebank (PTB) parsing task: inferring skeletal phrase-structure trees for unseen sentences of the WSJ, and evaluating accuracy according to the Parseval metrics.
    Page 1, “Introduction”
  3. The formalism-based parser we use is the CCG parser of Clark and Curran (2007), which is based on CCGbank (Hockenmaier and Steedman, 2007), a CCG version of the Penn Treebank.
    Page 1, “Introduction”
