Robust Conversion of CCG Derivations to Phrase Structure Trees
Kummerfeld, Jonathan K. and Klein, Dan and Curran, James R.

Article Structure

Abstract

We propose an improved, bottom-up method for converting CCG derivations into PTB-style phrase structure trees.

Introduction

Converting the Penn Treebank (PTB, Marcus et al., 1993) to other formalisms, such as HPSG (Miyao et al., 2004), LFG (Cahill et al., 2008), LTAG (Xia, 1999), and CCG (Hockenmaier, 2003), is a complex process that renders linguistic phenomena in formalism-specific ways.

Background

There has been extensive work on converting parser output for evaluation, e.g. Lin (1998) and Briscoe et al.

Our Approach

Our conversion assigns a set of instructions to each lexical category and defines generic operations for each combinator that combine instructions.
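The idea in the sentence above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy lexicon, the instruction encoding, and the names Item, bracket, leaf and apply_functor are all assumptions made for illustration, and only function application is covered (no composition or type-raising).

```python
from dataclasses import dataclass, field

@dataclass
class Item:
    tree: object                                  # a word (str) or (label, [children])
    pending: list = field(default_factory=list)   # instructions left to run, one per argument

def bracket(label, order):
    """Hypothetical instruction: wrap functor tree f and argument tree a in a new node."""
    def run(f, a):
        children = [f, a] if order == "fa" else [a, f]
        return (label, children)
    return run

# Hypothetical toy lexicon: each lexical category maps to a list of instructions,
# consumed in the order in which the category takes its arguments.
LEXICON = {
    "NP": [],
    "S\\NP": [bracket("S", "af")],
    "(S\\NP)/NP": [bracket("VP", "fa"), bracket("S", "af")],
}

def leaf(word, category):
    """Create a leaf item carrying the instructions for its lexical category."""
    return Item(tree=word, pending=list(LEXICON[category]))

def apply_functor(functor, argument):
    """Generic operation for function application: run the functor's next
    instruction on the two subtrees and keep the remaining instructions."""
    instruction, *rest = functor.pending
    return Item(tree=instruction(functor.tree, argument.tree), pending=rest)

# Follow a small derivation bottom-up: "Kim likes fruit".
likes = leaf("likes", "(S\\NP)/NP")
vp = apply_functor(likes, leaf("fruit", "NP"))   # consumes the object
s = apply_functor(vp, leaf("Kim", "NP"))         # consumes the subject
print(s.tree)
```

In this sketch the transitive verb carries two instructions, so the same generic application step yields a VP bracket when the object is consumed and an S bracket when the subject is consumed.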

Evaluation

Using sections 00-21 of the treebanks, we handcrafted instructions for 527 lexical categories, a process that took under 100 hours; the set includes all the categories used by the C&C parser.

Conclusion

By exploiting the generalised combinators of the CCG formalism, we have developed a new method of converting CCG derivations into PTB-style trees.

Topics

CCG

Appears in 30 sentences as: CCG (37)
In Robust Conversion of CCG Derivations to Phrase Structure Trees
  1. We propose an improved, bottom-up method for converting CCG derivations into PTB-style phrase structure trees.
    Page 1, “Abstract”
  2. Converting the Penn Treebank (PTB, Marcus et al., 1993) to other formalisms, such as HPSG (Miyao et al., 2004), LFG (Cahill et al., 2008), LTAG (Xia, 1999), and CCG (Hockenmaier, 2003), is a complex process that renders linguistic phenomena in formalism-specific ways.
    Page 1, “Introduction”
  3. Clark and Curran (2009) developed a CCG to PTB conversion that treats the CCG derivation as a phrase structure tree and applies handcrafted rules to every pair of categories that combine in the derivation.
    Page 1, “Introduction”
  4. Because their approach does not exploit the generalisations inherent in the CCG formalism, they must resort to ad-hoc rules over nonlocal features of the CCG constituents being combined (when a fixed pair of CCG categories corresponds to multiple PTB structures).
    Page 1, “Introduction”
  5. Our conversion assigns a set of bracket instructions to each word based on its CCG category, then follows the CCG derivation, applying and combining instructions at each combinatory step to build a phrase structure tree.
    Page 1, “Introduction”
  6. These issues make evaluating parser output difficult, but our method does enable an improved comparison of CCG and PTB parsers.
    Page 1, “Introduction”
  7. Our focus is on CCG to PTB conversion (Clark and Curran, 2009).
    Page 1, “Background”
  8. 2.1 Combinatory Categorial Grammar (CCG)
    Page 1, “Background”
  9. The lower half of Figure 1 shows a CCG derivation (Steedman, 2000) in which each word is assigned a category, and combinatory rules are applied to adjacent categories until only one remains.
    Page 1, “Background”
  10. We cannot form a PTB tree by simply relabeling the categories in a CCG derivation because the mapping to phrase labels is many-to-many, CCG derivations contain extra brackets due to binarisation, and there are cases where the constituents in the PTB tree and the CCG derivation cross (e.g. in Figure 1).
    Page 2, “Background”
  11. Clark and Curran (2009), hereafter C&C-CONV, assign a schema to each leaf (lexical category) and rule (pair of combining categories) in the CCG derivation.
    Page 2, “Background”
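
The derivation process described in sentence 9 (each word receives a category, and combinatory rules are applied to adjacent categories until only one remains) can be sketched as follows. This is a simplified illustration, not a CCG parser: it covers only forward and backward application, combines greedily from left to right, and the tuple encoding of categories and the function names are assumptions.

```python
# Atomic categories are strings; complex categories are (result, slash, argument).
TRANSITIVE_VERB = (("S", "\\", "NP"), "/", "NP")   # (S\NP)/NP

def forward(left, right):
    """Forward application: X/Y followed by Y gives X."""
    if isinstance(left, tuple) and left[1] == "/" and left[2] == right:
        return left[0]
    return None

def backward(left, right):
    """Backward application: Y followed by X\\Y gives X."""
    if isinstance(right, tuple) and right[1] == "\\" and right[2] == left:
        return right[0]
    return None

def derive(categories):
    """Greedily combine adjacent categories until no rule applies."""
    cats = list(categories)
    changed = True
    while changed and len(cats) > 1:
        changed = False
        for i in range(len(cats) - 1):
            result = forward(cats[i], cats[i + 1]) or backward(cats[i], cats[i + 1])
            if result is not None:
                cats[i:i + 2] = [result]   # replace the pair with its result
                changed = True
                break
    return cats

# "Kim likes fruit": NP  (S\NP)/NP  NP
print(derive(["NP", TRANSITIVE_VERB, "NP"]))
```

A real derivation is chosen by a parser rather than by greedy combination, and CCG also includes composition and type-raising, which this sketch leaves out.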

See all papers in Proc. ACL 2012 that mention CCG.

subtrees

Appears in 3 sentences as: subtrees (3)
In Robust Conversion of CCG Derivations to Phrase Structure Trees
  1. The PTB tree is constructed from the CCG bottom-up, creating leaves with lexical schemas, then merging/adding subtrees using rule schemas at each step.
    Page 2, “Background”
  2. (S* f {a}) or default to X. f_i: Place subtrees, (PP f0 (S f1..k a))
    Page 3, “Our Approach”
  3. The subscripts indicate which subtrees to place where.
    Page 3, “Our Approach”

Treebank

Appears in 3 sentences as: Treebank (1) treebank (1) treebanks (1)
In Robust Conversion of CCG Derivations to Phrase Structure Trees
  1. Converting the Penn Treebank (PTB, Marcus et al., 1993) to other formalisms, such as HPSG (Miyao et al., 2004), LFG (Cahill et al., 2008), LTAG (Xia, 1999), and CCG (Hockenmaier, 2003), is a complex process that renders linguistic phenomena in formalism-specific ways.
    Page 1, “Introduction”
  2. Using sections 00-21 of the treebanks, we handcrafted instructions for 527 lexical categories, a process that took under 100 hours; the set includes all the categories used by the C&C parser.
    Page 3, “Evaluation”
  3. Figure 3: For each sentence in the treebank, we plot the converted parser output against gold conversion (left), and the original parser evaluation against gold conversion (right).
    Page 4, “Evaluation”
