Accurate Context-Free Parsing with Combinatory Categorial Grammar
Fowler, Timothy A. D. and Penn, Gerald

Article Structure

Abstract

The definition of combinatory categorial grammar (CCG) in the literature varies quite a bit from author to author.

Introduction

Combinatory categorial grammar (CCG) is a variant of categorial grammar which has attracted interest for both theoretical and practical reasons.

The Language Classes of Combinatory Categorial Grammars

A categorial grammar is a grammatical system consisting of a finite set of words, a set of categories, a finite set of sentential categories, a finite lexicon mapping words to categories and a rule system dictating how the categories can be combined.
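
The components named in this definition can be made concrete with a small sketch. The code below is illustrative only and is not taken from the paper: the toy lexicon, the names `Atom`, `Slash`, `combine` and `recognize`, and the restriction of the rule system to forward and backward application are all assumptions chosen for brevity.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Atom:
    name: str


@dataclass(frozen=True)
class Slash:
    result: "Category"
    direction: str  # "/" seeks the argument to the right, "\\" to the left
    argument: "Category"


Category = Union[Atom, Slash]

S, NP = Atom("S"), Atom("NP")

# Hypothetical toy lexicon mapping words to sets of categories.
LEXICON = {
    "Mary": {NP},
    "John": {NP},
    "sees": {Slash(Slash(S, "\\", NP), "/", NP)},  # (S\NP)/NP
}
SENTENTIAL = {S}  # the finite set of sentential categories


def combine(left, right):
    """Rule system (assumed): forward and backward application only."""
    results = set()
    if isinstance(left, Slash) and left.direction == "/" and left.argument == right:
        results.add(left.result)   # X/Y  Y  =>  X
    if isinstance(right, Slash) and right.direction == "\\" and right.argument == left:
        results.add(right.result)  # Y  X\Y  =>  X
    return results


def recognize(words):
    """CKY-style recognition: True if some sentential category spans the input."""
    n = len(words)
    chart = {(i, i + 1): set(LEXICON.get(w, ())) for i, w in enumerate(words)}
    for width in range(2, n + 1):
        for start in range(0, n - width + 1):
            end = start + width
            cell = set()
            for mid in range(start + 1, end):
                for l in chart[(start, mid)]:
                    for r in chart[(mid, end)]:
                        cell |= combine(l, r)
            chart[(start, end)] = cell
    return n > 0 and bool(chart[(0, n)] & SENTENTIAL)
```

With this toy lexicon, `recognize(["Mary", "sees", "John"])` succeeds because "sees John" reduces to S\NP by forward application and "Mary" then combines with it by backward application. A CCG in the sense discussed here would add further combinators (such as composition and type-raising) to `combine`.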

A Latent Variable CCG Parser

The context-freeness of a number of CCGs should not be considered evidence that there is no advantage to CCG as a grammar formalism.

Conclusion

We have provided a number of theoretical results proving that CCGbank contains no non-context-free structure and that the Clark and Curran parser is actually a context-free parser.

Topics

CCG

Appears in 46 sentences as: CCG (53)
In Accurate Context-Free Parsing with Combinatory Categorial Grammar
  1. The definition of combinatory categorial grammar (CCG) in the literature varies quite a bit from author to author.
    Page 1, “Abstract”
  2. However, the differences between the definitions are important in terms of the language classes of each CCG.
    Page 1, “Abstract”
  3. We prove that a wide range of CCGs are strongly context-free, including the CCG of CCGbank and of the parser of Clark and Curran (2007).
    Page 1, “Abstract”
  4. Combinatory categorial grammar (CCG) is a variant of categorial grammar which has attracted interest for both theoretical and practical reasons.
    Page 1, “Introduction”
  5. On the practical side, we have corpora with CCG derivations for each sentence (Hockenmaier and Steedman, 2007), a wide-coverage parser trained on that corpus (Clark and Curran, 2007) and a system for converting CCG derivations into semantic representations (Bos et al., 2004).
    Page 1, “Introduction”
  6. However, despite being treated as a single unified grammar formalism, each of these authors uses a variation of CCG, and the variations differ primarily in which combinators are included in the grammar and the restrictions that are put on them.
    Page 1, “Introduction”
  7. We will provide a generalized framework for CCG within which the full variation of CCG seen in the literature can be defined.
    Page 1, “Introduction”
  8. Due to this insight, we investigate the potential of using tools from the probabilistic CFG community to improve CCG parsing results.
    Page 1, “Introduction”
  9. Bos’s system for building semantic representations from CCG derivations is only possible due to the categorial nature of CCG.
    Page 1, “Introduction”
  10. Furthermore, the long distance dependencies involved in extraction and coordination phenomena have a more natural representation in CCG.
    Page 1, “Introduction”

See all papers in Proc. ACL 2010 that mention CCG.

Penn treebank

Appears in 11 sentences as: Penn Treebank (1) Penn treebank (10)
In Accurate Context-Free Parsing with Combinatory Categorial Grammar
  1. The Petrov parser (Petrov and Klein, 2007) uses latent variables to refine the grammar extracted from a corpus to improve accuracy, originally used to improve parsing results on the Penn treebank (PTB).
    Page 1, “Introduction”
  2. These results should not be interpreted as proof that grammars extracted from the Penn treebank and from CCGbank are equivalent.
    Page 1, “Introduction”
  3. CCGbank (Hockenmaier and Steedman, 2007) is a corpus of CCG derivations that was semiautomatically converted from the Wall Street Journal section of the Penn treebank.
    Page 4, “The Language Classes of Combinatory Categorial Grammars”
  4. Unlike the context-free grammars extracted from the Penn treebank, these allow for the categorial semantics that accompanies any categorial parse and for a more elegant analysis of linguistic structures such as extraction and coordination.
    Page 5, “A Latent Variable CCG Parser”
  5. in Petrov’s experiments on the Penn treebank, the syntactic category NP was refined to the more fine-grained NP1 and NP2 roughly corresponding to NPs in subject and object positions.
    Page 5, “A Latent Variable CCG Parser”
  6. In the supertagging literature, POS tagging and supertagging are distinguished: POS tags are the traditional Penn treebank tags (e.g. NN, VBZ and DT) and supertags are CCG categories.
    Page 6, “A Latent Variable CCG Parser”
  7. However, because the Petrov parser trained on CCGbank has no notion of Penn treebank POS tags, we can only evaluate the accuracy of the supertags.
    Page 6, “A Latent Variable CCG Parser”
  8. Figure 9 gives a comparison between the Petrov parser trained on the Penn treebank and on CCGbank.
    Page 6, “A Latent Variable CCG Parser”
  9. These numbers should not be directly compared, but the similarity of the unlabeled measures indicates that the difference between the structure of the Penn treebank and CCGbank is not large.[7]
    Page 6, “A Latent Variable CCG Parser”
  10. [7] Because punctuation in CCG can have grammatical function, we include it in our accuracy calculations, resulting in lower scores for the Petrov parser trained on the Penn treebank than those reported in Petrov and Klein (2007).
    Page 6, “A Latent Variable CCG Parser”
  11. As a final evaluation, we compare the resources that are required to both train and parse with the Petrov parser on the Penn Treebank, the Petrov parser on the original version of CCGbank, the Petrov parser on CCGbank without features and the Clark and Curran parser using the two models.
    Page 9, “A Latent Variable CCG Parser”

statistically significant

Appears in 4 sentences as: statistical significance (1) statistically significant (3)
In Accurate Context-Free Parsing with Combinatory Categorial Grammar
  1. To determine statistical significance, we obtain p-values from Bikel’s randomized parsing evaluation comparator [6], modified for use with tagging accuracy, F-score and dependency accuracy.
    Page 6, “A Latent Variable CCG Parser”
  2. The difference in accuracy is only statistically significant between Clark and Curran’s Normal Form model ignoring features and the Petrov parser trained on CCGbank without features (p-value = 0.013).
    Page 6, “A Latent Variable CCG Parser”
  3. These results show that the features in CCGbank actually inhibit accuracy (to a statistically significant degree in the case of unlabeled accuracy on section 00) when used as training data for the Petrov parser.
    Page 6, “A Latent Variable CCG Parser”
  4. The Petrov parser has better results by a statistically significant margin for both labeled and unlabeled recall and unlabeled F-score.
    Page 9, “A Latent Variable CCG Parser”
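
The sentences above obtain p-values from a randomized comparator. As a rough illustration of that general idea, the following sketch implements a paired approximate randomization test. It is not Bikel's tool and does not reproduce the paper's procedure: the function name `randomization_test`, the use of a mean per-sentence score as the test statistic, and the default number of trials are all assumptions.

```python
import random


def randomization_test(scores_a, scores_b, trials=10000, seed=0):
    """Estimate a two-sided p-value for the difference in mean score
    between two systems evaluated on the same sentences.

    scores_a[i] and scores_b[i] are the (assumed) per-sentence scores of
    systems A and B on sentence i.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty score lists of equal length")
    rng = random.Random(seed)
    n = len(scores_a)
    observed = abs(sum(scores_a) - sum(scores_b)) / n
    at_least_as_extreme = 0
    for _ in range(trials):
        total_a = total_b = 0.0
        for a, b in zip(scores_a, scores_b):
            # Under the null hypothesis the system labels are exchangeable,
            # so each sentence's pair of scores is swapped with probability 0.5.
            if rng.random() < 0.5:
                a, b = b, a
            total_a += a
            total_b += b
        if abs(total_a - total_b) / n >= observed:
            at_least_as_extreme += 1
    # Add-one smoothing keeps the estimate strictly above zero.
    return (at_least_as_extreme + 1) / (trials + 1)
```

A small p-value indicates that a difference as large as the observed one rarely arises when the two systems' outputs are randomly interchanged. Corpus-level measures such as F-score are not simple per-sentence means, so a faithful comparator would recompute the measure from shuffled per-sentence counts rather than averaging scores as this sketch does.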

F-score

Appears in 3 sentences as: F-score (3)
In Accurate Context-Free Parsing with Combinatory Categorial Grammar
  1. To determine statistical significance, we obtain p-values from Bikel’s randomized parsing evaluation comparator [6], modified for use with tagging accuracy, F-score and dependency accuracy.
    Page 6, “A Latent Variable CCG Parser”
  2. In this section we evaluate the parsers using the traditional PARSEVAL measures which measure recall, precision and F-score on constituents in
    Page 6, “A Latent Variable CCG Parser”
  3. The Petrov parser has better results by a statistically significant margin for both labeled and unlabeled recall and unlabeled F-score.
    Page 9, “A Latent Variable CCG Parser”
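
The PARSEVAL measures mentioned above can be sketched as follows. This is a minimal illustration, not the evaluation code used in the paper: the function name `parseval`, the representation of a constituent as a `(label, start, end)` tuple, and the example trees are assumptions.

```python
from collections import Counter


def parseval(gold, predicted, labeled=True):
    """Return (precision, recall, F-score) over constituents.

    Each constituent is a (label, start, end) tuple. With labeled=False
    only the span is compared, which gives the unlabeled measures.
    """
    def key(constituent):
        label, start, end = constituent
        return (label, start, end) if labeled else (start, end)

    gold_counts = Counter(key(c) for c in gold)
    pred_counts = Counter(key(c) for c in predicted)
    # Multiset intersection: a constituent is matched at most as many
    # times as it occurs in both trees.
    matched = sum((gold_counts & pred_counts).values())
    n_pred = sum(pred_counts.values())
    n_gold = sum(gold_counts.values())
    precision = matched / n_pred if n_pred else 0.0
    recall = matched / n_gold if n_gold else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_score


# Hypothetical three-word example: the parser mislabels one constituent.
gold = [("S", 0, 3), ("NP", 0, 1), ("S\\NP", 1, 3), ("NP", 2, 3)]
pred = [("S", 0, 3), ("NP", 0, 1), ("S\\NP", 1, 3), ("N", 2, 3)]
labeled_scores = parseval(gold, pred)
unlabeled_scores = parseval(gold, pred, labeled=False)
```

In this example the labeled scores are lower than the unlabeled ones because the mislabeled constituent still has the correct span, which is the kind of gap between labeled and unlabeled measures that the comparisons above rely on.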

fine-grained

Appears in 3 sentences as: fine-grained (3)
In Accurate Context-Free Parsing with Combinatory Categorial Grammar
  1. The Petrov parser uses latent variables to refine a coarse-grained grammar extracted from a training corpus to a grammar which makes much more fine-grained syntactic distinctions.
    Page 5, “A Latent Variable CCG Parser”
  2. in Petrov’s experiments on the Penn treebank, the syntactic category NP was refined to the more fine-grained NP1 and NP2 roughly corresponding to NPs in subject and object positions.
    Page 5, “A Latent Variable CCG Parser”
  3. However, this fine-grained control is exactly what the Petrov parser does automatically.
    Page 5, “A Latent Variable CCG Parser”

POS tags

Appears in 3 sentences as: POS tagging (1) POS tags (3)
In Accurate Context-Free Parsing with Combinatory Categorial Grammar
  1. In the supertagging literature, POS tagging and supertagging are distinguished: POS tags are the traditional Penn treebank tags (e.g. NN, VBZ and DT) and supertags are CCG categories.
    Page 6, “A Latent Variable CCG Parser”
  2. However, because the Petrov parser trained on CCGbank has no notion of Penn treebank POS tags, we can only evaluate the accuracy of the supertags.
    Page 6, “A Latent Variable CCG Parser”
  3. Despite the lack of POS tags in the Petrov parser, we can see that it performs slightly better than the Clark and Curran parser.
    Page 6, “A Latent Variable CCG Parser”

semantic representations

Appears in 3 sentences as: semantic representations (3)
In Accurate Context-Free Parsing with Combinatory Categorial Grammar
  1. On the practical side, we have corpora with CCG derivations for each sentence (Hockenmaier and Steedman, 2007), a wide-coverage parser trained on that corpus (Clark and Curran, 2007) and a system for converting CCG derivations into semantic representations (Bos et al., 2004).
    Page 1, “Introduction”
  2. Bos’s system for building semantic representations from CCG derivations is only possible due to the categorial nature of CCG.
    Page 1, “Introduction”
  3. First, the ability to extract semantic representations from CCG derivations is not dependent on the language class of a CCG.
    Page 9, “Conclusion”
