Integrating Multiple Dependency Corpora for Inducing Wide-coverage Japanese CCG Resources
Uematsu, Sumire and Matsuzaki, Takuya and Hanaoka, Hiroki and Miyao, Yusuke and Mima, Hideki

Article Structure

Abstract

This paper describes a method of inducing wide-coverage CCG resources for Japanese.

Introduction

Syntactic parsing for Japanese has been dominated by a dependency-based pipeline in which chunk-based dependency parsing is applied and then semantic role labeling is performed on the dependencies (Sasano and Kurohashi, 2011; Kawahara and Kurohashi, 2011; Kudo and Matsumoto, 2002; Iida and Poesio, 2011; Hayashibe et al., 2011).

Background

2.1 Combinatory Categorial Grammar

Corpus integration and conversion

For wide-coverage CCG parsing, we need a) a wide-coverage CCG lexicon, b) combinatory rules, c) training data for parse disambiguation, and d) a parser (e.g., a CKY parser).
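As an illustration of how components a), b), and d) fit together, the following is a minimal sketch of a CKY recognizer that uses only forward and backward function application. It is not the parser used in the paper: the three-word romanized lexicon, the atomic categories `NPga` and `NPo`, and the names `Atom`, `Func`, `combine`, and `cky` are all hypothetical, and the disambiguation model of c) is omitted entirely.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Atom:
    name: str


@dataclass(frozen=True)
class Func:
    result: "Cat"
    slash: str  # "/" seeks its argument to the right, "\\" to the left
    arg: "Cat"


Cat = Union[Atom, Func]

S, NP_GA, NP_O = Atom("S"), Atom("NPga"), Atom("NPo")

# Hypothetical toy lexicon (assumption, not taken from the paper).
LEXICON = {
    "taro-ga": {NP_GA},
    "hon-o": {NP_O},
    "yonda": {Func(Func(S, "\\", NP_GA), "\\", NP_O)},  # (S\NPga)\NPo
}


def combine(left, right):
    """Return the categories derivable from two adjacent categories."""
    out = []
    # Forward application: X/Y  Y  =>  X
    if isinstance(left, Func) and left.slash == "/" and left.arg == right:
        out.append(left.result)
    # Backward application: Y  X\Y  =>  X
    if isinstance(right, Func) and right.slash == "\\" and right.arg == left:
        out.append(right.result)
    return out


def cky(words, lexicon):
    """Return the set of categories that span the whole input."""
    n = len(words)
    if n == 0:
        return set()
    chart = {}
    # Fill the diagonal with lexical categories.
    for i, word in enumerate(words):
        chart[(i, i + 1)] = set(lexicon.get(word, ()))
    # Combine adjacent spans bottom-up, shortest spans first.
    for width in range(2, n + 1):
        for i in range(0, n - width + 1):
            j = i + width
            cell = set()
            for k in range(i + 1, j):
                for left in chart[(i, k)]:
                    for right in chart[(k, j)]:
                        cell.update(combine(left, right))
            chart[(i, j)] = cell
    return chart[(0, n)]
```

Calling `cky(["taro-ga", "hon-o", "yonda"], LEXICON)` yields a set containing only `S`. A wide-coverage grammar would need further combinatory rules and a statistical model to choose among competing derivations.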

Evaluation

We used the following for the implementation of our resources: Kyoto corpus ver.

Conclusion

In this paper, we proposed a method to induce wide-coverage Japanese resources based on CCG that will lead to deeper syntactic analysis for Japanese and presented empirical evaluation in terms of the quality of the obtained lexicon and the parsing accuracy.

Topics

CCG

Appears in 40 sentences as: CCG (44)
In Integrating Multiple Dependency Corpora for Inducing Wide-coverage Japanese CCG Resources
  1. This paper describes a method of inducing wide-coverage CCG resources for Japanese.
    Page 1, “Abstract”
  2. Our method first integrates multiple dependency-based corpora into phrase structure trees and then converts the trees into CCG derivations.
    Page 1, “Abstract”
  3. combinatory categorial grammar (CCG) (Steedman, 2001).
    Page 1, “Introduction”
  4. Our work is basically an extension of a seminal work on CCGbank (Hockenmaier and Steedman, 2007), in which the phrase structure trees of the Penn Treebank (PTB) (Marcus et al., 1993) are converted into CCG derivations and a wide-coverage CCG lexicon is then extracted from these derivations.
    Page 1, “Introduction”
  5. Moreover, the relation between chunk-based dependency structures and CCG derivations is not obvious.
    Page 1, “Introduction”
  6. We can then convert the phrase structure trees into CCG derivations.
    Page 1, “Introduction”
  7. In the following, we describe the details of the integration method as well as Japanese-specific issues in the conversion into CCG derivations.
    Page 1, “Introduction”
  8. Additionally, we discuss problems that remain in Japanese resources from the viewpoint of developing CCG derivations.
    Page 1, “Introduction”
  9. There are three primary contributions of this paper: 1) we show the first comprehensive results for Japanese CCG parsing, 2) we present a methodology for integrating multiple dependency-based resources to induce CCG derivations, and 3) we investigate the possibility of further improving CCG analysis by additional resources.
    Page 1, “Introduction”
  10. S : give’ money’ them’ I’ Figure 1: A CCG derivation.
    Page 2, “Introduction”

See all papers in Proc. ACL 2013 that mention CCG.

treebank

Appears in 10 sentences as: Treebank (1) treebank (8) treebanks (1)
In Integrating Multiple Dependency Corpora for Inducing Wide-coverage Japanese CCG Resources
  1. Our work is basically an extension of a seminal work on CCGbank (Hockenmaier and Steedman, 2007), in which the phrase structure trees of the Penn Treebank (PTB) (Marcus et al., 1993) are converted into CCG derivations and a wide-coverage CCG lexicon is then extracted from these derivations.
    Page 1, “Introduction”
  2. Yoshida (2005) proposed methods for extracting a wide-coverage lexicon based on HPSG from a phrase structure treebank of Japanese.
    Page 4, “Background”
  3. Their treebanks are annotated with dependencies of words, the conversion of which into phrase structures is not a big concern.
    Page 4, “Background”
  4. As we have adopted the method of CCGbank, which relies on a source treebank to be converted into CCG derivations, a critical issue to address is the absence of a Japanese counterpart to PTB.
    Page 4, “Corpus integration and conversion”
  5. Our solution is to first integrate multiple dependency-based resources and convert them into a phrase structure treebank that is independent
    Page 4, “Corpus integration and conversion”
  6. Next, we translate the treebank into CCG derivations (Step 2).
    Page 4, “Corpus integration and conversion”
  7. We first integrate and convert available Japanese corpora (namely, the Kyoto corpus, the NAIST text corpus, and the JP corpus) into a phrase structure treebank, which is similar in spirit to PTB.
    Page 4, “Corpus integration and conversion”
  8. (2009) that converts an Italian dependency treebank into constituency trees since their dependency trees are annotated down to the level of each word.
    Page 4, “Corpus integration and conversion”
  9. Our method integrates multiple dependency-based resources to convert them into an integrated phrase structure treebank.
    Page 9, “Conclusion”
  10. The obtained treebank is then transformed into CCG derivations.
    Page 9, “Conclusion”

dependency parsing

Appears in 6 sentences as: dependency parsers (1) dependency parsing (5)
In Integrating Multiple Dependency Corpora for Inducing Wide-coverage Japanese CCG Resources
  1. Syntactic parsing for Japanese has been dominated by a dependency-based pipeline in which chunk-based dependency parsing is applied and then semantic role labeling is performed on the dependencies (Sasano and Kurohashi, 2011; Kawahara and Kurohashi, 2011; Kudo and Matsumoto, 2002; Iida and Poesio, 2011; Hayashibe et al., 2011).
    Page 1, “Introduction”
  2. The integrated corpus is divided into training, development, and final test sets following the standard data split in previous works on Japanese dependency parsing (Kudo and Matsumoto, 2002).
    Page 8, “Evaluation”
  3. Following conventions in research on Japanese dependency parsing, gold morphological analysis results were input to a parser.
    Page 9, “Evaluation”
  4. Comparing the parser’s performance with previous works on Japanese dependency parsing is difficult as our figures are not directly comparable to theirs.
    Page 9, “Evaluation”
  5. Thus, our parser is expected to be capable of real-world Japanese text analysis as well as dependency parsers.
    Page 9, “Evaluation”
  6. A comparison of the parsing accuracy with previous works on Japanese dependency parsing and English CCG parsing indicates that our parser can analyze real-world Japanese texts fairly well and that there is room for improvement in disambiguation models.
    Page 9, “Conclusion”

semantic representation

Appears in 6 sentences as: semantic representation (4) semantic representations (3)
In Integrating Multiple Dependency Corpora for Inducing Wide-coverage Japanese CCG Resources
  1. 1, each category is associated with a lambda term of semantic representations, and each combinatory rule is associated with rules for semantic composition.
    Page 2, “Background”
  2. Since these rules are universal, we can obtain different semantic representations by switching the semantic representations of lexical categories.
    Page 2, “Background”
  3. coordination and semantic representation in particular.
    Page 3, “Background”
  4. For semantic representation, we define predicate argument structures (PASs) rather than the theory’s formal representation based on dynamic logic.
    Page 3, “Background”
  5. Sophisticating our semantic representation is left for future work.
    Page 3, “Background”
  6. 12) must be used to construct the semantic representation, namely the PAS.
    Page 7, “Corpus integration and conversion”
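The idea in sentences 1 and 2 above, that composition rules stay fixed while the lexical semantics can be swapped, can be sketched as follows. This is only an illustration under stated assumptions: Python callables stand in for lambda terms, categories are plain strings and tuples, and the lexicons, the romanized words, and the names `Sign`, `backward`, and `derive` are hypothetical rather than taken from the paper.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Sign:
    cat: Any  # an atomic category such as "S", or a tuple (result, slash, arg)
    sem: Any  # a constant, or a callable standing in for a lambda term


def backward(arg, fn):
    """Backward application: Y:a  X\\Y:f  =>  X:f(a)."""
    if not isinstance(fn.cat, tuple):
        raise ValueError("right-hand sign is not a functor")
    result, slash, wanted = fn.cat
    if slash != "\\" or wanted != arg.cat:
        raise ValueError("categories do not combine")
    # The semantic rule is the same whatever the lexical semantics are.
    return Sign(result, fn.sem(arg.sem))


VERB_CAT = (("S", "\\", "NPga"), "\\", "NPo")  # (S\NPga)\NPo

# Lexicon 1 (assumption): the verb builds a predicate argument structure.
PAS_LEXICON = {
    "taro-ga": Sign("NPga", "taro"),
    "hon-o": Sign("NPo", "hon"),
    "yonda": Sign(VERB_CAT, lambda o: lambda s: {"pred": "yonda", "ga": s, "o": o}),
}

# Lexicon 2 (assumption): the same categories, but logical-form strings.
LOGIC_LEXICON = {
    "taro-ga": Sign("NPga", "taro'"),
    "hon-o": Sign("NPo", "hon'"),
    "yonda": Sign(VERB_CAT, lambda o: lambda s: f"read'({s}, {o})"),
}


def derive(lexicon):
    """Derive 'taro-ga hon-o yonda' with two backward applications."""
    verb_phrase = backward(lexicon["hon-o"], lexicon["yonda"])
    return backward(lexicon["taro-ga"], verb_phrase)
```

Here `derive(PAS_LEXICON).sem` is a dictionary-style PAS, while `derive(LOGIC_LEXICON).sem` is a logical-form string. Only the lexical entries differ between the two runs.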

gold standard

Appears in 3 sentences as: gold standard (3)
In Integrating Multiple Dependency Corpora for Inducing Wide-coverage Japanese CCG Resources
  1. First, we obtained CCG derivations for evaluation sets by applying our conversion method and then used these derivations as gold standard.
    Page 9, “Evaluation”
  2. Lexical coverage indicates the number of words to which the grammar assigns a gold standard category.
    Page 9, “Evaluation”
  3. Sentential coverage indicates the number of sentences in which all words are assigned gold standard categories.
    Page 9, “Evaluation”
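The two coverage measures defined in sentences 2 and 3 above can be computed along the following lines. This is a sketch under assumptions: the input format (a list of sentences, each a list of `(word, gold_category)` pairs), the dictionary-based lexicon, and the function name `coverage` are hypothetical, and reporting the counts as fractions is a choice made here, not something the quoted sentences state.

```python
def coverage(sentences, lexicon):
    """Count words and sentences whose gold categories the lexicon licenses.

    sentences: list of sentences, each a list of (word, gold_category) pairs.
    lexicon:   dict mapping a word to the set of categories it may take.
    """
    covered_words = total_words = 0
    covered_sentences = total_sentences = 0
    for sentence in sentences:
        if not sentence:
            continue  # skip empty sentences so they do not count as covered
        hits = [gold in lexicon.get(word, set()) for word, gold in sentence]
        covered_words += sum(hits)
        total_words += len(hits)
        covered_sentences += all(hits)  # True only if every word is covered
        total_sentences += 1
    return {
        "covered_words": covered_words,
        "covered_sentences": covered_sentences,
        "lexical_coverage": covered_words / total_words if total_words else 0.0,
        "sentential_coverage": (
            covered_sentences / total_sentences if total_sentences else 0.0
        ),
    }
```

Sentential coverage can never exceed lexical coverage by this definition, because a single uncovered word is enough to mark the whole sentence as uncovered.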
