Exploiting Multiple Treebanks for Parsing with Quasi-synchronous Grammars
Li, Zhenghua and Liu, Ting and Che, Wanxiang

Article Structure

Abstract

We present a simple and effective framework for exploiting multiple monolingual treebanks with different annotation guidelines for parsing.

Introduction

The scale of available labeled data significantly affects the performance of statistical data-driven models.

Related Work

The present work is primarily inspired by Jiang et al. (2009).

Dependency Parsing

Given an input sentence $x = w_0 w_1 \ldots w_n$ and its POS tag sequence $t = t_0 t_1 \ldots t_n$, the goal of dependency parsing is to build a dependency tree as depicted in Figure 1, denoted by $d = \{(h, m, l) : 0 \le h \le n, 0 < m \le n, l \in L\}$, where $(h, m, l)$ indicates a directed arc from the head word (also called father) $w_h$ to the modifier (also called child or dependent) $w_m$ with a dependency label $l$, and $L$ is the label set.
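
A minimal sketch of this representation (the sentence, tags, and labels below are hypothetical; $w_0$ is the artificial root, so every real word receives exactly one head):

    # A dependency tree over w_0 (root) .. w_n, stored as the arc set
    # d = {(h, m, l)}: each modifier m in 1..n gets exactly one head h in 0..n.
    sentence = ["<root>", "I", "saw", "her"]   # w_0 .. w_3 (hypothetical example)
    tags = ["<root>", "PN", "VV", "PN"]        # t_0 .. t_3
    arcs = {(0, 2, "ROOT"), (2, 1, "SBJ"), (2, 3, "OBJ")}

    # Equivalent head-index view: head[m] = h for every modifier m.
    head = {m: h for (h, m, l) in arcs}
    assert sorted(head) == [1, 2, 3]           # every word attached exactly once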

Dependency Parsing with QG Features

Smith and Eisner (2006) propose the quasi-synchronous grammar (QG) for machine translation (MT) problems, allowing greater syntactic divergence between the two languages.
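
In this paper's setting, the two trees are over the same sentence under different annotation styles, and transformation patterns (TPs) capture the systematic divergences. As a rough sketch of the flavor of a QG feature, with an illustrative category inventory that is an assumption rather than the paper's exact TPs, a candidate target arc (h, m) can be keyed on how the two words are related in the source-side tree and conjoined with words and POS tags:

    def tp_category(h, m, src_head):
        """Relate a candidate target arc (h, m) to the source-side tree,
        where src_head[m] is the source head of word m. The category
        inventory here is illustrative, not the paper's exact TPs."""
        hm, hh = src_head.get(m), src_head.get(h)
        if hm == h:
            return "consistent"    # the source tree has the same arc
        if hh == m:
            return "reversed"      # the source arc points the other way
        if hm is not None and src_head.get(hm) == h:
            return "grandparent"   # h is m's grandparent in the source
        if hm is not None and hm == hh:
            return "sibling"       # h and m share a head in the source
        return "other"

    def qg_features(h, m, words, tags, src_head):
        # Conjoin the pattern type with the related words and POS tags,
        # so the parser can make context-sensitive decisions.
        tp = tp_category(h, m, src_head)
        return ["TP=" + tp,
                "TP=%s|th=%s|tm=%s" % (tp, tags[h], tags[m]),
                "TP=%s|wh=%s|wm=%s" % (tp, words[h], words[m])]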

Experiments and Analysis

We use the CDT as the source treebank (Liu et al., 2006).

Conclusions

The current paper proposes a simple and effective framework for exploiting multiple large-scale treebanks of different annotation styles.

Topics

treebank

Appears in 62 sentences as: Treebank (10) treebank (47) treebanking (1) Treebanks (1) treebanks (21)
In Exploiting Multiple Treebanks for Parsing with Quasi-synchronous Grammars
  1. We present a simple and effective framework for exploiting multiple monolingual treebanks with different annotation guidelines for parsing.
    Page 1, “Abstract”
  2. Several types of transformation patterns (TP) are designed to capture the systematic annotation inconsistencies among different treebanks.
    Page 1, “Abstract”
  3. Our approach can significantly advance the state-of-the-art parsing accuracy on two widely used target treebanks (Penn Chinese Treebank 5.1 and 6.0) using the Chinese Dependency Treebank as the source treebank.
    Page 1, “Abstract”
  4. Moreover, an indirect comparison indicates that our approach also outperforms previous work based on treebank conversion.
    Page 1, “Abstract”
  5. However, the heavy cost of treebanking typically limits any single treebank in both scale and genre.
    Page 1, “Introduction”
  6. At present, learning from a single treebank seems inadequate for further boosting parsing accuracy.
    Page 1, “Introduction”
  7. Treebanks | # of Words | Grammar
     CTB5 | 0.51 million | Phrase structure
     CTB6 | 0.78 million | Phrase structure
    Page 1, “Introduction”
  8. Table 1: Several publicly available Chinese treebanks.
    Page 1, “Introduction”
  9. Therefore, studies have recently resorted to other resources for the enhancement of parsing models, such as large-scale unlabeled data (Koo et al., 2008; Chen et al., 2009; Bansal and Klein, 2011; Zhou et al., 2011), and bilingual texts or cross-lingual treebanks (Burkett and Klein, 2008; Huang et al., 2009; Burkett et al., 2010; Chen et al., 2010).
    Page 1, “Introduction”
  10. The existence of multiple monolingual treebanks opens another door to addressing this issue.
    Page 1, “Introduction”
  11. For example, Table 1 lists a few publicly available Chinese treebanks that are motivated by different linguistic theories or applications.
    Page 1, “Introduction”

POS tags

Appears in 24 sentences as: POS tag (2) POS tagger (4) POS tagging (4) POS tags (17)
In Exploiting Multiple Treebanks for Parsing with Quasi-synchronous Grammars
  1. Given an input sentence $x = w_0 w_1 \ldots w_n$ and its POS tag sequence $t = t_0 t_1 \ldots t_n$, the goal of dependency parsing is to build a dependency tree as depicted in Figure 1, denoted by $d = \{(h, m, l) : 0 \le h \le n, 0 < m \le n, l \in L\}$, where $(h, m, l)$ indicates a directed arc from the head word (also called father) $w_h$ to the modifier (also called child or dependent) $w_m$ with a dependency label $l$, and $L$ is the label set.
    Page 3, “Dependency Parsing”
  2. The type of the TP is conjoined with the related words and POS tags, such that the QG-enhanced parsing models can make more elaborate decisions based on the context.
    Page 5, “Dependency Parsing with QG Features”
  3. CDT and CTB5/6 adopt different POS tag sets, and converting from one tag set to another is difficult (Niu et al., 2009). To overcome this problem, we use the People’s Daily corpus (PD), a large-scale corpus annotated with word segmentation and POS tags, to train a statistical POS tagger.
    Page 5, “Experiments and Analysis”
  4. The tagger produces a universal layer of POS tags for both the source and target treebanks.
    Page 5, “Experiments and Analysis”
  5. For all models used in the current work (POS tagging and parsing), we adopt the averaged perceptron to train the feature weights (Collins, 2002).
    Page 6, “Experiments and Analysis” (a sketch of averaged perceptron training follows this list)
  6. First, we train a statistical POS tagger on the training set of PD, which we name TaggerPD. The tagging accuracy on the test set of PD is 98.30%.
    Page 6, “Experiments and Analysis”
  7. We then use TaggerPD to produce POS tags for all the treebanks (CDT, CTB5, and CTB6).
    Page 6, “Experiments and Analysis”
  8. Based on the common POS tags, we train a second-order source parser (O2) on CDT, denoted by ParserCDT.
    Page 6, “Experiments and Analysis”
  9. We adopt the Chinese-oriented POS tagging features proposed in Zhang and Clark (2008a).
    Page 6, “Experiments and Analysis”
  10. Table 4: Parsing accuracy (UAS) comparison on CTB5-test with gold-standard POS tags.
    Page 6, “Experiments and Analysis”
  11. Table 4 shows the results when the gold-standard POS tags of CTB5 are adopted by the parsing models.
    Page 6, “Experiments and Analysis”
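
Item 5 above mentions the averaged perceptron (Collins, 2002). Below is a minimal generic sketch of averaged perceptron training, assuming hypothetical feats (feature extractor) and decode (argmax search) functions supplied by the task, whether POS tagging or parsing:

    from collections import defaultdict

    def averaged_perceptron(data, feats, decode, epochs=10):
        """data: list of (x, y_gold); feats(x, y) -> {feature: count};
        decode(x, w) -> highest-scoring structure under weights w.
        Naive averaging for clarity; real implementations average lazily."""
        w, total, steps = defaultdict(float), defaultdict(float), 0
        for _ in range(epochs):
            for x, y_gold in data:
                y_pred = decode(x, w)
                if y_pred != y_gold:
                    for f, v in feats(x, y_gold).items():
                        w[f] += v              # promote gold features
                    for f, v in feats(x, y_pred).items():
                        w[f] -= v              # demote predicted features
                steps += 1
                for f, v in w.items():
                    total[f] += v              # accumulate for averaging
        return {f: v / steps for f, v in total.items()}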

parsing models

Appears in 14 sentences as: parsing model (1) parsing models (13)
In Exploiting Multiple Treebanks for Parsing with Quasi-synchronous Grammars
  1. Based on such TPs, we design quasi-synchronous grammar features to augment the baseline parsing models.
    Page 1, “Abstract”
  2. Therefore, studies have recently resorted to other resources for the enhancement of parsing models, such as large-scale unlabeled data (Koo et al., 2008; Chen et al., 2009; Bansal and Klein, 2011; Zhou et al., 2011), and bilingual texts or cross-lingual treebanks (Burkett and Klein, 2008; Huang et al., 2009; Burkett et al., 2010; Chen et al., 2010).
    Page 1, “Introduction”
  3. enhanced parsing models to softly learn the systematic inconsistencies based on QG features, making our approach simpler and more robust.
    Page 3, “Related Work”
  4. Our approach is also intuitively related to stacked learning (SL), a machine learning framework that has recently been applied to dependency parsing to integrate two mainstream parsing models, i.e., graph-based and transition-based models (Nivre and McDonald, 2008; Martins et al., 2008).
    Page 3, “Related Work”
  5. In the current research, we adopt the graph-based parsing models for their state-of-the-art performance in a variety of languages. Graph-based models view the problem as finding the highest scoring tree from a directed graph.
    Page 3, “Dependency Parsing” (a first-order decoding sketch follows this list)
  6. We implement three parsing models of varying strengths in capturing features to better understand the effect of the proposed QG features.
    Page 3, “Dependency Parsing”
  7. parsing models (Yamada and Matsumoto, 2003; Nivre, 2003) with minor modifications.
    Page 3, “Dependency Parsing”
  8. Figure 2: Scoring parts used in our graph-based parsing models.
    Page 3, “Dependency Parsing”
  9. Figure 4 presents the three kinds of TPs used in our model, which correspond to the three scoring parts of our parsing models.
    Page 5, “Dependency Parsing with QG Features”
  10. Based on these TPs, we propose the QG features for enhancing the baseline parsing models, which are shown in Table 2.
    Page 5, “Dependency Parsing with QG Features”
  11. The type of the TP is conjoined with the related words and POS tags, such that the QG-enhanced parsing models can make more elaborate decisions based on the context.
    Page 5, “Dependency Parsing with QG Features”
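
Item 5 above summarizes the graph-based view. The sketch below, which is an illustration rather than the paper's implementation, shows the first-order, arc-factored case, where a tree's score is the sum of its arc scores; the paper's stronger models add higher-order scoring parts:

    import itertools

    def is_tree(heads):
        """heads[m-1] is the head of word m (0 denotes the root); a valid
        tree iff every word reaches the root without revisiting a node."""
        for m in range(1, len(heads) + 1):
            seen, cur = set(), m
            while cur != 0:
                if cur in seen:
                    return False   # cycle (covers self-loops too)
                seen.add(cur)
                cur = heads[cur - 1]
        return True

    def best_tree(n, score_arc):
        """Exhaustive first-order decoding for a tiny sentence of n words;
        real parsers use Eisner's algorithm or Chu-Liu/Edmonds instead."""
        best, argbest = float("-inf"), None
        for heads in itertools.product(range(n + 1), repeat=n):
            if not is_tree(heads):
                continue
            # Arc-factored: a tree's score is the sum of its arc scores.
            s = sum(score_arc(h, m + 1) for m, h in enumerate(heads))
            if s > best:
                best, argbest = s, heads
        return argbest

    # e.g. best_tree(3, lambda h, m: 1.0 if h == 0 and m == 2 else 0.0)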

UAS

Appears in 8 sentences as: UAS (8)
In Exploiting Multiple Treebanks for Parsing with Quasi-synchronous Grammars
  1. We adopt the unlabeled attachment score (UAS) as the primary evaluation metric.
    Page 6, “Experiments and Analysis” (a sketch of the metric follows this list)
  2. The UAS on CDT-test is 84.45%.
    Page 6, “Experiments and Analysis”
  3. Table 4: Parsing accuracy (UAS) comparison on CTB5-test with gold-standard POS tags.
    Page 6, “Experiments and Analysis”
  4. Table 5: Parsing accuracy (UAS) comparison on CTB5-test with automatic POS tags.
    Page 7, “Experiments and Analysis”
  5. Setting | UAS | CM | RA
     fbs | 79.67 | 26.81 | 73.82
     fqg | 79.15 | 26.34 | 74.71
    Page 7, “Experiments and Analysis”
  6. Figure 5: Parsing accuracy (UAS) comparison on CTB5-test when the scale of CDT and CTB5 varies (in thousands of sentences).
    Page 8, “Experiments and Analysis”
  7. Table 8: Parsing accuracy (UAS) comparison on CTB6-test with automatic POS tags.
    Page 8, “Experiments and Analysis”
  8. Table 9: Parsing accuracy (UAS) comparison on the test set of CTB5X.
    Page 8, “Experiments and Analysis”
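
Item 1 above names the primary metric. As a minimal sketch, UAS is simply the percentage of words whose predicted head matches the gold head (conventions differ on whether punctuation tokens are counted):

    def uas(gold_heads, pred_heads):
        """Unlabeled attachment score: the percentage of words whose
        predicted head index equals the gold head index."""
        assert len(gold_heads) == len(pred_heads)
        correct = sum(g == p for g, p in zip(gold_heads, pred_heads))
        return 100.0 * correct / len(gold_heads)

    # e.g. uas([2, 0, 2], [2, 0, 1]) -> 66.66...  (2 of 3 heads correct)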

dependency parsing

Appears in 6 sentences as: dependency parsing (6)
In Exploiting Multiple Treebanks for Parsing with Quasi-synchronous Grammars
  1. CTB5 is converted to dependency structures following the standard practice of dependency parsing (Zhang and Clark, 2008b).
    Page 2, “Introduction”
  2. These features are tailored to the dependency parsing problem.
    Page 2, “Related Work”
  3. Our approach is also intuitively related to stacked learning (SL), a machine learning framework that has recently been applied to dependency parsing to integrate two mainstream parsing models, i.e., graph-based and transition-based models (Nivre and McDonald, 2008; Martins et al., 2008).
    Page 3, “Related Work”
  4. Given an input sentence $x = w_0 w_1 \ldots w_n$ and its POS tag sequence $t = t_0 t_1 \ldots t_n$, the goal of dependency parsing is to build a dependency tree as depicted in Figure 1, denoted by $d = \{(h, m, l) : 0 \le h \le n, 0 < m \le n, l \in L\}$, where $(h, m, l)$ indicates a directed arc from the head word (also called father) $w_h$ to the modifier (also called child or dependent) $w_m$ with a dependency label $l$, and $L$ is the label set.
    Page 3, “Dependency Parsing”
  5. We omit the label $l$ because we focus on unlabeled dependency parsing in the present paper.
    Page 3, “Dependency Parsing”
  6. (2011) show that a joint POS tagging and dependency parsing model can significantly improve parsing accuracy over a pipeline model.
    Page 7, “Experiments and Analysis”

word segmentation

Appears in 4 sentences as: word segmentation (4)
In Exploiting Multiple Treebanks for Parsing with Quasi-synchronous Grammars
  1. Jiang et al. (2009) improve the performance of word segmentation and part-of-speech (POS) tagging on CTB5 using another large-scale corpus of different annotation standards (People’s Daily).
    Page 2, “Related Work”
  2. CDT and CTB5/6 adopt different POS tag sets, and converting from one tag set to another is difficult (Niu et al., 2009). To overcome this problem, we use the People’s Daily corpus (PD), a large-scale corpus annotated with word segmentation and POS tags, to train a statistical POS tagger.
    Page 5, “Experiments and Analysis”
  3. The word segmentation standards of the two treebanks also differ slightly; these differences are not considered in this work.
    Page 5, “Experiments and Analysis”
  4. Moreover, inferior results may be obtained due to the differences between CTB5 and PD in word segmentation standards and text sources.
    Page 7, “Experiments and Analysis”

CoNLL

Appears in 3 sentences as: CoNLL (3)
In Exploiting Multiple Treebanks for Parsing with Quasi-synchronous Grammars
  1. CTB6 is used as the Chinese data set in the CoNLL 2009 shared task (Hajič et al., 2009).
    Page 5, “Experiments and Analysis”
  2. We list the top three systems of the CoNLL 2009 shared task in Table 8, showing that our approach also advances the state-of-the-art parsing accuracy on this data set.
    Page 8, “Experiments and Analysis”
  3. The parsing accuracies of the top systems may be underestimated since the accuracy of the provided POS tags in CoNLL 2009 is only 92.38% on the test set, while the POS tagger used in our experiments reaches 94.08%.
    Page 8, “Experiments and Analysis”

constituency parser

Appears in 3 sentences as: constituency parser (1) constituency parsers (1) constituent parser (1)
In Exploiting Multiple Treebanks for Parsing with Quasi-synchronous Grammars
  1. They automatically convert the dependency-structure CDT into the phrase-structure style of CTB5 using a statistical constituency parser trained on CTB5.
    Page 2, “Related Work”
  2. Their experiments show that the combined treebank can significantly improve the performance of constituency parsers.
    Page 2, “Related Work”
  3. (2009) use the maximum-entropy-inspired generative parser (GP) of Charniak (2000) as their constituent parser.
    Page 8, “Experiments and Analysis”

dependency tree

Appears in 3 sentences as: dependency tree (3)
In Exploiting Multiple Treebanks for Parsing with Quasi-synchronous Grammars
  1. Given an input sentence $x = w_0 w_1 \ldots w_n$ and its POS tag sequence $t = t_0 t_1 \ldots t_n$, the goal of dependency parsing is to build a dependency tree as depicted in Figure 1, denoted by $d = \{(h, m, l) : 0 \le h \le n, 0 < m \le n, l \in L\}$, where $(h, m, l)$ indicates a directed arc from the head word (also called father) $w_h$ to the modifier (also called child or dependent) $w_m$ with a dependency label $l$, and $L$ is the label set.
    Page 3, “Dependency Parsing”
  2. To guarantee the efficiency of the decoding algorithms, the score of a dependency tree is factored into the scores of some small parts (subtrees).
    Page 3, “Dependency Parsing”
  3. During both the training and test phases, the target parser is inspired by the source annotations, and the score of a target dependency tree becomes
    Page 4, “Dependency Parsing with QG Features”

significantly improve

Appears in 3 sentences as: significantly improve (3)
In Exploiting Multiple Treebanks for Parsing with Quasi-synchronous Grammars
  1. Their experiments show that the combined treebank can significantly improve the performance of constituency parsers.
    Page 2, “Related Work”
  2. (2011) show that a joint POS tagging and dependency parsing model can significantly improve parsing accuracy over a pipeline model.
    Page 7, “Experiments and Analysis”
  3. Extensive experiments show that our approach can effectively utilize the syntactic knowledge from another treebank and significantly improve the state-of-the-art parsing accuracy.
    Page 8, “Conclusions”

statistically significant

Appears in 3 sentences as: statistical significance (1) statistically significant (2)
In Exploiting Multiple Treebanks for Parsing with Quasi-synchronous Grammars
  1. The p-values in parentheses indicate the statistical significance of the improvements.
    Page 6, “Experiments and Analysis” (a significance-test sketch follows this list)
  2. The improvements shown in parentheses are all statistically significant ($p < 10^{-5}$).
    Page 7, “Experiments and Analysis”
  3. The improvements shown in parentheses are all statistically significant ($p < 10^{-5}$).
    Page 8, “Experiments and Analysis”
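
The excerpts above do not name the significance test used. One common choice for comparing two parsers on the same test set is paired bootstrap resampling, sketched below as an assumption rather than the authors' actual procedure:

    import random

    def paired_bootstrap_p(correct_a, correct_b, trials=10000, seed=0):
        """correct_a[i] / correct_b[i]: 1 if system A / B attaches word i
        correctly, else 0, on the same test set. Returns an approximate
        p-value for the observed advantage of A over B. (Assumed test;
        the paper does not specify its significance test here.)"""
        assert sum(correct_a) > sum(correct_b), "orient so A is the better system"
        rng, n = random.Random(seed), len(correct_a)
        worse = 0
        for _ in range(trials):
            sample = [rng.randrange(n) for _ in range(n)]  # resample with replacement
            if sum(correct_a[i] - correct_b[i] for i in sample) <= 0:
                worse += 1
        return worse / trials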
