Task-oriented Evaluation of Syntactic Parsers and Their Representations
Miyao, Yusuke and Saetre, Rune and Sagae, Kenji and Matsuzaki, Takuya and Tsujii, Jun'ichi

Article Structure

Abstract

This paper presents a comparative evaluation of several state-of-the-art English parsers based on different frameworks.

Introduction

Parsing technologies have improved considerably in the past few years, and high-performance syntactic parsers are no longer limited to PCFG-based frameworks (Charniak, 2000; Klein and Manning, 2003; Charniak and Johnson, 2005; Petrov and Klein, 2007), but also include dependency parsers (McDonald and Pereira, 2006; Nivre and Nilsson, 2005; Sagae and Tsujii, 2007) and deep parsers (Kaplan et al., 2004; Clark and Curran, 2004; Miyao and Tsujii, 2008).

Syntactic Parsers and Their Representations

This paper focuses on eight representative parsers that are classified into three parsing frameworks: dependency parsing, phrase structure parsing, and deep parsing.

Evaluation Methodology

In our approach to parser evaluation, we measure the accuracy of a PPI extraction system, in which the parser output is embedded as statistical features of a machine learning classifier.

Experiments

4.1 Experiment settings

Related Work

Though the evaluation of syntactic parsers has been a major concern in the parsing community, and several recent works have presented comparisons of parsers based on different frameworks, their methods were based on comparing parsing accuracy in terms of a certain intermediate parse representation (Ringger et al., 2004; Kaplan et al., 2004; Briscoe and Carroll, 2006; Clark and Curran, 2007; Miyao et al., 2007; Clegg and Shepherd, 2007; Pyysalo et al., 2007b; Pyysalo et al., 2007a; Sagae et al., 2008).

Conclusion and Future Work

We have presented our attempts to evaluate syntactic parsers and their representations based on different frameworks: dependency parsing, phrase structure parsing, or deep parsing.

Topics

dependency parsing

Appears in 23 sentences as: dependency parser (2), dependency parsers (8), Dependency parsing (1), dependency parsing (12)
In Task-oriented Evaluation of Syntactic Parsers and Their Representations
  1. We evaluate eight parsers (based on dependency parsing, phrase structure parsing, or deep parsing) using five different parse representations.
    Page 1, “Abstract”
  2. Parsing technologies have improved considerably in the past few years, and high-performance syntactic parsers are no longer limited to PCFG-based frameworks (Charniak, 2000; Klein and Manning, 2003; Charniak and Johnson, 2005; Petrov and Klein, 2007), but also include dependency parsers (McDonald and Pereira, 2006; Nivre and Nilsson, 2005; Sagae and Tsujii, 2007) and deep parsers (Kaplan et al., 2004; Clark and Curran, 2004; Miyao and Tsujii, 2008).
    Page 1, “Introduction”
  3. In this paper, we present a comparative evaluation of syntactic parsers and their output representations based on different frameworks: dependency parsing, phrase structure parsing, and deep parsing.
    Page 1, “Introduction”
  4. This paper focuses on eight representative parsers that are classified into three parsing frameworks: dependency parsing, phrase structure parsing, and deep parsing.
    Page 2, “Syntactic Parsers and Their Representations”
  5. 2.1 Dependency parsing
    Page 2, “Syntactic Parsers and Their Representations”
  6. Because the shared tasks of CoNLL-2006 and CoNLL-2007 focused on data-driven dependency parsing, it has recently been extensively studied in parsing research.
    Page 2, “Syntactic Parsers and Their Representations”
  7. The aim of dependency parsing is to compute a tree structure of a sentence where nodes are words, and edges represent the relations among words.
    Page 2, “Syntactic Parsers and Their Representations”
  8. Figure 1 shows a dependency tree for the sentence “IL-8 recognizes and activates CXCR1.” An advantage of dependency parsing is that dependency trees are a reasonable approximation of the semantics of sentences, and are readily usable in NLP applications.
    Page 2, “Syntactic Parsers and Their Representations”
  9. Furthermore, the efficiency of popular approaches to dependency parsing compares favorably with that of phrase structure parsing or deep parsing.
    Page 2, “Syntactic Parsers and Their Representations”
  10. While a number of approaches have been proposed for dependency parsing, this paper focuses on two typical methods.
    Page 2, “Syntactic Parsers and Their Representations”
  11. MST McDonald and Pereira (2006)’s dependency parser, based on the Eisner algorithm for projective dependency parsing (Eisner, 1996) with second-order factorization.
    Page 2, “Syntactic Parsers and Their Representations”
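
As a concrete illustration of the dependency trees described in sentences 7, 8, and 11 above, here is a minimal Python sketch of a CoNLL-X-style analysis of the running example "IL-8 recognizes and activates CXCR1." The head and label choices are illustrative assumptions, not the actual tree from the paper's Figure 1, and the record is simplified from the real ten-column format.

```python
from collections import namedtuple

# Simplified token record; the actual CoNLL-X format carries ten columns
# (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL).
Token = namedtuple("Token", ["tid", "form", "head", "deprel"])

# Hypothetical tree for "IL-8 recognizes and activates CXCR1."
# (head = 0 marks the root; coordination attachment varies by converter).
tree = [
    Token(1, "IL-8",       2, "SBJ"),
    Token(2, "recognizes", 0, "ROOT"),
    Token(3, "and",        2, "COORD"),
    Token(4, "activates",  3, "CONJ"),
    Token(5, "CXCR1",      2, "OBJ"),
    Token(6, ".",          2, "P"),
]

# Print each word with the relation to its head.
for t in tree:
    parent = tree[t.head - 1].form if t.head else "ROOT"
    print(f"{t.form} --{t.deprel}--> {parent}")
```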


Treebank

Appears in 11 sentences as: Treebank (8), treebank (3), treebanks (1)
In Task-oriented Evaluation of Syntactic Parsers and Their Representations
  1. This assumes the existence of a gold-standard test corpus, such as the Penn Treebank (Marcus et al., 1994).
    Page 1, “Introduction”
  2. Most state-of-the-art parsers for English were trained with the Wall Street Journal (WSJ) portion of the Penn Treebank, and high accuracy has been reported for WSJ text; however, these parsers rely on lexical information to attain high accuracy, and it has been criticized that these parsers may overfit to WSJ text (Gildea, 2001;
    Page 1, “Introduction”
  3. When training data in the target domain is available, as is the case with the GENIA Treebank (Kim et al., 2003) for biomedical papers, a parser can be retrained to adapt to the target domain, and larger accuracy improvements are expected if the training method is sufficiently general.
    Page 2, “Introduction”
  4. In general, our evaluation methodology can be applied to English parsers based on any framework; however, in this paper, we chose parsers that were originally developed and trained with the Penn Treebank or its variants, since such parsers can be retrained with GENIA, thus allowing us to investigate the effect of domain adaptation.
    Page 2, “Syntactic Parsers and Their Representations”
  5. Owing largely to the Penn Treebank, the mainstream of data-driven parsing research has been dedicated to phrase structure parsing.
    Page 2, “Syntactic Parsers and Their Representations”
  6. ENJU The HPSG parser that consists of an HPSG grammar extracted from the Penn Treebank, and a maximum entropy model trained with an HPSG treebank derived from the Penn Treebank.
    Page 3, “Syntactic Parsers and Their Representations”
  7. It should be noted, however, that this conversion cannot work perfectly with automatic parsing, because the conversion program relies on function tags and empty categories of the original Penn Treebank.
    Page 4, “Evaluation Methodology”
  8. Next, we run parsers retrained with GENIA (8,127 sentences), which is a Penn Treebank-style treebank of biomedical paper abstracts.
    Page 5, “Evaluation Methodology”
  9. Some of the parser packages include parsing models trained with extended data, but we used the models trained with WSJ sections 2-21 of the Penn Treebank.
    Page 5, “Evaluation Methodology”
  10. …with a Penn Treebank-style treebank, we use those programs as-is.
    Page 5, “Evaluation Methodology”
  11. Although we restricted ourselves to parsers trainable with Penn Treebank-style treebanks, our methodology can be applied to any English parser.
    Page 8, “Conclusion and Future Work”


CoNLL

Appears in 9 sentences as: CONLL (1), CoNLL (10)
In Task-oriented Evaluation of Syntactic Parsers and Their Representations
  1. The concept is therefore similar to CoNLL dependencies, though PAS expresses deeper relations, and may include reentrant structures.
    Page 3, “Syntactic Parsers and Their Representations”
  2. CoNLL The dependency tree format used in the 2006 and 2007 CoNLL shared tasks on dependency parsing.
    Page 4, “Evaluation Methodology”
  3. [Figure legend: KSDEP, CoNLL, RERANK, NO-RERANK, BERKELEY, STANFORD, ENJU, ENJU-GENIA]
    Page 4, “Evaluation Methodology”
  4. Although the concept looks similar to CoNLL, this representation…
    Page 4, “Evaluation Methodology”
  5. Although only CoNLL is available for dependency parsers, we can create four representations for the phrase structure parsers, and five for the deep parsers.
    Page 5, “Evaluation Methodology”
  6. [Table header: CoNLL, PTB, HD, SD, PAS]
    Page 6, “Experiments”
  7. Dependency-based representations are competitive, while CoNLL seems superior to HD and SD in spite of the imperfect conversion from PTB to CoNLL.
    Page 7, “Experiments”
  8. This might be a reason for the high performances of the dependency parsers that directly compute CoNLL dependencies.
    Page 7, “Experiments”
  9. The results for ENJU-CoNLL and ENJU-PAS show that PAS contributes to a larger accuracy improvement, although this does not necessarily mean the superiority of PAS, because two imperfect conversions, i.e., PAS-to-PTB and PTB-to-CoNLL, are applied for creating CoNLL.
    Page 7, “Experiments”
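
Sentence 2 above names the CoNLL dependency tree format. For reference, a CoNLL-X-style rendering of the running example could look like the following; the ten tab-separated columns are ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL, and the head/label choices repeat the illustrative assumptions used elsewhere on this page rather than any parser's actual output.

```
1	IL-8	_	NN	NN	_	2	SBJ	_	_
2	recognizes	_	VB	VBZ	_	0	ROOT	_	_
3	and	_	CC	CC	_	2	COORD	_	_
4	activates	_	VB	VBZ	_	3	CONJ	_	_
5	CXCR1	_	NN	NN	_	2	OBJ	_	_
6	.	_	.	.	_	2	P	_	_
```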


domain adaptation

Appears in 8 sentences as: domain adaptation (9)
In Task-oriented Evaluation of Syntactic Parsers and Their Representations
  1. In general, our evaluation methodology can be applied to English parsers based on any framework; however, in this paper, we chose parsers that were originally developed and trained with the Penn Treebank or its variants, since such parsers can be retrained with GENIA, thus allowing us to investigate the effect of domain adaptation.
    Page 2, “Syntactic Parsers and Their Representations”
  2. We also measure the accuracy improvements obtained by parser retraining with GENIA, to examine the domain portability, and to evaluate the effectiveness of domain adaptation.
    Page 3, “Evaluation Methodology”
  3. Accuracy improvements in this setting indicate the possibility of domain adaptation, and the portability of the training methods of the parsers.
    Page 5, “Evaluation Methodology”
  4. When the parsers are retrained with GENIA (Table 2), the accuracy increases significantly, demonstrating that the WSJ-trained parsers are not sufficiently domain-independent, and that domain adaptation is effective.
    Page 6, “Experiments”
  5. It is an important observation that the improvements by domain adaptation are larger than the differences among the parsers in the previous experiment.
    Page 6, “Experiments”
  6. A large improvement from ENJU to ENJU-GENIA shows the effectiveness of the specifically designed domain adaptation method, suggesting that the other parsers might also benefit from more sophisticated approaches for domain adaptation.
    Page 6, “Experiments”
  7. We also found that accuracy improvements vary when parsers are retrained with domain-specific data, indicating the importance of domain adaptation and the differences in the portability of parser training methods.
    Page 8, “Conclusion and Future Work”
  8. …2006), the C&C parser (Clark and Curran, 2004), the XLE parser (Kaplan et al., 2004), MINIPAR (Lin, 1998), and Link Parser (Sleator and Temperley, 1993; Pyysalo et al., 2006), but the domain adaptation of these parsers is not straightforward.
    Page 8, “Conclusion and Future Work”


syntactic parsers

Appears in 8 sentences as: syntactic parsers (6), syntactic parsing (2)
In Task-oriented Evaluation of Syntactic Parsers and Their Representations
  1. Parsing technologies have improved considerably in the past few years, and high-performance syntactic parsers are no longer limited to PCFG-based frameworks (Charniak, 2000; Klein and Manning, 2003; Charniak and Johnson, 2005; Petrov and Klein, 2007), but also include dependency parsers (McDonald and Pereira, 2006; Nivre and Nilsson, 2005; Sagae and Tsujii, 2007) and deep parsers (Kaplan et al., 2004; Clark and Curran, 2004; Miyao and Tsujii, 2008).
    Page 1, “Introduction”
  2. However, efforts to perform extensive comparisons of syntactic parsers based on different frameworks have been limited.
    Page 1, “Introduction”
  3. In this paper, we present a comparative evaluation of syntactic parsers and their output representations based on different frameworks: dependency parsing, phrase structure parsing, and deep parsing.
    Page 1, “Introduction”
  4. PPI identification is a reasonable task for parser evaluation, because it is a typical information extraction (IE) application, and because recent studies have shown the effectiveness of syntactic parsing in this task.
    Page 1, “Introduction”
  5. Since our evaluation method is applicable to any parser output, and is grounded in a real application, it allows for a fair comparison of syntactic parsers based on different frameworks.
    Page 1, “Introduction”
  6. …(2006) do not rely on syntactic parsing, while the former applied SVMs with kernels on surface strings and the latter is similar to our baseline method.
    Page 7, “Experiments”
  7. Though the evaluation of syntactic parsers has been a major concern in the parsing community, and several recent works have presented comparisons of parsers based on different frameworks, their methods were based on comparing parsing accuracy in terms of a certain intermediate parse representation (Ringger et al., 2004; Kaplan et al., 2004; Briscoe and Carroll, 2006; Clark and Curran, 2007; Miyao et al., 2007; Clegg and Shepherd, 2007; Pyysalo et al., 2007b; Pyysalo et al., 2007a; Sagae et al., 2008).
    Page 8, “Related Work”
  8. We have presented our attempts to evaluate syntactic parsers and their representations based on different frameworks: dependency parsing, phrase structure parsing, or deep parsing.
    Page 8, “Conclusion and Future Work”


Penn Treebank

Appears in 7 sentences as: Penn Treebank (7)
In Task-oriented Evaluation of Syntactic Parsers and Their Representations
  1. This assumes the existence of a gold-standard test corpus, such as the Penn Treebank (Marcus et al., 1994).
    Page 1, “Introduction”
  2. Most state-of-the-art parsers for English were trained with the Wall Street Journal (WSJ) portion of the Penn Treebank, and high accuracy has been reported for WSJ text; however, these parsers rely on lexical information to attain high accuracy, and it has been criticized that these parsers may overfit to WSJ text (Gildea, 2001;
    Page 1, “Introduction”
  3. In general, our evaluation methodology can be applied to English parsers based on any framework; however, in this paper, we chose parsers that were originally developed and trained with the Penn Treebank or its variants, since such parsers can be retrained with GENIA, thus allowing us to investigate the effect of domain adaptation.
    Page 2, “Syntactic Parsers and Their Representations”
  4. Owing largely to the Penn Treebank, the mainstream of data-driven parsing research has been dedicated to phrase structure parsing.
    Page 2, “Syntactic Parsers and Their Representations”
  5. ENJU The HPSG parser that consists of an HPSG grammar extracted from the Penn Treebank, and a maximum entropy model trained with an HPSG treebank derived from the Penn Treebank.
    Page 3, “Syntactic Parsers and Their Representations”
  6. It should be noted, however, that this conversion cannot work perfectly with automatic parsing, because the conversion program relies on function tags and empty categories of the original Penn Treebank.
    Page 4, “Evaluation Methodology”
  7. Some of the parser packages include parsing models trained with extended data, but we used the models trained with WSJ sections 2-21 of the Penn Treebank.
    Page 5, “Evaluation Methodology”


RERANK

Appears in 7 sentences as: RERANK (6), reranker (2)
In Task-oriented Evaluation of Syntactic Parsers and Their Representations
  1. RERANK Charniak and Johnson (2005)’s reranking parser.
    Page 2, “Syntactic Parsers and Their Representations”
  2. The reranker of this parser receives n-best parse results from NO-RERANK, and selects the most likely result by using a maximum entropy model with manually engineered features.
    Page 2, “Syntactic Parsers and Their Representations”
  3. [Figure legend: KSDEP, CoNLL, RERANK, NO-RERANK, BERKELEY, STANFORD, ENJU, ENJU-GENIA]
    Page 4, “Evaluation Methodology”
  4. For the other parsers, we input the concatenation of WSJ and GENIA for the retraining, while the reranker of RERANK was not retrained due to its cost.
    Page 5, “Evaluation Methodology”
  5. Since the parsers other than NO-RERANK and RERANK require an external POS tagger, a WSJ-trained POS tagger is used with WSJ-trained parsers, and geniatagger (Tsuruoka et al., 2005) is used with GENIA-retrained parsers.
    Page 5, “Evaluation Methodology”
  6. Among these parsers, RERANK performed slightly better than the others, although the difference in f-score is small, while it requires a much higher parsing cost.
    Page 6, “Experiments”
  7. …retraining yielded only slight improvements for RERANK, BERKELEY, and STANFORD, while larger improvements were observed for MST, KSDEP, NO-RERANK, and ENJU.
    Page 6, “Experiments”
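
Sentence 2 above outlines the reranking step: NO-RERANK proposes n-best candidates and a maximum entropy model selects among them. The sketch below shows that selection under the usual log-linear scoring; the feature names and weights are invented for illustration and are not RERANK's actual feature set.

```python
def rerank(nbest, weights, features):
    """Pick the candidate with the highest log-linear score w . f(parse)."""
    def score(parse):
        return sum(weights.get(name, 0.0) * value
                   for name, value in features(parse).items())
    return max(nbest, key=score)

# Toy usage with made-up features over bracketed parse strings:
weights = {"has_NP": 1.2, "length": -0.05}
features = lambda p: {"has_NP": float("(NP" in p), "length": float(len(p))}
print(rerank(["(S (NP IL-8) (VP recognizes))", "(S (VP recognizes))"],
             weights, features))
```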


dependency tree

Appears in 6 sentences as: dependency tree (5), Dependency trees (1), dependency trees (1)
In Task-oriented Evaluation of Syntactic Parsers and Their Representations
  1. Figure 1 shows a dependency tree for the sentence “IL-8 recognizes and activates CXCR1.” An advantage of dependency parsing is that dependency trees are a reasonable approximation of the semantics of sentences, and are readily usable in NLP applications.
    Page 2, “Syntactic Parsers and Their Representations”
  2. Figure 1: CoNLL-X dependency tree
    Page 2, “Syntactic Parsers and Their Representations”
  3. For the protein pair IL-8 and CXCR1 in Figure 4, a dependency parser outputs a dependency tree shown in Figure 1.
    Page 3, “Evaluation Methodology”
  4. From this dependency tree, we can extract a dependency path shown in Figure 5, which appears to be a strong clue in knowing that these proteins are mentioned as interacting.
    Page 3, “Evaluation Methodology”
  5. CoNLL The dependency tree format used in the 2006 and 2007 CoNLL shared tasks on dependency parsing.
    Page 4, “Evaluation Methodology”
  6. HD Dependency trees of syntactic heads (Figure 8).
    Page 4, “Evaluation Methodology”


dependency path

Appears in 5 sentences as: Dependency path (1), dependency path (3), dependency paths (1)
In Task-oriented Evaluation of Syntactic Parsers and Their Representations
  1. Figure 5: Dependency path
    Page 3, “Evaluation Methodology”
  2. From this dependency tree, we can extract a dependency path shown in Figure 5, which appears to be a strong clue in knowing that these proteins are mentioned as interacting.
    Page 3, “Evaluation Methodology”
  3. Figure 6: Tree representation of a dependency path
    Page 4, “Evaluation Methodology”
  4. For dependency-based parse representations, a dependency path is encoded as a flat tree as depicted in Figure 6 (prefix “r” denotes reverse relations).
    Page 4, “Evaluation Methodology”
  5. Because a tree kernel measures the similarity of trees by counting common subtrees, it is expected that the system finds effective subsequences of dependency paths.
    Page 4, “Evaluation Methodology”
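
Sentences 2 and 4 above describe extracting the path between two proteins and flattening it, with an "r" prefix marking reverse relations. The sketch below implements one plausible reading of that encoding over the toy tree from the dependency parsing section; the exact bracketing of the paper's Figure 6 may differ.

```python
# Toy tree for "IL-8 recognizes and activates CXCR1.", flattened into dicts
# (same illustrative head/label assumptions as the earlier sketch).
head = {1: 2, 2: 0, 3: 2, 4: 3, 5: 2, 6: 2}
rel = {1: "SBJ", 2: "ROOT", 3: "COORD", 4: "CONJ", 5: "OBJ", 6: "P"}
form = {1: "IL-8", 2: "recognizes", 3: "and", 4: "activates", 5: "CXCR1", 6: "."}

def dependency_path(a, b):
    """Flat encoding of the tree path from token a to token b; edges walked
    child-to-head get an "r" prefix, edges walked head-to-child do not."""
    def ancestors(i):  # i, head(i), head(head(i)), ... up to the root
        chain = [i]
        while head[chain[-1]] != 0:
            chain.append(head[chain[-1]])
        return chain

    up_a, up_b = ancestors(a), ancestors(b)
    lca = next(i for i in up_a if i in up_b)  # lowest common ancestor
    rising = [f"r{rel[i]}:{form[i]}" for i in up_a[:up_a.index(lca)]]
    falling = [f"{rel[i]}:{form[i]}" for i in reversed(up_b[:up_b.index(lca)])]
    return "(" + " ".join(rising + [form[lca]] + falling) + ")"

print(dependency_path(1, 5))  # (rSBJ:IL-8 recognizes OBJ:CXCR1)
```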


machine learning

Appears in 4 sentences as: machine learning (4)
In Task-oriented Evaluation of Syntactic Parsers and Their Representations
  1. Our approach to parser evaluation is to measure accuracy improvement in the task of identifying protein-protein interaction (PPI) information in biomedical papers, by incorporating the output of different parsers as statistical features in a machine learning classifier (Yakushiji et al., 2005; Katrenko and Adriaans, 2006; Erkan et al., 2007; Sætre et al., 2007).
    Page 1, “Introduction”
  2. …the parser output is embedded as statistical features of a machine learning classifier.
    Page 3, “Evaluation Methodology”
  3. Recent studies on PPI extraction demonstrated that dependency relations between target proteins are effective features for machine learning classifiers (Katrenko and Adriaans, 2006; Erkan et al., 2007; Sætre et al., 2007).
    Page 3, “Evaluation Methodology”
  4. The basic idea is to measure the accuracy improvements of the PPI extraction task by incorporating the parser output as statistical features of a machine learning classifier.
    Page 8, “Conclusion and Future Work”
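
Sentences 1-4 above state the core idea: parse output becomes statistical features of a classifier over candidate protein pairs. The paper uses SVMs with tree kernels; as a rough stand-in, this sketch trains a linear SVM over tokens of flattened dependency-path strings, assuming scikit-learn is available. All examples are invented placeholders, not AImed data.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# One flattened dependency path per candidate protein pair (placeholders).
paths = [
    "(rSBJ:PROT1 recognizes OBJ:PROT2)",
    "(rSBJ:PROT1 binds OBJ:PROT2)",
    "(rNMOD:PROT1 expression rNMOD:of PMOD:PROT2)",
    "(rNMOD:PROT1 levels rNMOD:in PMOD:PROT2)",
]
labels = [1, 1, 0, 0]  # 1 = the pair is described as interacting

# Bag-of-token features over path elements, then a linear SVM.
clf = make_pipeline(CountVectorizer(token_pattern=r"[^\s()]+"), LinearSVC())
clf.fit(paths, labels)
print(clf.predict(["(rSBJ:PROT1 activates OBJ:PROT2)"]))
```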


cross validation

Appears in 3 sentences as: cross validation (3)
In Task-oriented Evaluation of Syntactic Parsers and Their Representations
  1. The accuracy is measured by abstract-wise 10-fold cross validation and the one-answer-per-occurrence criterion (Giuliano et al., 2006).
    Page 5, “Experiments”
  2. Table 3 shows the time for parsing the entire AImed corpus, and Table 4 shows the time required for 10-fold cross validation with GENIA-retrained parsers.
    Page 5, “Experiments”
  3. Since we did not run experiments on protein-pair-wise cross validation, our system cannot be compared directly to the results reported by Erkan et al.
    Page 7, “Experiments”
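
Sentence 1 above specifies abstract-wise 10-fold cross validation, i.e., all pairs drawn from one abstract must land in the same fold. A minimal sketch of that splitting with scikit-learn's GroupKFold follows; the arrays are placeholders standing in for real AImed features and labels.

```python
import numpy as np
from sklearn.model_selection import GroupKFold

X = np.zeros((100, 5))               # feature vectors for 100 candidate pairs
y = np.tile([0, 1], 50)              # placeholder interaction labels
abstract_ids = np.arange(100) // 5   # 20 abstracts, 5 pairs each

# Grouping by abstract ID guarantees no abstract spans train and test folds.
for train_idx, test_idx in GroupKFold(n_splits=10).split(X, y, groups=abstract_ids):
    assert set(abstract_ids[train_idx]).isdisjoint(abstract_ids[test_idx])
```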


extraction system

Appears in 3 sentences as: extraction system (2), extraction systems (1)
In Task-oriented Evaluation of Syntactic Parsers and Their Representations
  1. Our approach is to measure the impact of each parser when it is used as a component of an information extraction system that performs protein-protein interaction (PPI) identification in biomedical papers.
    Page 1, “Abstract”
  2. In our approach to parser evaluation, we measure the accuracy of a PPI extraction system, in which the parser output is embedded as statistical features of a machine learning classifier.
    Page 3, “Evaluation Methodology”
  3. In the following experiments, we used AImed (Bunescu and Mooney, 2004), which is a popular corpus for the evaluation of PPI extraction systems.
    Page 5, “Experiments”
