Fast and Accurate Shift-Reduce Constituent Parsing
Zhu, Muhua and Zhang, Yue and Chen, Wenliang and Zhang, Min and Zhu, Jingbo

Article Structure

Abstract

Shift-reduce dependency parsers give comparable accuracies to their chart-based counterparts, yet the best shift-reduce constituent parsers still lag behind the state-of-the-art.

Introduction

Transition-based parsers employ a set of shift-reduce actions and perform parsing using a sequence of state transitions.

Baseline parser

We adopt the parser of Zhang and Clark (2009) for our baseline, which is based on the shift-reduce process of Sagae and Lavie (2005), and employs global perceptron training and beam search.

Improved hypotheses comparison

Unlike dependency parsing, constituent parse trees for the same sentence can have different numbers of nodes, mainly due to the existence of unary nodes.

Semi-supervised Parsing with Large Data

This section discusses how to extract information from unlabeled data or auto-parsed data to further improve shift-reduce parsing accuracies.

Experiments

5.1 Setup

Conclusion

In this paper, we addressed the problem of different action-sequence lengths for shift-reduce phrase-structure parsing, and designed a set of novel nonlocal features to further improve parsing.

Topics

semi-supervised

Appears in 27 sentences as: Semi-supervised (5) semi-supervised (23)
In Fast and Accurate Shift-Reduce Constituent Parsing
  1. In addition to the above contributions, we apply a variety of semi-supervised learning techniques to our transition-based parser.
    Page 2, “Introduction”
  2. Experimental results show that semi-supervised methods give a further improvement of 0.9% in F-score on the English data and 2.4% on the Chinese data.
    Page 2, “Introduction”
  3. 4.4 Semi-supervised Features
    Page 5, “Semi-supervised Parsing with Large Data”
  4. Table 6: Experimental results on the English and Chinese development sets with different types of semi-supervised features added incrementally to the extended parser.
    Page 6, “Experiments”
  5. Based on the extended parser, we experimented with different types of semi-supervised features by adding the features incrementally.
    Page 6, “Experiments”
  6. By comparing the results in Table 5 and the results in Table 6 we can see that the semi-supervised features achieve an overall improvement of 1.0% on the English data and an im-
    Page 6, “Experiments”
  7. We grouped the parsers into three categories: single parsers (SI), discriminative reranking parsers (RE), and semi-supervised parsers (SE).
    Page 7, “Experiments”
  8. After integrating semi-supervised features, the parsing accuracy on English is improved to 91.3%.
    Page 7, “Experiments”
  9. The padding technique, supervised features, and semi-supervised features achieve an overall improvement of 1.4% over the baseline on English, which is significant at the level of p < 10⁻⁵.
    Page 7, “Experiments”
  10. From the table, we can see that incorporating semi-supervised features decreases parsing speed, but the semi-supervised parser still has the advantage of efficiency over other parsers.
    Page 7, “Experiments”
  11. Specifically, the semi-supervised parser is 7 times faster than the Berkeley parser.
    Page 7, “Experiments”

shift-reduce

Appears in 17 sentences as: Shift-Reduce (1) Shift-reduce (3) shift-reduce (14)
In Fast and Accurate Shift-Reduce Constituent Parsing
  1. Shift-reduce dependency parsers give comparable accuracies to their chart-based counterparts, yet the best shift-reduce constituent parsers still lag behind the state-of-the-art.
    Page 1, “Abstract”
  2. One important reason is the existence of unary nodes in phrase structure trees, which leads to different numbers of shift-reduce actions between different outputs for the same input.
    Page 1, “Abstract”
  3. We propose a simple yet effective extension to the shift-reduce process, which eliminates size differences between action sequences in beam-search.
    Page 1, “Abstract”
  4. Transition-based parsers employ a set of shift-reduce actions and perform parsing using a sequence of state transitions.
    Page 1, “Introduction”
  5. We propose an extension to the shift-reduce process to address this problem, which gives significant improvements to the parsing accuracies.
    Page 2, “Introduction”
  6. We adopt the parser of Zhang and Clark (2009) for our baseline, which is based on the shift-reduce process of Sagae and Lavie (2005), and employs global perceptron training and beam search.
    Page 2, “Baseline parser”
  7. 2.1 Vanilla Shift-Reduce
    Page 2, “Baseline parser”
  8. Shift-reduce parsing is based on a left-to-right scan of the input sentence.
    Page 2, “Baseline parser”
  9. Figure 1: Deduction system of the baseline shift-reduce parsing process.
    Page 2, “Baseline parser”
  10. This section discusses how to extract information from unlabeled data or auto-parsed data to further improve shift-reduce parsing accuracies.
    Page 4, “Semi-supervised Parsing with Large Data”
  11. Based on the information, we propose a set of novel features specifically designed for shift-reduce constituent parsing.
    Page 4, “Semi-supervised Parsing with Large Data”
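
The excerpts above describe the vanilla shift-reduce process: a left-to-right scan of the input, a stack of partially built constituents, and a queue of remaining words, manipulated by SHIFT, REDUCE-L/R-X, UNARY-X, and FINISH actions. Below is a minimal illustrative sketch of such a transition system in Python; the class and method names are assumptions for exposition, not the authors' implementation (which follows Zhang and Clark, 2009).

```python
from collections import namedtuple

# A constituent on the stack: a label, a lexical head word, and child nodes.
Node = namedtuple("Node", "label head children")

class State:
    """One shift-reduce state: a stack of constituents and a queue pointer."""

    def __init__(self, tagged_words):
        self.queue = tagged_words        # list of (word, POS-tag) pairs
        self.i = 0                       # index of the next unconsumed word
        self.stack = []
        self.actions = []

    def shift(self):
        # SHIFT: move the next word from the queue onto the stack as a leaf.
        word, tag = self.queue[self.i]
        self.stack.append(Node(tag, word, ()))
        self.i += 1
        self.actions.append("SHIFT")

    def reduce(self, label, head_left):
        # REDUCE-L/R-X: pop the top two constituents and combine them into a
        # new constituent labelled X; L/R marks which child supplies the head.
        right, left = self.stack.pop(), self.stack.pop()
        head = left.head if head_left else right.head
        self.stack.append(Node(label, head, (left, right)))
        self.actions.append(("REDUCE-L-" if head_left else "REDUCE-R-") + label)

    def unary(self, label):
        # UNARY-X: raise the top constituent with a unary rule X -> child.
        child = self.stack.pop()
        self.stack.append(Node(label, child.head, (child,)))
        self.actions.append("UNARY-" + label)

    def finished(self):
        # FINISH is applicable once the queue is consumed and one tree remains.
        return self.i == len(self.queue) and len(self.stack) == 1
```

Because UNARY-X actions are optional per node, two complete parses of the same sentence can require different numbers of actions, which is the motivation for the padding extension discussed under the feature vector topic below.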

dependency parsing

Appears in 10 sentences as: dependency parsers (2) dependency parsing (8)
In Fast and Accurate Shift-Reduce Constituent Parsing
  1. Shift-reduce dependency parsers give comparable accuracies to their chart-based counterparts, yet the best shift-reduce constituent parsers still lag behind the state-of-the-art.
    Page 1, “Abstract”
  2. Various methods have been proposed to address the disadvantages of greedy local parsing, among which a framework of beam-search and global discriminative training has been shown effective for dependency parsing (Zhang and Clark, 2008; Huang and Sagae, 2010).
    Page 1, “Introduction”
  3. With the use of rich nonlocal features, transition-based dependency parsers achieve state-of-the-art accuracies that are comparable to the best graph-based parsers (Zhang and Nivre, 2011; Bohnet and Nivre, 2012).
    Page 1, “Introduction”
  4. In addition, processing tens of sentences per second (Zhang and Nivre, 2011), these transition-based parsers can be a favorable choice for dependency parsing.
    Page 1, “Introduction”
  5. However, the effects were not as significant as for transition-based dependency parsing.
    Page 1, “Introduction”
  6. One difference between phrase-structure parsing and dependency parsing is that for the former, parse trees with different numbers of unary rules require different numbers of actions to build.
    Page 1, “Introduction”
  7. Unlike dependency parsing, constituent parse trees for the same sentence can have different numbers of nodes, mainly due to the existence of unary nodes.
    Page 3, “Improved hypotheses comparison”
  8. Word clusters are regarded as lexical intermediaries for dependency parsing (Koo et al., 2008) and POS tagging (Sun and Uszkoreit, 2012).
    Page 4, “Semi-supervised Parsing with Large Data”
  9. The idea of exploiting lexical dependency information from auto-parsed data has been explored before for dependency parsing (Chen et al., 2009) and constituent parsing (Zhu et al., 2012).
    Page 4, “Semi-supervised Parsing with Large Data”
  10. The dependency language model is proposed by Shen et al. (2008) and is used as additional information for graph-based dependency parsing in Chen et al.
    Page 5, “Semi-supervised Parsing with Large Data”

language model

Appears in 7 sentences as: Language Model (1) language model (3) language models (3)
In Fast and Accurate Shift-Reduce Constituent Parsing
  1. These relations are captured by word clustering, lexical dependencies, and a dependency language model, respectively.
    Page 4, “Semi-supervised Parsing with Large Data”
  2. 4.3 Structural Relations: Dependency Language Model
    Page 5, “Semi-supervised Parsing with Large Data”
  3. The dependency language model is proposed by Shen et al. (2008) and is used as additional information for graph-based dependency parsing in Chen et al.
    Page 5, “Semi-supervised Parsing with Large Data”
  4. We build dependency language models on auto-parsed data.
    Page 5, “Semi-supervised Parsing with Large Data”
  5. From the dependency trees, we build a bigram and a trigram language model, which are denoted by BLM and TLM, respectively.
    Page 5, “Semi-supervised Parsing with Large Data”
  6. The following are the templates of the records of the dependency language models.
    Page 5, “Semi-supervised Parsing with Large Data”
  7. To use the dependency language models, we employ a map function Φ(r) to assign a category to each record r according to its probability, as in Chen et al.
    Page 5, “Semi-supervised Parsing with Large Data”
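
As a rough illustration of the dependency language model features described above, the sketch below builds a bigram model over (head, modifier) pairs taken from auto-parsed dependency trees and maps each record's probability to a coarse category, in the spirit of the map function Φ(r). The function names, the relative-frequency estimate, and the thresholds are assumptions for exposition, not the paper's exact recipe.

```python
from collections import Counter

def build_bigram_dlm(dep_trees):
    """Estimate P(modifier | head) by relative frequency.

    dep_trees: iterable of parses, each a list of (head_word, modifier_word)
    pairs taken from an auto-parsed dependency tree."""
    pair_counts, head_counts = Counter(), Counter()
    for tree in dep_trees:
        for head, mod in tree:
            pair_counts[(head, mod)] += 1
            head_counts[head] += 1
    return {(h, m): c / head_counts[h] for (h, m), c in pair_counts.items()}

def probability_category(prob, thresholds=(0.1, 0.01, 0.001)):
    """Map a record's probability to a coarse label; the cut-offs are placeholders."""
    for label, t in zip(("HP", "MP", "LP"), thresholds):
        if prob >= t:
            return label
    return "UNK"

# Usage sketch: blm = build_bigram_dlm(parsed_trees)
#               feature = probability_category(blm.get((head, mod), 0.0))
```

A trigram model (the TLM) can be built analogously by conditioning on one more word of context.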

Berkeley parser

Appears in 5 sentences as: Berkeley parser (6)
In Fast and Accurate Shift-Reduce Constituent Parsing
  1. On standard evaluations using both the Penn Treebank and the Penn Chinese Treebank, our parser gave higher accuracies than the Berkeley parser (Petrov and Klein, 2007), a state-of-the-art chart parser.
    Page 2, “Introduction”
  2. In addition, our parser runs at over 89 sentences per second, which is 14 times faster than the Berkeley parser, and is the fastest that we are aware of for phrase-structure parsing.
    Page 2, “Introduction”
  3. From the results we can see that our extended parser (baseline + padding + supervised features) outperforms the Berkeley parser by 0.3% on English, and is comparable with the Berkeley parser on Chinese (0.1% lower).
    Page 7, “Experiments”
  4. Specifically, the semi-supervised parser is 7 times faster than the Berkeley parser.
    Page 7, “Experiments”
  5. The resulting supervised parser outperforms the Berkeley parser, a state-of-the-art chart parser, in both accuracy and speed.
    Page 8, “Conclusion”

bigram

Appears in 5 sentences as: bigram (4) bigrams (1)
In Fast and Accurate Shift-Reduce Constituent Parsing
  1. bigrams: s0ws1w, s0ws1c, s0cs1w, s0cs1c, s0wq0w, s0wq0t, s0cq0w, s0cq0t, q0wq1w, q0wq1t, q0tq1w, q0tq1t, s1wq0w, s1wq0t, s1cq0w, s1cq0t
    Page 3, “Baseline parser”
  2. From the dependency trees, we extract bigram lexical dependencies (w1, w2, L/R), where the symbol L (R) means that w1 (w2) is the head of w2 (w1).
    Page 4, “Semi-supervised Parsing with Large Data”
  3. (2009), we assign categories to bigram and trigram items separately according to their frequency counts.
    Page 5, “Semi-supervised Parsing with Large Data”
  4. Hereafter, we refer to the bigram and trigram lexical dependency lists as BLD and TLD, respectively.
    Page 5, “Semi-supervised Parsing with Large Data”
  5. From the dependency trees, we build a bigram and a trigram language model, which are denoted by BLM and TLM, respectively.
    Page 5, “Semi-supervised Parsing with Large Data”
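
Item 1 above lists bigram feature templates that pair attributes of the top stack items (s0, s1) and the front queue items (q0, q1), where w, c, and t denote the word, constituent label, and POS tag, respectively. Items 2–4 describe bigram lexical dependencies mined from auto-parsed data; the sketch below illustrates how such (w1, w2, L/R) items could be collected and bucketed by frequency into the BLD list. The tuple convention, thresholds, and names are assumptions based only on the wording quoted above, not the authors' code.

```python
from collections import Counter

def extract_bld(dep_trees):
    """Count bigram lexical dependencies (w1, w2, L/R).

    dep_trees: iterable of parses, each a list of (head_index, word) pairs,
    where head_index is the 0-based position of the word's head (-1 = root).
    w1 and w2 are kept in surface order; L means w1 heads w2, R the reverse."""
    counts = Counter()
    for tree in dep_trees:
        words = [w for _, w in tree]
        for i, (head, word) in enumerate(tree):
            if head < 0:
                continue
            if head < i:   # head precedes the modifier
                counts[(words[head], word, "L")] += 1
            else:          # head follows the modifier
                counts[(word, words[head], "R")] += 1
    return counts

def frequency_category(count, thresholds=(1000, 100, 10)):
    """Assign a coarse frequency label; the cut-offs are illustrative only."""
    for label, t in zip(("HF", "MF", "LF"), thresholds):
        if count >= t:
            return label
    return "RARE"
```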

constituent parsers

Appears in 5 sentences as: constituent parse (1) constituent parsers (2) constituent parsing (2)
In Fast and Accurate Shift-Reduce Constituent Parsing
  1. Shift-reduce dependency parsers give comparable accuracies to their chart-based counterparts, yet the best shift-reduce constituent parsers still lag behind the state-of-the-art.
    Page 1, “Abstract”
  2. The best reported accuracies of transition-based constituent parsers still lag behind the state-of-the-art (Sagae and Lavie, 2006; Zhang and Clark, 2009).
    Page 1, “Introduction”
  3. Unlike dependency parsing, constituent parse trees for the same sentence can have different numbers of nodes, mainly due to the existence of unary nodes.
    Page 3, “Improved hypotheses comparison”
  4. Based on the information, we propose a set of novel features specifically designed for shift-reduce constituent parsing.
    Page 4, “Semi-supervised Parsing with Large Data”
  5. The idea of exploiting lexical dependency information from auto-parsed data has been explored before for dependency parsing (Chen et al., 2009) and constituent parsing (Zhu et al., 2012).
    Page 4, “Semi-supervised Parsing with Large Data”

dependency trees

Appears in 5 sentences as: dependency tree (1) dependency trees (4)
In Fast and Accurate Shift-Reduce Constituent Parsing
  1. To simplify the extraction process, we can convert auto-parsed constituency trees into dependency trees by using Penn2Malt.
    Page 4, “Semi-supervised Parsing with Large Data”
  2. From the dependency trees, we extract bigram lexical dependencies (w1, w2, L/R), where the symbol L (R) means that w1 (w2) is the head of w2 (w1).
    Page 4, “Semi-supervised Parsing with Large Data”
  3. Formally, given a dependency tree y of an input sentence x, we can denote by H the set of words that have at least one dependent.
    Page 5, “Semi-supervised Parsing with Large Data”
  4. Again, we convert constituency trees into dependency trees for the purpose of simplicity.
    Page 5, “Semi-supervised Parsing with Large Data”
  5. From the dependency trees, we build a bigram and a trigram language model, which are denoted by BLM and TLM, respectively.
    Page 5, “Semi-supervised Parsing with Large Data”

development sets

Appears in 5 sentences as: Development Sets (1) development sets (4)
In Fast and Accurate Shift-Reduce Constituent Parsing
  1. Table 5: Experimental results on the English and Chinese development sets with the padding technique and new supervised features added incrementally.
    Page 6, “Experiments”
  2. Table 6: Experimental results on the English and Chinese development sets with different types of semi-supervised features added incrementally to the extended parser.
    Page 6, “Experiments”
  3. The optimal iteration number of perceptron learning is determined on the development sets.
    Page 6, “Experiments”
  4. 5.2 Results on Development Sets
    Page 6, “Experiments”
  5. Table 5 reports the results of the extended parser (baseline + padding + supervised features) on the English and Chinese development sets.
    Page 6, “Experiments”

feature vector

Appears in 4 sentences as: feature vector (2) feature vectors (2)
In Fast and Accurate Shift-Reduce Constituent Parsing
  1. Here Φ(a_i) represents the feature vector for the i-th action a_i in state item α.
    Page 3, “Baseline parser”
  2. The significant variance in the number of actions N can have an impact on the linear separability of state items, for which the feature vectors are ∑_{i=1}^{N} Φ(a_i).
    Page 3, “Improved hypotheses comparison”
  3. A feature vector is extracted for the IDLE action according to the final state context, in the same way as other actions.
    Page 3, “Improved hypotheses comparison”
  4. corresponding feature vectors have about the same sizes, and are more linearly separable.
    Page 4, “Improved hypotheses comparison”
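
A compact way to see the issue: a state item α is scored by the dot product of the weight vector with the summed action feature vectors ∑_{i=1}^{N} Φ(a_i), so items with very different action counts N are compared on feature vectors of very different sizes. The sketch below (assumed names and a sparse Counter representation, not the authors' code) scores an item that way and pads finished items with IDLE actions, each still firing features from the final context, so that all beam candidates cover the same number of actions.

```python
from collections import Counter

def item_score(action_feature_bags, weights):
    """score(alpha) = theta . sum_i Phi(a_i), with one sparse Counter per action."""
    total = Counter()
    for bag in action_feature_bags:
        total.update(bag)
    return sum(weights.get(f, 0.0) * v for f, v in total.items())

def pad_with_idle(action_feature_bags, final_context_bag, target_actions):
    """Append IDLE actions to a finished item until it has target_actions actions.

    Each IDLE action contributes the features extracted from the item's final
    context, so padded items remain comparable with still-active ones."""
    padded = list(action_feature_bags)
    while len(padded) < target_actions:
        padded.append(final_context_bag)
    return padded
```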

parse trees

Appears in 4 sentences as: parse tree (1) parse trees (3)
In Fast and Accurate Shift-Reduce Constituent Parsing
  1. The pioneering models rely on a classifier to make local decisions, and search greedily for a transition sequence to build a parse tree.
    Page 1, “Introduction”
  2. One difference between phrase-structure parsing and dependency parsing is that for the former, parse trees with different numbers of unary rules require different numbers of actions to build.
    Page 1, “Introduction”
  3. Unlike dependency parsing, constituent parse trees for the same sentence can have different numbers of nodes, mainly due to the existence of unary nodes.
    Page 3, “Improved hypotheses comparison”
  4. Figure 2: Example parse trees of the same sentence with different numbers of actions.
    Page 3, “Improved hypotheses comparison”

perceptron

Appears in 4 sentences as: perceptron (4)
In Fast and Accurate Shift-Reduce Constituent Parsing
  1. We adopt the parser of Zhang and Clark (2009) for our baseline, which is based on the shift-reduce process of Sagae and Lavie (2005), and employs global perceptron training and beam search.
    Page 2, “Baseline parser”
  2. The model parameter vector θ is trained with the averaged perceptron algorithm, applied to state items (sequences of actions) globally.
    Page 3, “Baseline parser”
  3. This turns out to have a significant empirical influence on perceptron training with early-update, where the training of the model interacts with search (Daumé III, 2006).
    Page 3, “Improved hypotheses comparison”
  4. The optimal iteration number of perceptron learning is determined on the development sets.
    Page 6, “Experiments”
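
The excerpts above refer to globally trained perceptron learning with beam search and early update. The sketch below shows the skeleton of such a training step under stated assumptions: `expand` and `item_features` are hypothetical hooks that a shift-reduce parser would supply, items carry an `actions` list, and the weight averaging of the averaged perceptron is omitted for brevity. It illustrates early update in general, not the authors' implementation.

```python
from collections import Counter

def dot(features, weights):
    return sum(weights.get(f, 0.0) * v for f, v in features.items())

def perceptron_update(weights, gold_features, pred_features):
    # Reward the gold action sequence's features, penalise the prediction's.
    for f, v in gold_features.items():
        weights[f] = weights.get(f, 0.0) + v
    for f, v in pred_features.items():
        weights[f] = weights.get(f, 0.0) - v

def train_sentence(initial_item, gold_actions, expand, item_features,
                   weights, beam_size=16):
    """One beam-search decoding pass with early update.

    expand(item) -> successor items, each with an `actions` list;
    item_features(item) -> Counter of features fired by its action sequence."""
    beam = [initial_item]
    gold_item = initial_item
    for step, gold_action in enumerate(gold_actions):
        candidates = [succ for item in beam for succ in expand(item)]
        candidates.sort(key=lambda it: dot(item_features(it), weights),
                        reverse=True)
        beam = candidates[:beam_size]
        # Advance the gold hypothesis by one action alongside the beam.
        gold_item = next(it for it in expand(gold_item)
                         if it.actions[-1] == gold_action)
        if all(it.actions != gold_item.actions for it in beam):
            # Early update: the gold sequence has fallen out of the beam.
            perceptron_update(weights, item_features(gold_item),
                              item_features(beam[0]))
            return
    if beam[0].actions != gold_item.actions:
        perceptron_update(weights, item_features(gold_item),
                          item_features(beam[0]))
```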

POS tagging

Appears in 4 sentences as: POS tagger (1) POS tagging (3) POS tags (1)
In Fast and Accurate Shift-Reduce Constituent Parsing
  1. Word clusters are regarded as lexical intermediaries for dependency parsing (Koo et al., 2008) and POS tagging (Sun and Uszkoreit, 2012).
    Page 4, “Semi-supervised Parsing with Large Data”
  2. For both English and Chinese data, we used tenfold jackknifing (Collins, 2000) to automatically assign POS tags to the training data.
    Page 5, “Experiments”
  3. For English POS tagging, we adopted SVMTool, and for Chinese POS tagging
    Page 5, “Experiments”
  4. we employed the Stanford POS tagger.
    Page 6, “Experiments”

Treebank

Appears in 4 sentences as: Treebank (5)
In Fast and Accurate Shift-Reduce Constituent Parsing
  1. On standard evaluations using both the Penn Treebank and the Penn Chinese Treebank, our parser gave higher accuracies than the Berkeley parser (Petrov and Klein, 2007), a state-of-the-art chart parser.
    Page 2, “Introduction”
  2. Labeled English data employed in this paper were derived from the Wall Street Journal (WSJ) corpus of the Penn Treebank (Marcus et al., 1993).
    Page 5, “Experiments”
  3. For labeled Chinese data, we used version 5.1 of the Penn Chinese Treebank (CTB) (Xue et al., 2005).
    Page 5, “Experiments”
  4. In addition, we removed from the unlabeled English data the sentences that appear in the WSJ corpus of the Penn Treebank.
    Page 6, “Experiments”

Penn Treebank

Appears in 3 sentences as: Penn Treebank (3)
In Fast and Accurate Shift-Reduce Constituent Parsing
  1. On standard evaluations using both the Penn Treebank and the Penn Chinese Treebank, our parser gave higher accuracies than the Berkeley parser (Petrov and Klein, 2007), a state-of-the-art chart parser.
    Page 2, “Introduction”
  2. Labeled English data employed in this paper were derived from the Wall Street Journal (WSJ) corpus of the Penn Treebank (Marcus et al., 1993).
    Page 5, “Experiments”
  3. In addition, we removed from the unlabeled English data the sentences that appear in the WSJ corpus of the Penn Treebank.
    Page 6, “Experiments”
