Joint POS Tagging and Transition-based Constituent Parsing in Chinese with Non-local Features
Wang, Zhiguo and Xue, Nianwen

Article Structure

Abstract

We propose three improvements to address the drawbacks of state-of-the-art transition-based constituent parsers.

Introduction

Constituent parsing is one of the most fundamental tasks in Natural Language Processing (NLP).

Transition-based Constituent Parsing

This section describes the transition-based constituent parsing model, which is the basis of Section 3 and the baseline model in Section 4.

Joint POS Tagging and Parsing with Nonlocal Features

To address the drawbacks of the standard transition-based constituent parsing model (described in Section 1), we propose a model to jointly solve POS tagging and constituent parsing with nonlocal features.

Experiment

4.1 Experimental Setting

Related Work

Joint POS tagging with parsing is not a new idea.

Topics

POS tagging

Appears in 37 sentences as: POS tag (6) POS Tagger (1) POS Tagging (2) POS tagging (24) POS tags (12)
In Joint POS Tagging and Transition-based Constituent Parsing in Chinese with Non-local Features
  1. First, to resolve the error propagation problem of the traditional pipeline approach, we incorporate POS tagging into the syntactic parsing process.
    Page 1, “Abstract”
  2. First, POS tagging is typically performed separately as a preliminary step, and POS tagging errors will propagate to the parsing process.
    Page 1, “Introduction”
  3. This problem is especially severe for languages where the POS tagging accuracy is relatively low, and this is the case for Chinese where there are fewer contextual clues that can be used to inform the tagging process and some of the tagging decisions are actually influenced by the syntactic structure of the sentence.
    Page 1, “Introduction”
  4. First, we integrate POS tagging into the parsing process and jointly optimize these two processes simultaneously.
    Page 1, “Introduction”
  5. determination, the accuracy of POS tagging improves, and this will in turn improve parsing accuracy.
    Page 2, “Introduction”
  6. Figure 1: Two constituent trees for an example sentence w0w1w2 with POS tags abc.
    Page 2, “Transition-based Constituent Parsing”
  7. For example, in Figure 1, for the input sentence w0w1w2 and its POS tags abc, our parser can construct two parse trees using action sequences given below these trees.
    Page 2, “Transition-based Constituent Parsing”
  8. To address the drawbacks of the standard transition-based constituent parsing model (described in Section 1), we propose a model to jointly solve POS tagging and constituent parsing with nonlocal features.
    Page 3, “Joint POS Tagging and Parsing with Nonlocal Features”
  9. 3.1 Joint POS Tagging and Parsing
    Page 3, “Joint POS Tagging and Parsing with Nonlocal Features”
  10. POS tagging is often taken as a preliminary step for transition-based constituent parsing, therefore the accuracy of POS tagging would greatly affect parsing performance.
    Page 3, “Joint POS Tagging and Parsing with Nonlocal Features”
  11. In our experiment (described in Section 4.2), parsing accuracy would decrease by 8.5% in F1 in Chinese parsing when using automatically generated POS tags instead of gold-standard ones.
    Page 3, “Joint POS Tagging and Parsing with Nonlocal Features”
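The quotes above describe folding POS tagging into the parser's shift step so that tagging and parsing are optimized together. As a loose illustration of that idea (the tag set, scoring function, and all names here are invented for the sketch, not taken from the paper), a joint shift can emit one candidate successor per plausible tag, so tagging decisions compete inside the parser's beam rather than being fixed in advance:

```python
# Hypothetical illustration: a "joint" shift proposes every plausible POS tag
# for the next word, producing one scored candidate per tag. The beam then
# keeps the best combinations of tagging and parsing decisions.
TAGS = ["NN", "VV", "AD"]  # toy tag inventory, not the CTB tag set

def joint_shift_candidates(words, next_index, tag_score):
    """Return one (word, tag) candidate with its score for each tag."""
    word = words[next_index]
    return [((word, tag), tag_score(word, tag)) for tag in TAGS]
```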

See all papers in Proc. ACL 2014 that mention POS tagging.

See all papers in Proc. ACL that mention POS tagging.


constituent parsing

Appears in 30 sentences as: constituent parse (3) constituent parser (4) constituent parsers (2) Constituent Parsing (2) Constituent parsing (1) constituent parsing (19)
  1. We propose three improvements to address the drawbacks of state-of-the-art transition-based constituent parsers.
    Page 1, “Abstract”
  2. Constituent parsing is one of the most fundamental tasks in Natural Language Processing (NLP).
    Page 1, “Introduction”
  3. Transition-based constituent parsing (Sagae and Lavie, 2005; Wang et al., 2006; Zhang and Clark, 2009) is an attractive alternative.
    Page 1, “Introduction”
  4. However, there is still room for improvement for these state-of-the-art transition-based constituent parsers.
    Page 1, “Introduction”
  5. In this paper, we address these drawbacks to improve the transition-based constituent parsing for Chinese.
    Page 1, “Introduction”
  6. The remainder of this paper is organized as follows: Section 2 introduces the standard transition-based constituent parsing approach.
    Page 2, “Introduction”
  7. Section 3 describes our three improvements to standard transition-based constituent parsing.
    Page 2, “Introduction”
  8. This section describes the transition-based constituent parsing model, which is the basis of Section 3 and the baseline model in Section 4.
    Page 2, “Transition-based Constituent Parsing”
  9. 2.1 Transition-based Constituent Parsing Model
    Page 2, “Transition-based Constituent Parsing”
  10. A transition-based constituent parsing model is a quadruple C = (S, T, s0, St), where S is a set of parser states (sometimes called configurations), T is a finite set of actions, s0 is an initialization function to map each input sentence into a unique initial state, and St ⊆ S is a set of terminal states.
    Page 2, “Transition-based Constituent Parsing”
  11. The task of transition-based constituent parsing is to scan the input POS-tagged sentence from left to right and perform a sequence of actions to transform the initial state into a terminal state.
    Page 2, “Transition-based Constituent Parsing”
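The quadruple (S, T, s0, St) and the left-to-right action sequence quoted above can be sketched concretely. The following is a minimal illustration under simplifying assumptions (the data structures, head rules, and function names are this sketch's own, not the paper's implementation):

```python
# Minimal sketch of the transition system C = (S, T, s0, St) described above.
from dataclasses import dataclass, field

@dataclass
class Tree:
    label: str                       # constituent or POS label
    children: list = field(default_factory=list)
    head: str = ""                   # lexical head word

@dataclass
class State:
    stack: list                      # sigma: partial subtrees built so far
    queue: list                      # delta: unprocessed (word, POS) pairs

def s0(tagged_sentence):
    """Initialization function: empty stack, full queue."""
    return State(stack=[], queue=list(tagged_sentence))

def shift(state):
    """sh-x: move the front (word, POS) pair onto the stack as a leaf."""
    word, pos = state.queue[0]
    return State(state.stack + [Tree(pos, [word], head=word)], state.queue[1:])

def reduce_binary(state, label, head_left=True):
    """rl/rr-x: pop the top two subtrees and combine them under a new node."""
    left, right = state.stack[-2], state.stack[-1]
    head = left.head if head_left else right.head
    return State(state.stack[:-2] + [Tree(label, [left, right], head=head)],
                 state.queue)

def reduce_unary(state, label):
    """ru-x: wrap the top subtree in a new unary parent."""
    top = state.stack[-1]
    return State(state.stack[:-1] + [Tree(label, [top], head=top.head)],
                 state.queue)

def is_terminal(state):
    """Terminal state in St: empty queue, one complete tree on the stack."""
    return not state.queue and len(state.stack) == 1
```

A parse is then just a legal action sequence that transforms s0 of the sentence into a terminal state.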


semi-supervised

Appears in 15 sentences as: Semi-supervised (2) semi-supervised (14)
  1. Third, to enhance the power of parsing models, we enlarge the feature set with nonlocal features and semi-supervised word cluster features.
    Page 1, “Abstract”
  2. Third, we take into account two groups of complex structural features that have not been previously used in transition-based parsing: nonlocal features (Charniak and Johnson, 2005) and semi-supervised word cluster features (Koo et al., 2008).
    Page 2, “Introduction”
  3. After integrating semi-supervised word cluster features, the parsing accuracy is further improved to 86.3% when trained on CTB 5.1 and 87.1% when trained on CTB 6.0, and this is the best reported performance for Chinese.
    Page 2, “Introduction”
  4. To further improve the performance of our transition-based constituent parser, we consider two groups of complex structural features: nonlocal features (Charniak and Johnson, 2005; Collins and Koo, 2005) and semi-supervised word cluster features (Koo et al., 2008).
    Page 5, “Joint POS Tagging and Parsing with Nonlocal Features”
  5. Semi-supervised word cluster features have been successfully applied to many NLP tasks (Miller et al., 2004; Koo et al., 2008; Zhu et al., 2013).
    Page 5, “Joint POS Tagging and Parsing with Nonlocal Features”
  6. Using these two types of clusters, we construct semi-supervised word cluster features by mimicking the template structure of the original baseline features in Table 1.
    Page 5, “Joint POS Tagging and Parsing with Nonlocal Features”
  7. In this subsection, we examined the usefulness of the new nonlocal features and the semi-supervised word cluster features described in Subsection 3.3.
    Page 6, “Experiment”
  8. We built three new parsing systems based on the StateAlign system: the Nonlocal system extends the feature set of the StateAlign system with nonlocal features, the Cluster system extends the feature set with semi-supervised word cluster features, and the Nonlocal & Cluster system extends the feature set with both groups of features.
    Page 6, “Experiment”
  9. Compared with the StateAlign system which takes only the baseline features, the nonlocal features improved parsing F1 by 0.8%, while the semi-supervised word cluster features resulted in an improvement of 2.3% in parsing F1 and a 1.1% improvement in POS tagging accuracy.
    Page 6, “Experiment”
  10. Semi-supervised Systems Zhu et al.
    Page 7, “Experiment”
  11. These results show that both the nonlocal features and the semi-supervised features are helpful for our transition-based constituent parser.
    Page 7, “Experiment”
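The cluster features quoted above are built by "mimicking the template structure of the original baseline features". A common way to realize this, following Koo et al. (2008), is to replace each word in an existing template with its Brown-cluster bitstring (or a prefix of it), so unseen words that share a cluster fire the same feature. The cluster table and template names below are invented for illustration, not the paper's actual feature set:

```python
# Hedged sketch: semi-supervised word cluster features that mimic baseline
# feature templates, with words swapped for (prefixes of) cluster bitstrings.
CLUSTERS = {"apple": "0010", "pear": "0011", "run": "1101"}  # word -> bitstring

def cluster(word, prefix=4):
    """Short prefixes give coarse clusters, full strings fine-grained ones."""
    bits = CLUSTERS.get(word, "UNK")
    return bits[:prefix] if bits != "UNK" else bits

def baseline_features(stack_word, queue_word):
    """Toy stand-ins for word-based templates like those in Table 1."""
    return [f"s0w={stack_word}", f"q0w={queue_word}",
            f"s0w_q0w={stack_word}_{queue_word}"]

def cluster_features(stack_word, queue_word):
    """Mirror each baseline template with cluster IDs in place of words."""
    cs, cq = cluster(stack_word), cluster(queue_word)
    return [f"s0c={cs}", f"q0c={cq}", f"s0c_q0c={cs}_{cq}"]
```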


parsing model

Appears in 11 sentences as: Parsing Model (1) parsing model (9) parsing models (1)
  1. Third, to enhance the power of parsing models, we enlarge the feature set with nonlocal features and semi-supervised word cluster features.
    Page 1, “Abstract”
  2. This creates a chicken and egg problem that needs to be addressed when designing a parsing model.
    Page 1, “Introduction”
  3. Second, due to the existence of unary rules in constituent trees, competing candidate parses often have different numbers of actions, and this increases the disambiguation difficulty for the parsing model.
    Page 1, “Introduction”
  4. With this strategy, parser states and their unary extensions are put into the same beam, therefore the parsing model could decide whether or not to use unary actions within local decision beams.
    Page 2, “Introduction”
  5. This section describes the transition-based constituent parsing model, which is the basis of Section 3 and the baseline model in Section 4.
    Page 2, “Transition-based Constituent Parsing”
  6. 2.1 Transition-based Constituent Parsing Model
    Page 2, “Transition-based Constituent Parsing”
  7. A transition-based constituent parsing model is a quadruple C = (S, T, s0, St), where S is a set of parser states (sometimes called configurations), T is a finite set of actions, s0 is an initialization function to map each input sentence into a unique initial state, and St ⊆ S is a set of terminal states.
    Page 2, “Transition-based Constituent Parsing”
  8. To address the drawbacks of the standard transition-based constituent parsing model (described in Section 1), we propose a model to jointly solve POS tagging and constituent parsing with nonlocal features.
    Page 3, “Joint POS Tagging and Parsing with Nonlocal Features”
  9. This makes the lengths of complete action sequences very different, and the parsing model has to disambiguate among terminal states with varying action sizes.
    Page 4, “Joint POS Tagging and Parsing with Nonlocal Features”
  10. We find that our new method aligns states with their ru-x extensions in the same beam, therefore the parsing model could decide whether or not to use ru-x actions within local decision
    Page 4, “Joint POS Tagging and Parsing with Nonlocal Features”
  11. Finally, we enhanced our parsing model by enlarging the feature set with nonlocal features and semi-supervised word cluster features.
    Page 9, “Related Work”


parse tree

Appears in 9 sentences as: parse tree (7) parse trees (3)
  1. empty stack σ and a queue δ containing the entire input sentence (word-POS pairs), and the terminal states have an empty queue δ and a stack σ containing only one complete parse tree.
    Page 2, “Transition-based Constituent Parsing”
  2. In order to construct lexicalized constituent parse trees, we define the following actions for the action set T according to (Sagae and Lavie, 2005; Wang et al., 2006; Zhang and Clark, 2009):
    Page 2, “Transition-based Constituent Parsing”
  3. For example, in Figure 1, for the input sentence w0w1w2 and its POS tags abc, our parser can construct two parse trees using action sequences given below these trees.
    Page 2, “Transition-based Constituent Parsing”
  4. However, parse trees in Treebanks often contain an arbitrary number of branches.
    Page 2, “Transition-based Constituent Parsing”
  5. Input: A POS-tagged sentence, beam size k. Output: A constituent parse tree.
    Page 3, “Transition-based Constituent Parsing”
  6. Assuming an input sentence contains n words, in order to reach a terminal state, the initial state requires n sh-x actions to consume all words in δ, and n − 1 rl/rr-x actions to construct a complete parse tree by consuming all the subtrees in σ.
    Page 4, “Joint POS Tagging and Parsing with Nonlocal Features”
  7. For example, the parse tree in Figure 1a contains no ru-x action, while the parse tree for the same input sentence in Figure 1b contains four ru-x actions.
    Page 4, “Joint POS Tagging and Parsing with Nonlocal Features”
  8. Input: A word-segmented sentence, beam size k. Output: A constituent parse tree.
    Page 4, “Joint POS Tagging and Parsing with Nonlocal Features”
  9. One difficulty is that the subtrees built by our baseline parser are binary trees (only the complete parse tree is debinarized into its original multi-branch form), but most of the nonlocal features need to be extracted from their original multi-branch forms.
    Page 5, “Joint POS Tagging and Parsing with Nonlocal Features”


structural features

Appears in 8 sentences as: structural features (9)
  1. Therefore, it runs in linear time and can take advantage of arbitrarily complex structural features from already constructed subtrees.
    Page 1, “Introduction”
  2. Third, transition-based parsers have the freedom to define arbitrarily complex structural features, but this freedom has not fully been taken advantage of and most of the present approaches only use simple structural features.
    Page 1, “Introduction”
  3. Third, we take into account two groups of complex structural features that have not been previously used in transition-based parsing: nonlocal features (Charniak and Johnson, 2005) and semi-supervised word cluster features (Koo et al., 2008).
    Page 2, “Introduction”
  4. One advantage of transition-based constituent parsing is that it is capable of incorporating arbitrarily complex structural features from the already constructed subtrees in σ and unprocessed words in δ.
    Page 5, “Joint POS Tagging and Parsing with Nonlocal Features”
  5. However, all the feature templates given in Table 1 are just simple structural features.
    Page 5, “Joint POS Tagging and Parsing with Nonlocal Features”
  6. To further improve the performance of our transition-based constituent parser, we consider two groups of complex structural features: nonlocal features (Charniak and Johnson, 2005; Collins and Koo, 2005) and semi-supervised word cluster features (Koo et al., 2008).
    Page 5, “Joint POS Tagging and Parsing with Nonlocal Features”
  7. The reason is that the single-stage chart-based parser cannot use nonlocal structural features.
    Page 8, “Related Work”
  8. In contrast, the transition-based parser can use arbitrarily complex structural features.
    Page 8, “Related Work”


subtrees

Appears in 8 sentences as: subtrees (9)
  1. Therefore, it runs in linear time and can take advantage of arbitrarily complex structural features from already constructed subtrees.
    Page 1, “Introduction”
  2. A parser state s ∈ S is defined as a tuple s = (σ, δ), where σ is a stack which is maintained to hold partial subtrees that are already constructed, and δ is a queue which is used for storing word-POS pairs that remain unprocessed.
    Page 2, “Transition-based Constituent Parsing”
  3. • REDUCE-BINARY-{L/R}-X (rl/rr-x): pop the top two subtrees from σ, combine them into a new tree with a node labeled with X, then push the new subtree back onto σ.
    Page 2, “Transition-based Constituent Parsing”
  4. Assuming an input sentence contains n words, in order to reach a terminal state, the initial state requires n sh-x actions to consume all words in δ, and n − 1 rl/rr-x actions to construct a complete parse tree by consuming all the subtrees in σ.
    Page 4, “Joint POS Tagging and Parsing with Nonlocal Features”
  5. One advantage of transition-based constituent parsing is that it is capable of incorporating arbitrarily complex structural features from the already constructed subtrees in σ and unprocessed words in δ.
    Page 5, “Joint POS Tagging and Parsing with Nonlocal Features”
  6. Instead, we attempt to extract nonlocal features from newly constructed subtrees during the decoding process as they become incrementally available and score newly generated parser states with them.
    Page 5, “Joint POS Tagging and Parsing with Nonlocal Features”
  7. One difficulty is that the subtrees built by our baseline parser are binary trees (only the complete parse tree is debinarized into its original multi-branch form), but most of the nonlocal features need to be extracted from their original multi-branch forms.
    Page 5, “Joint POS Tagging and Parsing with Nonlocal Features”
  8. The other subtrees for subsequent parsing steps will be built based on these debinarized subtrees.
    Page 5, “Joint POS Tagging and Parsing with Nonlocal Features”
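The quotes above note that subtrees are built in binarized form and must be restored to their multi-branch shape before nonlocal features can be extracted. A common debinarization scheme consistent with that description marks intermediate nodes introduced during binarization with a special label (the "*" suffix here is this sketch's assumption, not necessarily the paper's convention) and splices their children back into the parent:

```python
# Hedged sketch: undo binarization by splicing the children of intermediate
# ("starred") nodes back into their parent, restoring multi-branch trees.
def debinarize(node):
    """node is (label, children); leaves are plain strings."""
    label, children = node
    flat = []
    for child in children:
        if isinstance(child, str):          # leaf word: keep as-is
            flat.append(child)
            continue
        child = debinarize(child)
        if child[0].endswith("*"):          # intermediate node: splice up
            flat.extend(child[1])
        else:
            flat.append(child)
    return (label, flat)
```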


development set

Appears in 6 sentences as: development set (6)
  1. We conducted experiments on the Penn Chinese Treebank (CTB) version 5.1 (Xue et al., 2005): Articles 001-270 and 400-1151 were used as the training set, Articles 301-325 were used as the development set, and Articles 271-300 were used as the test set.
    Page 5, “Experiment”
  2. We tuned the optimal number of iterations of the perceptron training algorithm on the development set.
    Page 6, “Experiment”
  3. We trained these three systems on the training set and evaluated them on the development set.
    Page 6, “Experiment”
  4. Table 3: Parsing performance on the Chinese development set.
    Page 6, “Experiment”
  5. We can see that all these systems maintain a similar relative relationship as they do on the development set, which shows the stability of our systems.
    Page 7, “Experiment”
  6. We used the same development set and test set as CTB5, and took all the remaining data as the new training set.
    Page 7, “Experiment”


reranking

Appears in 5 sentences as: reranker (1) reranking (4)
  1. But almost all previous work considered nonlocal features only in parse reranking frameworks.
    Page 5, “Joint POS Tagging and Parsing with Nonlocal Features”
  2. We group parsing systems into three categories: single systems, reranking systems and semi-supervised systems.
    Page 7, “Experiment”
  3. Our Nonlocal&Cluster system further improved the parsing F1 to 86.3%, and it outperforms all reranking systems and semi-supervised systems.
    Page 7, “Experiment”
  4. *Huang (2009) adapted the parse reranker to CTB5.
    Page 7, “Experiment”
  5. However, almost all of the previous work uses nonlocal features at the parse reranking stage.
    Page 8, “Related Work”


Bigrams

Appears in 4 sentences as: Bigrams (4)
  1. (Collins and Koo, 2005) (Charniak and Johnson, 2005) Rules CoPar HeadTree Bigrams CoLenPar
    Page 5, “Joint POS Tagging and Parsing with Nonlocal Features”
  2. Grandparent Bigrams Heavy
    Page 5, “Joint POS Tagging and Parsing with Nonlocal Features”
  3. Lexical Bigrams Neighbours
    Page 5, “Joint POS Tagging and Parsing with Nonlocal Features”
  4. Two-level Bigrams Heads
    Page 5, “Joint POS Tagging and Parsing with Nonlocal Features”


feature templates

Appears in 4 sentences as: Feature Templates (1) feature templates (3)
  1. Type Feature Templates
    Page 3, “Transition-based Constituent Parsing”
  2. Table 1 lists the feature templates used in our baseline parser, which is adopted from Zhang and Clark (2009).
    Page 3, “Transition-based Constituent Parsing”
  3. However, some feature templates in Table 1 become unavailable, because POS tags for the lookahead words are not specified yet under the joint framework.
    Page 4, “Joint POS Tagging and Parsing with Nonlocal Features”
  4. However, all the feature templates given in Table 1 are just simple structural features.
    Page 5, “Joint POS Tagging and Parsing with Nonlocal Features”


beam size

Appears in 3 sentences as: beam size (3)
  1. Input: A POS-tagged sentence, beam size k. Output: A constituent parse tree.
    Page 3, “Transition-based Constituent Parsing”
  2. Input: A word-segmented sentence, beam size k. Output: A constituent parse tree.
    Page 4, “Joint POS Tagging and Parsing with Nonlocal Features”
  3. We set the beam size k to 16, which brings a good balance between efficiency and accuracy.
    Page 6, “Experiment”
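The beam size k quoted above governs a standard beam-search decoder: keep the k highest-scoring parser states at each step, expand each with every legal action, and stop once every state in the beam is terminal. The following sketch shows the control flow only; the scoring function, action inventory, and state representation are placeholders, not the paper's trained model:

```python
# Illustrative beam-search decoder for a transition-based parser.
def beam_decode(init_state, k, legal_actions, apply_action, score, is_terminal):
    beam = [(0.0, init_state)]                       # (score, state) pairs
    while not all(is_terminal(s) for _, s in beam):
        candidates = []
        for sc, state in beam:
            if is_terminal(state):
                candidates.append((sc, state))       # terminals still compete
                continue
            for action in legal_actions(state):
                candidates.append((sc + score(state, action),
                                   apply_action(state, action)))
        # prune to the k highest-scoring states
        beam = sorted(candidates, key=lambda c: c[0], reverse=True)[:k]
    return max(beam, key=lambda c: c[0])[1]          # best terminal state
```

With k = 16, as in the experiments above, each decoding step scores at most 16 × |T| candidate successors, which keeps decoding linear in sentence length.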


F1 score

Appears in 3 sentences as: F1 score (3)
  1. We can see that the parsing F1 decreased by about 8.5 percentage points when using automatically assigned POS tags instead of gold-standard ones, and this shows that the pipeline approach is greatly affected by the quality of its preliminary POS tagging step.
    Page 6, “Experiment”
  2. Compared with the JointParsing system which does not employ any alignment strategy, the Padding system only achieved a slight improvement on parsing F1 score, but no improvement on POS tagging accuracy.
    Page 6, “Experiment”
  3. In contrast, our StateAlign system achieved an improvement of 0.6% on parsing F1 score and 0.4% on POS tagging accuracy.
    Page 6, “Experiment”


feature set

Appears in 3 sentences as: feature set (5)
  1. Third, to enhance the power of parsing models, we enlarge the feature set with nonlocal features and semi-supervised word cluster features.
    Page 1, “Abstract”
  2. We built three new parsing systems based on the StateAlign system: the Nonlocal system extends the feature set of the StateAlign system with nonlocal features, the Cluster system extends the feature set with semi-supervised word cluster features, and the Nonlocal & Cluster system extends the feature set with both groups of features.
    Page 6, “Experiment”
  3. Finally, we enhanced our parsing model by enlarging the feature set with nonlocal features and semi-supervised word cluster features.
    Page 9, “Related Work”


gold-standard

Appears in 3 sentences as: gold-standard (3)
  1. In our experiment (described in Section 4.2), parsing accuracy would decrease by 8.5% in F1 in Chinese parsing when using automatically generated POS tags instead of gold-standard ones.
    Page 3, “Joint POS Tagging and Parsing with Nonlocal Features”
  2. We built three parsing systems: the Pipeline-Gold system is our baseline parser (described in Section 2) taking gold-standard POS tags as input; the Pipeline system is our baseline parser taking as input POS tags automatically assigned by the Stanford POS Tagger; and the JointParsing system is our joint POS tagging and transition-based parsing system described in subsection 3.1.
    Page 6, “Experiment”
  3. We can see that the parsing F1 decreased by about 8.5 percentage points when using automatically assigned POS tags instead of gold-standard ones, and this shows that the pipeline approach is greatly affected by the quality of its preliminary POS tagging step.
    Page 6, “Experiment”


TreeBank

Appears in 3 sentences as: TreeBank (1) Treebank (1) Treebanks (1)
  1. However, parse trees in Treebanks often contain an arbitrary number of branches.
    Page 2, “Transition-based Constituent Parsing”
  2. We conducted experiments on the Penn Chinese Treebank (CTB) version 5.1 (Xue et al., 2005): Articles 001-270 and 400-1151 were used as the training set, Articles 301-325 were used as the development set, and Articles 271-300 were used as the test set.
    Page 5, “Experiment”
  3. To check whether more labeled data can further improve our parsing system, we evaluated our Nonlocal&Cluster system on the Chinese TreeBank version 6.0 (CTB6), which is a superset of CTB5 and contains more annotated data.
    Page 7, “Experiment”
