Incremental Joint Approach to Word Segmentation, POS Tagging, and Dependency Parsing in Chinese
Hatori, Jun and Matsuzaki, Takuya and Miyao, Yusuke and Tsujii, Jun'ichi

Article Structure

Abstract

We propose the first joint model for word segmentation, POS tagging, and dependency parsing for Chinese.

Introduction

In processing natural languages that do not include delimiters (e.g.

Related Works

In Chinese, Luo (2003) proposed a joint constituency parser that performs segmentation, POS tagging, and parsing within a single character-based framework.

Model

3.1 Incremental Joint Segmentation, POS Tagging, and Dependency Parsing

Topics

POS tagging

Appears in 46 sentences as: POS tag (5), POS Tagging (1), POS tagging (37), POS tags (7)
In Incremental Joint Approach to Word Segmentation, POS Tagging, and Dependency Parsing in Chinese
  1. We propose the first joint model for word segmentation, POS tagging, and dependency parsing for Chinese.
    Page 1, “Abstract”
  2. Based on an extension of the incremental joint model for POS tagging and dependency parsing (Hatori et al., 2011), we propose an efficient character-based decoding method that can combine features from state-of-the-art segmentation, POS tagging, and dependency parsing models.
    Page 1, “Abstract”
  3. In experiments using the Chinese Treebank (CTB), we show that the accuracies of the three tasks can be improved significantly over the baseline models, particularly by 0.6% for POS tagging and 2.4% for dependency parsing.
    Page 1, “Abstract”
  4. Furthermore, the word-level information is often augmented with the POS tags, which, along with segmentation, form the basic foundation of statistical NLP.
    Page 1, “Introduction”
  5. Because the tasks of word segmentation and POS tagging have strong interactions, many studies have been devoted to the task of joint word segmentation and POS tagging for languages such as Chinese (e.g.
    Page 1, “Introduction”
  6. This is because some of the segmentation ambiguities cannot be resolved without considering the surrounding grammatical constructions encoded in a sequence of POS tags.
    Page 1, “Introduction”
  7. The joint approach to word segmentation and POS tagging has been reported to improve word segmentation and POS tagging accuracies by more than
    Page 1, “Introduction”
  8. In addition, some researchers recently proposed a joint approach to Chinese POS tagging and dependency parsing (Li et al., 2011; Hatori et al., 2011); particularly, Hatori et al.
    Page 1, “Introduction”
  9. Based on these observations, we aim at building a joint model that simultaneously processes word segmentation, POS tagging, and dependency parsing, trying to capture global interaction among
    Page 1, “Introduction”
  10. We perform experiments using the Chinese Treebank (CTB) corpora, demonstrating that the accuracies of the three tasks can be improved significantly over the pipeline combination of the state-of-the-art joint segmentation and POS tagging model, and the dependency parser.
    Page 2, “Introduction”
  11. In Chinese, Luo (2003) proposed a joint constituency parser that performs segmentation, POS tagging, and parsing within a single character-based framework.
    Page 2, “Related Works”

See all papers in Proc. ACL 2012 that mention POS tagging.


dependency parsing

Appears in 30 sentences as: dependency parser (5), Dependency Parsing (1), dependency parsing (27)
In Incremental Joint Approach to Word Segmentation, POS Tagging, and Dependency Parsing in Chinese
  1. We propose the first joint model for word segmentation, POS tagging, and dependency parsing for Chinese.
    Page 1, “Abstract”
  2. Based on an extension of the incremental joint model for POS tagging and dependency parsing (Hatori et al., 2011), we propose an efficient character-based decoding method that can combine features from state-of-the-art segmentation, POS tagging, and dependency parsing models.
    Page 1, “Abstract”
  3. In experiments using the Chinese Treebank (CTB), we show that the accuracies of the three tasks can be improved significantly over the baseline models, particularly by 0.6% for POS tagging and 2.4% for dependency parsing.
    Page 1, “Abstract”
  4. In addition, some researchers recently proposed a joint approach to Chinese POS tagging and dependency parsing (Li et al., 2011; Hatori et al., 2011); particularly, Hatori et al.
    Page 1, “Introduction”
  5. In this context, it is natural to consider further a question regarding the joint framework: how strongly do the tasks of word segmentation and dependency parsing interact?
    Page 1, “Introduction”
  6. Based on these observations, we aim at building a joint model that simultaneously processes word segmentation, POS tagging, and dependency parsing, trying to capture global interaction among
    Page 1, “Introduction”
  7. Two major challenges exist in formalizing the joint segmentation and dependency parsing task in the character-based incremental framework.
    Page 2, “Introduction”
  8. We have also found that we must balance the learning rate between features for segmentation and tagging decisions, and those for dependency parsing.
    Page 2, “Introduction”
  9. We perform experiments using the Chinese Treebank (CTB) corpora, demonstrating that the accuracies of the three tasks can be improved significantly over the pipeline combination of the state-of-the-art joint segmentation and POS tagging model, and the dependency parser.
    Page 2, “Introduction”
  10. Therefore, we place no restriction on the segmentation possibilities to consider, and we assess the full potential of the joint segmentation and dependency parsing model.
    Page 2, “Related Works”
  11. The incremental framework of our model is based on the joint POS tagging and dependency parsing model for Chinese (Hatori et al., 2011), which is an extension of the shift-reduce dependency parser with dynamic programming (Huang and Sagae, 2010).
    Page 2, “Related Works”


joint model

Appears in 21 sentences as: joint model (16), joint models (5)
In Incremental Joint Approach to Word Segmentation, POS Tagging, and Dependency Parsing in Chinese
  1. We propose the first joint model for word segmentation, POS tagging, and dependency parsing for Chinese.
    Page 1, “Abstract”
  2. Based on an extension of the incremental joint model for POS tagging and dependency parsing (Hatori et al., 2011), we propose an efficient character-based decoding method that can combine features from state-of-the-art segmentation, POS tagging, and dependency parsing models.
    Page 1, “Abstract”
  3. We also perform comparison experiments with the partially joint models.
    Page 1, “Abstract”
  4. Based on these observations, we aim at building a joint model that simultaneously processes word segmentation, POS tagging, and dependency parsing, trying to capture global interaction among
    Page 1, “Introduction”
  5. We also perform comparison experiments with partially joint models, and investigate the tradeoff between the running speed and the model performance.
    Page 2, “Introduction”
  6. In contrast, we built a joint model based on a dependency-based framework, with a rich set of structural features.
    Page 2, “Related Works”
  7. Because we found that even an incremental approach with beam search is intractable if we perform the word-based decoding, we take a character-based approach to produce our joint model.
    Page 2, “Related Works”
  8. (2011), we build our joint model to solve word segmentation, POS tagging, and dependency parsing within a single framework.
    Page 3, “Model”
  9. In our joint model, the early update is invoked by mistakes in any of word segmentation, POS tagging, or dependency parsing.
    Page 3, “Model”
  10. The list of the features used in our joint model is presented in Table 1, where S01–S05, W01–W21, and T01–T05 are taken from Zhang and Clark (2010), and P01–P28 are taken from Huang and Sagae (2010).
    Page 4, “Model”
  11. Because segmentation using a dictionary alone can serve as a strong baseline in Chinese word segmentation (Sproat et al., 1996), the use of dictionaries is expected to make our joint model more robust and enables us to investigate the contribution of the syntactic dependency in a more realistic setting.
    Page 4, “Model”
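Item 9 refers to early update, the standard training refinement for beam-search perceptrons (Collins and Roark, 2004): decoding during training stops, and the weights are updated, as soon as the gold partial analysis falls out of the beam. A minimal sketch, assuming hypothetical `candidates_fn` (possible next actions) and `score_fn` (model score of a partial action sequence); this is an illustration, not the authors' implementation:

```python
def early_update_decode(gold_actions, candidates_fn, score_fn, beam_size):
    """Beam-search decoding for training: stop at the first step where
    the gold action prefix falls out of the beam (early update)."""
    beam = [()]  # a state is the tuple of actions taken so far
    for step, _ in enumerate(gold_actions):
        expanded = [state + (a,) for state in beam for a in candidates_fn(state)]
        expanded.sort(key=score_fn, reverse=True)
        beam = expanded[:beam_size]
        gold_prefix = tuple(gold_actions[: step + 1])
        if gold_prefix not in beam:
            # Early update: return the highest-scoring wrong prefix and the
            # gold prefix; a perceptron would update weights on this pair.
            return beam[0], gold_prefix
    return beam[0], tuple(gold_actions)
```

In the joint setting, a mistake in any of segmentation, tagging, or parsing makes the gold action prefix diverge, so one early-update mechanism covers all three tasks.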


word segmentation

Appears in 17 sentences as: word segmentation (19)
In Incremental Joint Approach to Word Segmentation, POS Tagging, and Dependency Parsing in Chinese
  1. We propose the first joint model for word segmentation, POS tagging, and dependency parsing for Chinese.
    Page 1, “Abstract”
  2. spaces) between words, word segmentation is the crucial first step that is necessary to perform virtually all NLP tasks.
    Page 1, “Introduction”
  3. Because the tasks of word segmentation and POS tagging have strong interactions, many studies have been devoted to the task of joint word segmentation and POS tagging for languages such as Chinese (e.g.
    Page 1, “Introduction”
  4. The joint approach to word segmentation and POS tagging has been reported to improve word segmentation and POS tagging accuracies by more than
    Page 1, “Introduction”
  5. In this context, it is natural to consider further a question regarding the joint framework: how strongly do the tasks of word segmentation and dependency parsing interact?
    Page 1, “Introduction”
  6. This is an example in which word segmentation cannot be handled properly without considering long-range syntactic information.
    Page 1, “Introduction”
  7. Based on these observations, we aim at building a joint model that simultaneously processes word segmentation, POS tagging, and dependency parsing, trying to capture global interaction among
    Page 1, “Introduction”
  8. In addition, the lattice does not include word segmentation ambiguities crossing boundaries of space-delimited tokens.
    Page 2, “Related Works”
  9. However, because they regarded word segmentation as given, their model did not consider the
    Page 2, “Related Works”
  10. (2011), we build our joint model to solve word segmentation, POS tagging, and dependency parsing within a single framework.
    Page 3, “Model”
  11. In our joint model, the early update is invoked by mistakes in any of word segmentation, POS tagging, or dependency parsing.
    Page 3, “Model”


beam size

Appears in 8 sentences as: beam size (9)
In Incremental Joint Approach to Word Segmentation, POS Tagging, and Dependency Parsing in Chinese
  1. We have some parameters to tune: parsing feature weight 0p, beam size, and training epoch.
    Page 6, “Model”
  2. In this experiment, the external dictionaries are not used, and the beam size of 32 is used.
    Page 6, “Model”
  3. Table 3 shows the performance and speed of the full joint model (with no dictionaries) on CTB-Sc-l with respect to the beam size.
    Page 6, “Model”
  4. Although even the beam size of 32 results in competitive accuracies for word segmentation and POS tagging, the dependency accuracy is affected most by the increase of the beam size.
    Page 6, “Model”
  5. Based on this experiment, we set the beam size of SegTagDep to 64 throughout the experiments.
    Page 6, “Model”
  6. the beam size.
    Page 7, “Model”
  7. Each point corresponds to the beam size of 4, 8, 16, 32, (64).
    Page 7, “Model”
  8. The beam size of 16 is used for SegTag in SegTag+Dep and SegTag+TagDep.
    Page 7, “Model”
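Item 3 points to Table 3, which reports accuracy and speed (sentences per second) as the beam size grows. A sweep of that kind can be sketched as below; `decode_fn` is a hypothetical stand-in for the joint decoder, and the timing harness is illustrative only:

```python
import time

def sweep_beam_sizes(decode_fn, sentences, beam_sizes=(4, 8, 16, 32, 64)):
    """Run the decoder at each beam size and report throughput
    (sentences per second); accuracy would be scored against gold trees."""
    throughput = {}
    for b in beam_sizes:
        start = time.perf_counter()
        for s in sentences:
            decode_fn(s, b)
        # Guard against timers whose resolution rounds tiny runs to zero.
        elapsed = max(time.perf_counter() - start, 1e-9)
        throughput[b] = len(sentences) / elapsed
    return throughput
```

The quoted observation (segmentation and tagging saturate by beam 32, while dependency accuracy keeps improving) is exactly the kind of tradeoff such a sweep exposes.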


parsing model

Appears in 5 sentences as: parsing model (4), parsing models (1)
In Incremental Joint Approach to Word Segmentation, POS Tagging, and Dependency Parsing in Chinese
  1. Based on an extension of the incremental joint model for POS tagging and dependency parsing (Hatori et al., 2011), we propose an efficient character-based decoding method that can combine features from state-of-the-art segmentation, POS tagging, and dependency parsing models.
    Page 1, “Abstract”
  2. Therefore, we place no restriction on the segmentation possibilities to consider, and we assess the full potential of the joint segmentation and dependency parsing model.
    Page 2, “Related Works”
  3. The incremental framework of our model is based on the joint POS tagging and dependency parsing model for Chinese (Hatori et al., 2011), which is an extension of the shift-reduce dependency parser with dynamic programming (Huang and Sagae, 2010).
    Page 2, “Related Works”
  4. Based on the joint POS tagging and dependency parsing model by Hatori et al.
    Page 3, “Model”
  5. TagDep: the joint POS tagging and dependency parsing model (Hatori et al., 2011), where the lookahead features are omitted.
    Page 6, “Model”


beam search

Appears in 4 sentences as: beam search (5)
In Incremental Joint Approach to Word Segmentation, POS Tagging, and Dependency Parsing in Chinese
  1. However, it requires computationally expensive multiple beams to compare words of different lengths using beam search.
    Page 2, “Related Works”
  2. Because we found that even an incremental approach with beam search is intractable if we perform the word-based decoding, we take a character-based approach to produce our joint model.
    Page 2, “Related Works”
  3. In beam search, we use the step index that is associated with each state: the parser states in process are aligned according to the index, and the beam search pruning is applied to those states with the same index.
    Page 3, “Model”
  4. Consequently, for the beam search to function effectively, all states with the same index must be comparable, and all terminal states should have the same step index.
    Page 3, “Model”
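Items 3 and 4 describe the key invariant for character-based beam search: states are bucketed by a step index, pruning compares only same-index states, and the index is defined so that all terminal states coincide. A minimal sketch of the bucketed pruning step, with hypothetical `step_of` and `score_of` accessors:

```python
import heapq
from collections import defaultdict

def prune_by_step_index(states, beam_size, step_of, score_of):
    """Group parser states by their step index and keep the top-scoring
    `beam_size` states per group; only same-index states are comparable."""
    groups = defaultdict(list)
    for state in states:
        groups[step_of(state)].append(state)
    return {
        idx: heapq.nlargest(beam_size, group, key=score_of)
        for idx, group in groups.items()
    }
```

The design point is that comparing states with different step indices would bias the search toward analyses with fewer (or cheaper) actions, so pruning must stay within an index bucket.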


F1 scores

Appears in 4 sentences as: F1 score (1), F1 scores (4)
In Incremental Joint Approach to Word Segmentation, POS Tagging, and Dependency Parsing in Chinese
  1. We use standard measures of word-level precision, recall, and F1 score for evaluating each task.
    Page 6, “Model”
  2. Figure 2 shows the F1 scores of the proposed model (SegTagDep) on CTB-Sc-l with respect to the training epoch and different parsing feature weights, where “Seg”, “Tag”, and “Dep” respectively denote the F1 scores of word segmentation, POS tagging, and dependency parsing.
    Page 6, “Model”
  3. Table 3: F1 scores and speed (in sentences per sec.)
    Page 7, “Model”
  4. Table 4 shows the segmentation, POS tagging, and dependency parsing F1 scores of these models on CTB-5c.
    Page 7, “Model”
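The word-level precision, recall, and F1 of item 1 are conventionally computed over character spans, so a predicted word counts as correct only when both of its boundaries match the gold segmentation. A minimal sketch (not the authors' evaluation code; the tagging and parsing variants simply extend each span with the POS tag or head):

```python
def to_spans(words):
    """Map a word sequence to its set of (start, end) character spans."""
    spans, pos = set(), 0
    for w in words:
        spans.add((pos, pos + len(w)))
        pos += len(w)
    return spans

def word_prf1(gold_words, pred_words):
    """Word-level precision, recall, and F1 over character spans."""
    gold, pred = to_spans(gold_words), to_spans(pred_words)
    tp = len(gold & pred)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```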


feature set

Appears in 4 sentences as: feature set (3), feature sets (1)
In Incremental Joint Approach to Word Segmentation, POS Tagging, and Dependency Parsing in Chinese
  1. Second, although the feature set is fundamentally a combination of those used in previous works (Zhang and Clark, 2010; Huang and Sagae, 2010), to integrate them in a single incremental framework is not straightforward.
    Page 2, “Introduction”
  2. Zhang and Clark (2008) proposed an incremental joint segmentation and POS tagging model, with an effective feature set for Chinese.
    Page 2, “Related Works”
  3. The feature set of our model is fundamentally a combination of the features used in the state-of-the-art joint segmentation and POS tagging model (Zhang and Clark, 2010) and dependency parser (Huang and Sagae, 2010), both of which are used as baseline models in our experiment.
    Page 4, “Model”
  4. All of the models described above except Dep’ are based on the same feature sets for segmentation and
    Page 6, “Model”


proposed model

Appears in 4 sentences as: proposed model (2), Proposed Models (1), proposed models (1)
In Incremental Joint Approach to Word Segmentation, POS Tagging, and Dependency Parsing in Chinese
  1. 4.2 Baseline and Proposed Models
    Page 6, “Model”
  2. We use the following baseline and proposed models for evaluation.
    Page 6, “Model”
  3. Figure 2 shows the F1 scores of the proposed model (SegTagDep) on CTB-Sc-l with respect to the training epoch and different parsing feature weights, where “Seg”, “Tag”, and “Dep” respectively denote the F1 scores of word segmentation, POS tagging, and dependency parsing.
    Page 6, “Model”
  4. In contrast, this negative effect is not observed for SegTagDep: both the overall tagging accuracy and the OOV accuracy are improved, demonstrating the effectiveness of the proposed model.
    Page 7, “Model”


Seg

Appears in 4 sentences as: Seg (4), “Seg” (1)
In Incremental Joint Approach to Word Segmentation, POS Tagging, and Dependency Parsing in Chinese
  1. Figure 2 shows the F1 scores of the proposed model (SegTagDep) on CTB-Sc-l with respect to the training epoch and different parsing feature weights, where “Seg”, “Tag”, and “Dep” respectively denote the F1 scores of word segmentation, POS tagging, and dependency parsing.
    Page 6, “Model”
  2. Beam Seg Tag Dep Speed
    Page 7, “Model”
  3. System          Seg    Tag
     Kruengkrai ’09  97.87  93.67
     Zhang ’10       97.78  93.67
     Sun ’11         98.17  94.02
     Wang ’11        98.11  94.18
     SegTag          97.66  93.61
     SegTagDep       97.73  94.46
     SegTag(d)       98.18  94.08
     SegTagDep(d)    98.26  94.64
    Page 7, “Model”
  4. Model           Seg    Tag    Dep    Seg    Tag    Dep
     Kruengkrai ’09  95.50  90.50  —      95.40  89.86  —
     Wang ’11        95.79  91.12  —      95.65  90.46  —
    Page 8, “Model”


feature weights

Appears in 3 sentences as: feature weight (1), feature weights (2)
In Incremental Joint Approach to Word Segmentation, POS Tagging, and Dependency Parsing in Chinese
  1. the training epoch (x-axis) and parsing feature weights (in legend).
    Page 6, “Model”
  2. We have some parameters to tune: parsing feature weight 0p, beam size, and training epoch.
    Page 6, “Model”
  3. Figure 2 shows the F1 scores of the proposed model (SegTagDep) on CTB-Sc-l with respect to the training epoch and different parsing feature weights, where “Seg”, “Tag”, and “Dep” respectively denote the F1 scores of word segmentation, POS tagging, and dependency parsing.
    Page 6, “Model”
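Item 2 lists a separate parsing feature weight among the tuned parameters, echoing the earlier remark that the learning rate must be balanced between segmentation/tagging features and parsing features. In a linear model this amounts to scaling one feature group's contribution; a hedged sketch with hypothetical feature names:

```python
def joint_score(weights, segtag_feats, parse_feats, parse_weight):
    """Linear model score with the parsing feature group scaled by a
    separate factor to balance it against segmentation/tagging features."""
    def dot(feats):
        return sum(weights.get(f, 0.0) for f in feats)
    return dot(segtag_feats) + parse_weight * dot(parse_feats)
```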


Treebank

Appears in 3 sentences as: Treebank (3)
In Incremental Joint Approach to Word Segmentation, POS Tagging, and Dependency Parsing in Chinese
  1. In experiments using the Chinese Treebank (CTB), we show that the accuracies of the three tasks can be improved significantly over the baseline models, particularly by 0.6% for POS tagging and 2.4% for dependency parsing.
    Page 1, “Abstract”
  2. We perform experiments using the Chinese Treebank (CTB) corpora, demonstrating that the accuracies of the three tasks can be improved significantly over the pipeline combination of the state-of-the-art joint segmentation and POS tagging model, and the dependency parser.
    Page 2, “Introduction”
  3. We use the Chinese Penn Treebank ver.
    Page 5, “Model”


word-level

Appears in 3 sentences as: word-level (3)
In Incremental Joint Approach to Word Segmentation, POS Tagging, and Dependency Parsing in Chinese
  1. Furthermore, the word-level information is often augmented with the POS tags, which, along with segmentation, form the basic foundation of statistical NLP.
    Page 1, “Introduction”
  2. To incorporate the word-level features into the character-based decoder, the features are decomposed into substring-level features, which allow incomplete words to have scores comparable to those of complete words in the beam.
    Page 2, “Related Works”
  3. We use standard measures of word-level precision, recall, and F1 score, for evaluating each task.
    Page 6, “Model”
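Item 2's decomposition can be pictured as follows: position-level features fire as each character is appended to the word being built, while the word-identity feature fires only once the word is complete, so partial and complete words accrue comparable scores. The templates below are purely illustrative, not the paper's actual feature set:

```python
def substring_features(partial_word, complete):
    """Illustrative substring-level decomposition of word features:
    per-character features accrue incrementally as characters are
    appended; the word-identity feature is added only on completion."""
    feats = [("begin", partial_word[0])]
    feats += [("inside", c) for c in partial_word[1:]]
    if complete:
        feats.append(("word", partial_word))
    return feats
```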
