Chinese Morphological Analysis with Character-level POS Tagging
Shen, Mo and Liu, Hongxiao and Kawahara, Daisuke and Kurohashi, Sadao

Article Structure

Abstract

The focus of recent studies on Chinese word segmentation, part-of-speech (POS) tagging and parsing has been shifting from words to characters.

Introduction

In recent years, the focus of research on Chinese word segmentation, part-of-speech (POS) tagging and parsing has been shifting from words toward characters.

Character-level POS Tagset

We propose a tagset for the task of character-level POS tagging.

Chinese Morphological Analysis with Character-level POS

3.1 System Description

Evaluation

4.1 Settings

Conclusion

We believe that by treating characters as the true atoms of Chinese morphological and syntactic analysis, it is possible to address the out-of-vocabulary problem that word-based methods have long suffered from.

Topics

Word Segmentation

Appears in 20 sentences as: Word Segmentation (11) word segmentation (10)
In Chinese Morphological Analysis with Character-level POS Tagging
  1. The focus of recent studies on Chinese word segmentation, part-of-speech (POS) tagging and parsing has been shifting from words to characters.
    Page 1, “Abstract”
  2. We propose a method that performs character-level POS tagging jointly with word segmentation and word-level POS tagging.
    Page 1, “Abstract”
  3. In recent years, the focus of research on Chinese word segmentation, part-of-speech (POS) tagging and parsing has been shifting from words toward characters.
    Page 1, “Introduction”
  4. We propose a method that performs character-level POS tagging jointly with word segmentation and word-level POS tagging.
    Page 2, “Introduction”
  5. Previous studies have shown that jointly processing word segmentation and POS tagging is preferable to pipeline processing, which can propagate errors (Nakagawa and Uchimoto, 2007; Kruengkrai et al., 2009).
    Page 3, “Chinese Morphological Analysis with Character-level POS”
  6. To evaluate our proposed method, we have conducted two sets of experiments on CTB5: word segmentation, and joint word segmentation and word-level POS tagging.
    Page 5, “Evaluation”
  7. The results of the word segmentation experiment and the joint experiment of segmentation and POS tagging are shown in Table 5(a) and Table 5(b), respectively.
    Page 5, “Evaluation”
  8. The results show that, while the differences between the baseline model and the proposed model in word segmentation accuracies are small, the proposed model achieves significant improvement in the experiment of joint segmentati-
    Page 5, “Evaluation”
  9. (a) Word Segmentation Results
    Page 5, “Evaluation”
  10. A Cascaded Linear Model for Joint Chinese Word Segmentation and Part-of-speech Tagging.
    Page 6, “Conclusion”
  11. Word Lattice Reranking for Chinese Word Segmentation
    Page 6, “Conclusion”

See all papers in Proc. ACL 2014 that mention Word Segmentation.

See all papers in Proc. ACL that mention Word Segmentation.

word-level

Appears in 17 sentences as: Word-level (1) word-level (18)
In Chinese Morphological Analysis with Character-level POS Tagging
  1. We propose a method that performs character-level POS tagging jointly with word segmentation and word-level POS tagging.
    Page 1, “Abstract”
  2. Table 1. Character-level POS sequence as a more specified version of word-level POS: an example of verb.
    Page 1, “Introduction”
  3. Another advantage of character-level POS is that the sequence of character-level POS in a word can be seen as a more fine-grained version of word-level POS.
    Page 2, “Introduction”
  4. The five words in this table are very likely to be tagged with the same word-level POS as verb in any available annotated corpora, while it can be commonly agreed among native speakers of Chinese that the syntactic behaviors of these words are different from each other, due to their distinctions in word constructions.
    Page 2, “Introduction”
  5. Therefore, compared to word-level POS, the character-level POS can produce information for more expressive features during the learning process of a morphological analyzer.
    Page 2, “Introduction”
  6. We propose a method that performs character-level POS tagging jointly with word segmentation and word-level POS tagging.
    Page 2, “Introduction”
  7. The CTB-style word-level POS are also shown for the examples.
    Page 2, “Introduction”
  8. Some of these tags are directly derived from the commonly accepted word-level part-of-speech, such as noun, verb, adjective and adverb.
    Page 2, “Character-level POS Tagset”
  9. This hybrid model constructs a lattice that consists of word-level and character-level nodes from a given input sentence.
    Page 3, “Chinese Morphological Analysis with Character-level POS”
  10. Word-level nodes correspond to words found in the system’s lexicon, which has been compiled from training data.
    Page 3, “Chinese Morphological Analysis with Character-level POS”
  11. upper part of the lattice (word-level nodes) represents known words, where each node carries information such as character form, character-level POS, and word-level POS.
    Page 4, “Chinese Morphological Analysis with Character-level POS”
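
The lattice described in items 9 to 11 above can be illustrated with a small sketch. It is not the authors' implementation: the `Node` fields, the `build_lattice` function, the toy lexicon and the candidate tag table are all assumptions made for illustration, and the character-level nodes here carry only a candidate character-level POS, which is likely simpler than what the paper's nodes hold.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Node:
    start: int                  # index of the first character covered
    end: int                    # index one past the last character covered
    form: str                   # surface string of the node
    kind: str                   # "word" (known word) or "char" (single character)
    char_pos: Tuple[str, ...]   # character-level POS tags, one per character
    word_pos: Optional[str]     # word-level POS; None for character-level nodes


def build_lattice(sentence, lexicon, char_tags):
    """Collect word-level and character-level nodes for one sentence.

    lexicon:   hypothetical dict, word -> list of (char_pos tuple, word_pos)
    char_tags: hypothetical dict, character -> list of candidate char-level tags
    """
    nodes = []
    n = len(sentence)
    for i in range(n):
        # Word-level nodes: every substring found in the lexicon.
        for j in range(i + 1, n + 1):
            form = sentence[i:j]
            for char_pos, word_pos in lexicon.get(form, []):
                nodes.append(Node(i, j, form, "word", tuple(char_pos), word_pos))
        # Character-level nodes: one node per candidate tag of the character.
        for tag in char_tags.get(sentence[i], []):
            nodes.append(Node(i, i + 1, sentence[i], "char", (tag,), None))
    return nodes


# Toy data, invented for this example only.
toy_lexicon = {"发表": [(("v", "v"), "VV")]}
toy_char_tags = {"发": ["v"], "表": ["n", "v"]}
lattice = build_lattice("发表", toy_lexicon, toy_char_tags)
```

A decoder such as Viterbi search would then pick the best path through these nodes; that step is omitted here.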

POS tagging

Appears in 15 sentences as: POS tag (1) POS Tagging (3) POS tagging (11) Pos Tagging (1) POS tags (2)
In Chinese Morphological Analysis with Character-level POS Tagging
  1. We propose the first tagset designed for the task of character-level POS tagging.
    Page 1, “Abstract”
  2. We propose a method that performs character-level POS tagging jointly with word segmentation and word-level POS tagging.
    Page 1, “Abstract”
  3. with Character-level POS Tagging
    Page 1, “Introduction”
  4. We propose the first tagset designed for the task of character-level POS tagging, based on which we manually annotate the entire CTB5.
    Page 2, “Introduction”
  5. We propose a method that performs character-level POS tagging jointly with word segmentation and word-level POS tagging.
    Page 2, “Introduction”
  6. We propose a tagset for the task of character-level POS tagging.
    Page 2, “Character-level POS Tagset”
  7. Previous studies have shown that jointly processing word segmentation and POS tagging is preferable to pipeline processing, which can propagate errors (Nakagawa and Uchimoto, 2007; Kruengkrai et al., 2009).
    Page 3, “Chinese Morphological Analysis with Character-level POS”
  8. Baseline features: For word-level nodes that represent known words, we use the symbols w, p and l to denote the word form, POS tag and length of the word, respectively.
    Page 4, “Chinese Morphological Analysis with Character-level POS”
  9. Proposed features: For word-level nodes, the function CP_pair(w) returns the pair of the character-level POS tags of the first and last characters of w, and CP_all(w) returns the sequence of character-level POS tags of w. If either the pair or the sequence of character-level POS is ambiguous, which means there are multiple paths in the sub-lattice of the word-level node, then the values on the current best path (with local context) during the Viterbi search will be returned.
    Page 4, “Chinese Morphological Analysis with Character-level POS”
  10. To evaluate our proposed method, we have conducted two sets of experiments on CTB5: word segmentation, and joint word segmentation and word-level POS tagging.
    Page 5, “Evaluation”
  11. The results of the word segmentation experiment and the joint experiment of segmentation and POS tagging are shown in Table 5(a) and Table 5(b), respectively.
    Page 5, “Evaluation”
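
The two proposed feature functions quoted in items 8 and 9 above can be sketched as follows. This is a minimal illustration, not the paper's code: the Python names `cp_pair`, `cp_all` and `word_node_features` and the feature string formats are assumptions, the paper's actual feature templates are not reproduced here, and the ambiguity handling during Viterbi search is left out (the caller is assumed to pass the tag sequence already chosen on the current best path).

```python
def cp_pair(char_pos):
    """Pair of character-level POS tags of the first and last characters."""
    return (char_pos[0], char_pos[-1])


def cp_all(char_pos):
    """Full sequence of character-level POS tags of the word."""
    return tuple(char_pos)


def word_node_features(w, p, char_pos):
    """Feature strings for one word-level node.

    w: word form, p: word-level POS tag, char_pos: character-level POS tags.
    The string layout is hypothetical and chosen only for readability.
    """
    l = len(w)  # word length, the baseline symbol "l"
    first, last = cp_pair(char_pos)
    return [
        f"w={w}",
        f"p={p}",
        f"l={l}",
        f"CP_pair={first}+{last}",
        "CP_all=" + "+".join(cp_all(char_pos)),
    ]


# Toy call with invented tags.
features = word_node_features("abc", "VV", ["v", "n", "v"])
```

For a single-character word both elements of the pair are the same tag, since the first and last characters coincide.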

part-of-speech

Appears in 12 sentences as: Part-of-Speech (1) Part-of-speech (4) part-of-speech (7)
In Chinese Morphological Analysis with Character-level POS Tagging
  1. The focus of recent studies on Chinese word segmentation, part-of-speech (POS) tagging and parsing has been shifting from words to characters.
    Page 1, “Abstract”
  2. In this paper, we investigate the usefulness of character-level part-of-speech in the task of Chinese morphological analysis.
    Page 1, “Abstract”
  3. In recent years, the focus of research on Chinese word segmentation, part-of-speech (POS) tagging and parsing has been shifting from words toward characters.
    Page 1, “Introduction”
  4. In our view, since each Chinese character is in fact created as a word in origin with complete and independent meaning, it should be treated as the actual minimal morphological unit in Chinese language, and therefore should carry specific part-of-speech.
    Page 1, “Introduction”
  5. This suggests that character-level POS can be used as cues in predicting the part-of-speech of unknown words.
    Page 2, “Introduction”
  6. Tag | Part-of-Speech | Example: n | noun | [Chinese example lost in extraction]/NN (bill); v | verb | [Chinese example lost in extraction]/VV (publish); j | adj./adv.
    Page 2, “Introduction”
  7. Tagset for character-level part-of-speech tagging.
    Page 2, “Introduction”
  8. Some of these tags are directly derived from the commonly accepted word-level part-of-speech, such as noun, verb, adjective and adverb.
    Page 2, “Character-level POS Tagset”
  9. A Cascaded Linear Model for Joint Chinese Word Segmentation and Part-of-speech Tagging.
    Page 6, “Conclusion”
  10. and Part-of-speech Tagging.
    Page 6, “Conclusion”
  11. Chinese Part-of-speech Tagging: One-at-a-time or All-at-once?
    Page 6, “Conclusion”

Chinese Word Segmentation

Appears in 9 sentences as: Chinese Word Segmentation (7) Chinese word segmentation (2)
In Chinese Morphological Analysis with Character-level POS Tagging
  1. The focus of recent studies on Chinese word segmentation, part-of-speech (POS) tagging and parsing has been shifting from words to characters.
    Page 1, “Abstract”
  2. In recent years, the focus of research on Chinese word segmentation, part-of-speech (POS) tagging and parsing has been shifting from words toward characters.
    Page 1, “Introduction”
  3. A Cascaded Linear Model for Joint Chinese Word Segmentation and Part-of-speech Tagging.
    Page 6, “Conclusion”
  4. Word Lattice Reranking for Chinese Word Segmentation
    Page 6, “Conclusion”
  5. An Error-Driven Word-Character Hybrid Model for Joint Chinese Word Segmentation and POS Tagging.
    Page 6, “Conclusion”
  6. Parsing the Internal Structure of Words: A New Paradigm for Chinese Word Segmentation.
    Page 6, “Conclusion”
  7. A Stacked Sub-word Model for Joint Chinese Word Segmentation and Part-of-speech Tagging.
    Page 6, “Conclusion”
  8. Chinese Word Segmentation as Character Tagging.
    Page 6, “Conclusion”
  9. Effective Tag Set Selection in Chinese Word Segmentation Via Conditional Random Field Modeling.
    Page 6, “Conclusion”

morphological analysis

Appears in 8 sentences as: Morphological Analysis (1) morphological analysis (3) morphological analyzer (3) morphological analyzers (1)
In Chinese Morphological Analysis with Character-level POS Tagging
  1. In this paper, we investigate the usefulness of character-level part-of-speech in the task of Chinese morphological analysis.
    Page 1, “Abstract”
  2. Through experiments, we demonstrate that by introducing character-level POS information, the performance of a baseline morphological analyzer can be significantly improved.
    Page 1, “Abstract”
  3. Therefore, compared to word-level POS, the character-level POS can produce information for more expressive features during the learning process of a morphological analyzer.
    Page 2, “Introduction”
  4. In this paper, we investigate the usefulness of character-level POS in the task of Chinese morphological analysis.
    Page 2, “Introduction”
  5. Through experiments, we demonstrate that by introducing character-level POS information, the performance of a baseline morphological analyzer can be significantly improved.
    Page 2, “Introduction”
  6. In Table 6 we compare our approach with morphological analyzers in previous studies.
    Page 5, “Evaluation”
  7. In our error analysis, we believe that by exploring the character-level POS and the internal word structure (Zhang et al., 2013) at the same time, it is possible to further improve the performance of morphological analysis and parsing.
    Page 5, “Conclusion”
  8. Corpus-based Japanese Morphological Analysis.
    Page 6, “Conclusion”

Part-of-speech Tagging

Appears in 5 sentences as: Part-of-speech Tagging (4) part-of-speech tagging (1)
In Chinese Morphological Analysis with Character-level POS Tagging
  1. Tagset for character-level part-of-speech tagging.
    Page 2, “Introduction”
  2. A Cascaded Linear Model for Joint Chinese Word Segmentation and Part-of-speech Tagging.
    Page 6, “Conclusion”
  3. and Part-of-speech Tagging.
    Page 6, “Conclusion”
  4. Chinese Part-of-speech Tagging: One-at-a-time or All-at-once?
    Page 6, “Conclusion”
  5. A Stacked Sub-word Model for Joint Chinese Word Segmentation and Part-of-speech Tagging.
    Page 6, “Conclusion”

proposed model

Appears in 3 sentences as: proposed model (4)
In Chinese Morphological Analysis with Character-level POS Tagging
  1. “CharPos” stands for our proposed model, which is described in Section 3.
    Page 5, “Evaluation”
  2. The results show that, while the differences between the baseline model and the proposed model in word segmentation accuracies are small, the proposed model achieves significant improvement in the experiment of joint segmentati-
    Page 5, “Evaluation”
  3. As the results show, despite the fact that the performance of our baseline model is relatively weak in the joint segmentation and POS tagging task, our proposed model achieves the second-best performance in both segmentation and joint tasks.
    Page 5, “Evaluation”

significantly improved

Appears in 3 sentences as: significant improvement (1) significantly improved (2)
In Chinese Morphological Analysis with Character-level POS Tagging
  1. Through experiments, we demonstrate that by introducing character-level POS information, the performance of a baseline morphological analyzer can be significantly improved.
    Page 1, “Abstract”
  2. Through experiments, we demonstrate that by introducing character-level POS information, the performance of a baseline morphological analyzer can be significantly improved.
    Page 2, “Introduction”
  3. The results show that, while the differences between the baseline model and the proposed model in word segmentation accuracies are small, the proposed model achieves significant improvement in the experiment of joint segmentati-
    Page 5, “Evaluation”
