Capturing Paradigmatic and Syntagmatic Lexical Relations: Towards Accurate Chinese Part-of-Speech Tagging
Sun, Weiwei and Uszkoreit, Hans

Article Structure

Abstract

From the perspective of structural linguistics, we explore paradigmatic and syntagmatic lexical relations for Chinese POS tagging, an important and challenging task for Chinese language processing.

Introduction

In grammar, a part-of-speech (POS) is a linguistic category of words, which is generally defined by the syntactic or morphological behavior of the word in question.

State-of-the-Art

2.1 Previous Work

Capturing Paradigmatic Relations via Word Clustering

To bridge the gap between high and low frequency words, we employ word clustering to acquire

Capturing Syntagmatic Relations via Constituency Parsing

Syntactic analysis, especially the full and deep one, reflects syntagmatic relations of words and phrases of sentences.

Combining Both

We have introduced two separate improvements for Chinese POS tagging, which capture different types of lexical relations.

Conclusion

We hold a view of structuralist linguistics and study the impact of paradigmatic and syntagmatic lexical relations on Chinese POS tagging.

Topics

POS tagging

Appears in 35 sentences as: POS tag (2) POS tagger (2) POS taggers (1) POS Tagging (1) POS tagging (27) POS tags (2)
In Capturing Paradigmatic and Syntagmatic Lexical Relations: Towards Accurate Chinese Part-of-Speech Tagging
  1. From the perspective of structural linguistics, we explore paradigmatic and syntagmatic lexical relations for Chinese POS tagging, an important and challenging task for Chinese language processing.
    Page 1, “Abstract”
  2. Automatically assigning POS tags to words plays an important role in parsing, word sense disambiguation, as well as many other NLP applications.
    Page 1, “Introduction”
  3. While state-of-the-art tagging systems have achieved accuracies above 97% on English, Chinese POS tagging has proven to be more challenging and obtained accuracies about 93-94% (Tseng et al., 2005b; Huang et al., 2007, 2009; Li et al., 2011).
    Page 1, “Introduction”
  4. It is generally accepted that Chinese POS tagging often requires more sophisticated language processing techniques that are capable of drawing inferences from more subtle linguistic knowledge.
    Page 1, “Introduction”
  5. Both paradigmatic and syntagmatic lexical relations have a great impact on POS tagging, because the value of a word is determined by the two relations.
    Page 1, “Introduction”
  6. Our error analysis of a state-of-the-art Chinese POS tagger shows that the lack of both paradigmatic and syntagmatic lexical knowledge accounts for a large part of tagging errors.
    Page 1, “Introduction”
  7. This paper is concerned with capturing paradigmatic and syntagmatic lexical relations to advance the state-of-the-art of Chinese POS tagging.
    Page 1, “Introduction”
  8. The word clusters are then explicitly utilized to design new features for POS tagging.
    Page 1, “Introduction”
  9. Second, we study the possible impact of syntagmatic relations on POS tagging by comparatively analyzing a (syntax-free) sequential tagging model
    Page 1, “Introduction”
  10. We implement a discriminative sequential classification model for POS tagging which achieves the state-of-the-art accuracy.
    Page 2, “Introduction”
  11. In some cases, the methods work well without large modifications, such as German POS tagging.
    Page 2, “State-of-the-Art”

See all papers in Proc. ACL 2012 that mention POS tagging.

See all papers in Proc. ACL that mention POS tagging.

unlabeled data

Appears in 8 sentences as: unlabeled data (9)
In Capturing Paradigmatic and Syntagmatic Lexical Relations: Towards Accurate Chinese Part-of-Speech Tagging
  1. Paradigmatic lexical relations are explicitly captured by word clustering on large-scale unlabeled data and are used to design new features to enhance a discriminative tagger.
    Page 1, “Abstract”
  2. First, we employ unsupervised word clustering to explore paradigmatic relations that are encoded in large-scale unlabeled data.
    Page 1, “Introduction”
  3. Brown Clustering: Our first choice is the bottom-up agglomerative word clustering algorithm of Brown et al. (1992), which derives a hierarchical clustering of words from unlabeled data.
    Page 4, “Capturing Paradigmatic Relations via Word Clustering”
  4. The large-scale unlabeled data we use in our experiments comes from the Chinese Gigaword (LDC2005T14).
    Page 4, “Capturing Paradigmatic Relations via Word Clustering”
  5. Furthermore, as shown in (Sun and Xu, 2011), appropriate string knowledge acquired from large-scale unlabeled data can significantly enhance a supervised model, especially for the prediction of out-of-vocabulary (OOV) words.
    Page 4, “Capturing Paradigmatic Relations via Word Clustering”
  6. When a comparable amount of unlabeled data (five years’ data) is used, further increasing the unlabeled data for clustering does not lead to much change in tagging performance.
    Page 5, “Capturing Paradigmatic Relations via Word Clustering”
  7. Word clustering derives paradigmatic relational information from unlabeled data by grouping words into different sets.
    Page 5, “Capturing Paradigmatic Relations via Word Clustering”
  8. This result indicates that the improved predictive power partially comes from the new interpretation of a POS tag through a clustering, and partially comes from its memory of OOV words that appear in the unlabeled data.
    Page 6, “Capturing Paradigmatic Relations via Word Clustering”
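
As a rough illustration of how a class-based bigram language model can score a clustering (the kind of quality criterion associated with Brown clustering), the sketch below computes a corpus log-likelihood using P(w_i | w_{i-1}) = P(c_i | c_{i-1}) P(w_i | c_i) with maximum-likelihood estimates. The function name, the toy English corpus, and the cluster labels are illustrative assumptions and do not come from the paper; an actual clustering algorithm searches over cluster assignments rather than scoring two fixed ones.

```python
import math
from collections import Counter


def class_bigram_log_likelihood(tokens, cluster_of):
    """Log-likelihood of tokens[1:] under a class-based bigram model.

    Uses P(w_i | w_{i-1}) = P(c_i | c_{i-1}) * P(w_i | c_i), where c_i is the
    cluster of w_i and both factors are maximum-likelihood estimates taken
    from the same token sequence (hypothetical helper, not the paper's code).
    """
    classes = [cluster_of[w] for w in tokens]
    word_count = Counter(tokens)
    class_count = Counter(classes)
    # How often each class occurs as the left-hand side of a bigram.
    context_count = Counter(classes[:-1])
    transition_count = Counter(zip(classes, classes[1:]))

    log_likelihood = 0.0
    for i in range(1, len(tokens)):
        prev_c, cur_c = classes[i - 1], classes[i]
        p_transition = transition_count[(prev_c, cur_c)] / context_count[prev_c]
        p_emission = word_count[tokens[i]] / class_count[cur_c]
        log_likelihood += math.log(p_transition) + math.log(p_emission)
    return log_likelihood


# Toy comparison: a clustering that groups similar words vs. one big cluster.
tokens = "the dog runs the cat runs".split()
grouped = {"the": "D", "dog": "N", "cat": "N", "runs": "V"}
single = {w: "X" for w in tokens}
print(class_bigram_log_likelihood(tokens, grouped))
print(class_bigram_log_likelihood(tokens, single))
```

On this toy corpus the grouped clustering receives the higher likelihood, which is the sense in which such a model rewards clusters of distributionally similar words.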

Berkeley parser

Appears in 5 sentences as: Berkeley parser (6)
In Capturing Paradigmatic and Syntagmatic Lexical Relations: Towards Accurate Chinese Part-of-Speech Tagging
  1. We then present a comparative study of our tagger and the Berkeley parser, and show that the combination of the two models can significantly improve tagging accuracy.
    Page 2, “Introduction”
  2. On the whole, the Berkeley parser processes IV words slightly better than our tagger, but processes OOV words significantly worse.
    Page 7, “Capturing Syntagmatic Relations via Constituency Parsing”
  3. The numbers in this table clearly show that the main weakness of the Berkeley parser is its predictive power for OOV words.
    Page 7, “Capturing Syntagmatic Relations via Constituency Parsing”
  4. We still use a Bagging model to integrate the discriminative tagger and the Berkeley parser.
    Page 8, “Combining Both”
  5. 11 indicate that the parsing accuracy of the Berkeley parser can be improved simply by feeding the POS Bagging results into the Berkeley parser.
    Page 8, “Combining Both”
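
A minimal sketch of the combination step, assuming the Bagging model follows the usual recipe of training sub-models on bootstrap samples of the training set and taking a per-token majority vote over their outputs. The function names and the toy tag sequences are hypothetical; how the paper trains each sub-model and breaks ties is not stated in these excerpts.

```python
import random
from collections import Counter


def bootstrap_sample(sentences, rng):
    """Draw len(sentences) training sentences with replacement."""
    return [rng.choice(sentences) for _ in sentences]


def majority_vote(predictions):
    """Combine tag sequences from several sub-models for one sentence.

    `predictions` holds one tag sequence per sub-model, all of equal length.
    Ties go to the tag proposed by the earliest sub-model in the list
    (an assumption made for this sketch).
    """
    voted = []
    for tags_at_position in zip(*predictions):
        voted.append(Counter(tags_at_position).most_common(1)[0][0])
    return voted


# Three hypothetical sub-models (e.g. taggers and parsers) tag a 2-word sentence.
outputs = [["NN", "VV"], ["NN", "NN"], ["NR", "VV"]]
print(majority_vote(outputs))

# Each sub-model would be trained on its own resampled copy of the data.
rng = random.Random(0)
print(bootstrap_sample(["s1", "s2", "s3", "s4"], rng))
```

Voting per token keeps the combination independent of how each sub-model produces its tags, which is what allows a sequential tagger and a chart parser to be mixed in one ensemble.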

bigram

Appears in 5 sentences as: Bigram (1) bigram (3) bigrams (1)
In Capturing Paradigmatic and Syntagmatic Lexical Relations: Towards Accurate Chinese Part-of-Speech Tagging
  1. Word bigrams: $w_{-2}w_{-1}$, $w_{-1}w$, $w\,w_{+1}$, $w_{+1}w_{+2}$. In order to better handle unknown words, we extract morphological features: character n-gram prefixes and suffixes for n up to 3.
    Page 2, “State-of-the-Art”
  2. (2009) introduced a bigram HMM model with latent variables (Bigram HMM-LA in the table) for Chinese tagging.
    Page 3, “State-of-the-Art”
  3. Trigram HMM (Huang et al., 2009): 93.99%; Bigram HMM-LA (Huang et al., 2009): 94.53%; Our tagger: 94.69%
    Page 3, “State-of-the-Art”
  4. The quality is defined based on a class-based bigram language model as follows.
    Page 4, “Capturing Paradigmatic Relations via Word Clustering”
  5. The objective function is maximizing the likelihood $\prod_{i=1}^{n} P(w_i \mid w_1, \ldots, w_{i-1})$ of the training data given a partially class-based bigram model of the form
    Page 4, “Capturing Paradigmatic Relations via Word Clustering”
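
The word-bigram and character n-gram templates quoted in item 1 of this list can be sketched as follows. This is an illustrative reading of the templates, not the authors' implementation: the function names, the feature-string format, and the `<PAD>` boundary symbol are assumptions.

```python
def word_bigram_features(words, i, pad="<PAD>"):
    """Word-bigram features around position i in a segmented sentence."""

    def w(offset):
        # Return the word at a relative offset, or a padding symbol
        # when the position falls outside the sentence.
        j = i + offset
        return words[j] if 0 <= j < len(words) else pad

    return [
        f"w-2_w-1={w(-2)}_{w(-1)}",
        f"w-1_w={w(-1)}_{w(0)}",
        f"w_w+1={w(0)}_{w(1)}",
        f"w+1_w+2={w(1)}_{w(2)}",
    ]


def affix_features(word, max_n=3):
    """Character n-gram prefixes and suffixes of `word` for n up to max_n."""
    feats = []
    for n in range(1, min(max_n, len(word)) + 1):
        feats.append(f"prefix{n}={word[:n]}")
        feats.append(f"suffix{n}={word[-n:]}")
    return feats


sentence = ["我", "喜欢", "自然语言"]
print(word_bigram_features(sentence, 1))
print(affix_features("自然语言"))
```

Because the affix features depend only on the characters of a word, they remain available for words never seen in training, which is why they help with unknown words.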

CoNLL

Appears in 5 sentences as: CoNLL (5)
In Capturing Paradigmatic and Syntagmatic Lexical Relations: Towards Accurate Chinese Part-of-Speech Tagging
  1. For detailed analysis and evaluation, we conduct further experiments following the setting of the CoNLL 2009 shared task.
    Page 3, “State-of-the-Art”
  2. For the following experiments, we only report results on the development data of the CoNLL setting.
    Page 3, “State-of-the-Art”
  3. Table columns: Features, Data, Brown, MKCLS. Baseline row: Data = CoNLL, accuracy 94.48%
    Page 5, “Capturing Paradigmatic Relations via Word Clustering”
  4. Table 10: Tagging accuracies on the test data (CoNLL).
    Page 8, “Combining Both”
  5. (CoNLL)
    Page 8, “Combining Both”

labeled data

Appears in 5 sentences as: labeled data (6)
In Capturing Paradigmatic and Syntagmatic Lexical Relations: Towards Accurate Chinese Part-of-Speech Tagging
  1. In this paper, we use CTB 6.0 as the labeled data for the study.
    Page 3, “State-of-the-Art”
  2. Previous research shows that character-based segmentation models trained on labeled data are reasonably accurate (Sun, 2010).
    Page 4, “Capturing Paradigmatic Relations via Word Clustering”
  3. Table 5 summarizes the accuracies of the systems when trained on smaller portions of the labeled data.
    Page 5, “Capturing Paradigmatic Relations via Word Clustering”
  4. In other words, the word cluster features can significantly reduce the amount of labeled data required by the learning algorithm.
    Page 5, “Capturing Paradigmatic Relations via Word Clustering”
  5. The relative reduction is greatest when smaller amounts of the labeled data are used, and the effect lessens as more labeled data is added.
    Page 5, “Capturing Paradigmatic Relations via Word Clustering”

significantly improve

Appears in 4 sentences as: significant improvement (1) significantly improve (2) significantly improved (1)
In Capturing Paradigmatic and Syntagmatic Lexical Relations: Towards Accurate Chinese Part-of-Speech Tagging
  1. Experiments show that the accuracy of this model is significantly improved by word cluster features across a wide range of conditions.
    Page 2, “Introduction”
  2. We then present a comparative study of our tagger and the Berkeley parser, and show that the combination of the two models can significantly improve tagging accuracy.
    Page 2, “Introduction”
  3. The significant improvement in POS tagging also helps subsequent language processing.
    Page 8, “Combining Both”
  4. Both enhancements significantly improve the state-of-the-art of Chinese POS tagging.
    Page 8, “Conclusion”

Treebank

Appears in 4 sentences as: Treebank (4)
In Capturing Paradigmatic and Syntagmatic Lexical Relations: Towards Accurate Chinese Part-of-Speech Tagging
  1. Experiments on the Penn Chinese Treebank demonstrate the importance of both paradigmatic and syntagmatic relations.
    Page 1, “Abstract”
  2. We conduct experiments on the Penn Chinese Treebank and Chinese Gigaword.
    Page 2, “Introduction”
  3. Their evaluations on the Chinese Treebank show that Chinese POS tagging obtains an accuracy of about 93-94%.
    Page 2, “State-of-the-Art”
  4. Penn Chinese Treebank (CTB) (Xue et al., 2005) is a popular data set to evaluate a number of Chinese NLP tasks, including word segmentation (Sun and
    Page 2, “State-of-the-Art”

constituency parsing

Appears in 3 sentences as: constituency parsing (1) constituent parsers (1) constituent parsing (1)
In Capturing Paradigmatic and Syntagmatic Lexical Relations: Towards Accurate Chinese Part-of-Speech Tagging
  1. Syntagmatic lexical relations are implicitly captured by constituent parsing and are utilized via system combination.
    Page 1, “Abstract”
  2. Xu, 2011), POS tagging (Huang et al., 2007, 2009), constituency parsing (Zhang and Clark, 2009; Wang et al., 2006) and dependency parsing (Zhang and Clark, 2008; Huang and Sagae, 2010; Li et al., 2011).
    Page 3, “State-of-the-Art”
  3. The majority of the state-of-the-art constituent parsers are based on generative PCFG learning, with lexicalized (Collins, 2003; Charniak, 2000) or latent annotation (PCFG-LA) (Matsuzaki et al., 2005; Petrov et al., 2006; Petrov and Klein, 2007) refinements.
    Page 6, “Capturing Syntagmatic Relations via Constituency Parsing”

content words

Appears in 3 sentences as: Content Words (1) content words (2)
In Capturing Paradigmatic and Syntagmatic Lexical Relations: Towards Accurate Chinese Part-of-Speech Tagging
  1. Another interesting fact is that almost all of them are content words.
    Page 6, “Capturing Paradigmatic Relations via Word Clustering”
  2. 4.1.1 Content Words vs. Function Words
    Page 6, “Capturing Syntagmatic Relations via Constituency Parsing”
  3. The majority of the words that are better labeled by the tagger are content words, including nouns (NN, NR, NT), numbers (CD, OD), predicates (VA, VC, VE), adverbs (AD), nominal modifiers (JJ), and so on.
    Page 6, “Capturing Syntagmatic Relations via Constituency Parsing”

parsing model

Appears in 3 sentences as: parsing model (2) parsing models (1)
In Capturing Paradigmatic and Syntagmatic Lexical Relations: Towards Accurate Chinese Part-of-Speech Tagging
  1. and a (syntax-based) chart parsing model.
    Page 2, “Introduction”
  2. We can see that the Bagging model taking both sequential tagging and chart parsing models as basic systems outperforms the baseline systems and the Bagging model taking either model in isolation as basic systems.
    Page 7, “Capturing Syntagmatic Relations via Constituency Parsing”
  3. interesting phenomenon is that the Bagging method can also improve the parsing model, but there is a decrease when only taggers are combined.
    Page 8, “Capturing Syntagmatic Relations via Constituency Parsing”

word segmentation

Appears in 3 sentences as: Word Segmentation (1) word segmentation (2)
In Capturing Paradigmatic and Syntagmatic Lexical Relations: Towards Accurate Chinese Part-of-Speech Tagging
  1. Penn Chinese Treebank (CTB) (Xue et al., 2005) is a popular data set to evaluate a number of Chinese NLP tasks, including word segmentation (Sun and
    Page 2, “State-of-the-Art”
  2. 3.1.3 Preprocessing: Word Segmentation
    Page 4, “Capturing Paradigmatic Relations via Word Clustering”
  3. In this table, the symbol “+” in the Features column means that the current configuration contains both the baseline features and the new cluster-based features; the number is the total number of clusters; the symbol “+” in the Data column indicates which portion of the Gigaword data is used to cluster words; the symbols “S” and “SS” in parentheses denote (s)upervised and (s)emi-(s)upervised word segmentation.
    Page 5, “Capturing Paradigmatic Relations via Word Clustering”
