Index of papers in Proc. ACL 2013 that mention
  • Chinese word segmentation
Zhang, Longkai and Li, Li and He, Zhengyan and Wang, Houfeng and Sun, Ni
Abstract
While no segmented corpus of micro-blogs is available to train a Chinese word segmentation model, existing Chinese word segmentation tools cannot perform as well on micro-blogs as on ordinary news texts.
Abstract
In this paper we present an effective yet simple approach to Chinese word segmentation of micro-blogs.
Experiment
We use the benchmark datasets provided by the second International Chinese Word Segmentation Bakeoff as the labeled data.
Experiment
The first two are both well-known Chinese word segmentation tools: ICTCLAS and the Stanford Chinese word segmenter, which are widely used in NLP work related to word segmentation.
Experiment
The Stanford Chinese word segmenter is a CRF-based segmentation tool, and its segmentation standard is set to the PKU standard, which is the same as ours.
Introduction
These new features of micro-blogs make the Chinese Word Segmentation (CWS) models trained on the source domain, such as news corpus, fail to perform equally well when transferred to texts from micro-blogs.
Our method
The Chinese word segmentation problem can be treated as a character labeling problem, which gives each character a label indicating its position in a word.
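The character labeling view in the excerpt above can be illustrated with a minimal sketch. It assumes the common four-label B/M/E/S scheme (begin, middle, end, single-character word); the excerpt does not say which label set the paper uses, and the function names `words_to_tags` and `tags_to_words` are hypothetical.

```python
def words_to_tags(words):
    """Map a segmented sentence to one position label per character.

    Assumed label set (not taken from the paper): B = first character
    of a word, M = middle, E = last, S = single-character word.
    """
    tags = []
    for word in words:
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def tags_to_words(chars, tags):
    """Recover the segmentation from characters and their labels."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # these labels close a word
            words.append(current)
            current = ""
    if current:  # tolerate a label sequence that ends mid-word
        words.append(current)
    return words


segmented = ["我", "喜欢", "北京大学"]
labels = words_to_tags(segmented)
print(labels)
print(tags_to_words("".join(segmented), labels))
```

Under this encoding, a segmenter only has to predict one label per character, and the word boundaries follow from the labels.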
Related Work
Recent studies show that character sequence labeling is an effective formulation of Chinese word segmentation (Low et al., 2005; Zhao et al., 2006a,b; Chen et al., 2006; Xue, 2003).
Related Work
(1998) takes advantage of the huge amount of raw text to solve Chinese word segmentation problems.
Related Work
Besides, Sun and Xu (2011) use a sequence labeling framework in which unsupervised statistics serve as discrete features, which prove to be effective in Chinese word segmentation.
Chinese word segmentation is mentioned in 12 sentences in this paper.
Jiang, Wenbin and Sun, Meng and Lü, Yajuan and Yang, Yating and Liu, Qun
Abstract
With Chinese word segmentation as a case study, experiments show that the segmenter enhanced with the Chinese Wikipedia achieves significant improvement on a series of testing sets from different domains, even with a single classifier and local features.
Conclusion and Future Work
Experiments on Chinese word segmentation show that the enhanced word segmenter achieves significant improvement on testing sets of different domains, although it uses a single classifier with only local features.
Experiments
We use the Penn Chinese Treebank 5.0 (CTB) (Xue et al., 2005) as the existing annotated corpus for Chinese word segmentation.
Experiments
Table 4: Comparison with state-of-the-art work in Chinese word segmentation.
Experiments
Table 4 shows the comparison with other work in Chinese word segmentation.
Introduction
Taking Chinese word segmentation for example, the state-of-the-art models (Xue and Shen, 2003; Ng and Low, 2004; Gao et al., 2005; Nakagawa and Uchimoto, 2007; Zhao and Kit, 2008; Jiang et al., 2009; Zhang and Clark, 2010; Sun, 2011b; Li, 2011) are usually trained on human-annotated corpora such as the Penn Chinese Treebank (CTB) (Xue et al., 2005), and perform quite well on corresponding test sets.
Introduction
In the rest of the paper, we first briefly introduce the problems of Chinese word segmentation and the character classification model in section
Related Work
Li and Sun (2009) extracted character classification instances from raw text for Chinese word segmentation, resorting to the indication of punctuation marks between characters.
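One plausible reading of using punctuation as an indication is that a character right after a punctuation mark must start a word, and a character right before one must end a word. The sketch below harvests such instances from raw text. It is an illustration under that assumption, not the procedure of Li and Sun (2009); the function names, the label strings, and the choice to treat the start and end of the text as boundaries are all hypothetical.

```python
import unicodedata


def is_punct(ch):
    """True if the character is in a Unicode punctuation category (P*)."""
    return unicodedata.category(ch).startswith("P")


def boundary_instances(text):
    """Collect (index, character, label) instances next to punctuation.

    Assumption for illustration: the text start and end count as
    boundaries in the same way as punctuation marks do.
    """
    instances = []
    for i, ch in enumerate(text):
        if is_punct(ch):
            continue
        if i == 0 or is_punct(text[i - 1]):
            instances.append((i, ch, "word-initial"))
        if i == len(text) - 1 or is_punct(text[i + 1]):
            instances.append((i, ch, "word-final"))
    return instances


for instance in boundary_instances("今天下雨，我没去。"):
    print(instance)
```

Instances gathered this way need no manual annotation, which is what makes large amounts of raw text usable as training signal for a character classifier.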
Related Work
Sun and Xu (2011) utilized features derived from large-scale unlabeled text to improve Chinese word segmentation.
Chinese word segmentation is mentioned in 10 sentences in this paper.
Wang, Aobo and Kan, Min-Yen
Abstract
We exploit this reliance as an opportunity: recognizing the relation between informal word recognition and Chinese word segmentation, we propose to model the two tasks jointly.
Conclusion
There is a close dependency between Chinese word segmentation (CWS) and informal word recognition (IWR).
Introduction
This example illustrates the mutual dependency between Chinese word segmentation (henceforth, CWS) and informal word recognition (IWR) that should be solved jointly.
Methodology
Given an input Chinese microblog post, our method simultaneously segments the sentences into words (the Chinese Word Segmentation, CWS, task), and marks the component words as informal or formal ones (the Informal Word Recognition, IWR, task).
Chinese word segmentation is mentioned in 4 sentences in this paper.
Zeng, Xiaodong and Wong, Derek F. and Chao, Lidia S. and Trancoso, Isabel
Abstract
This paper introduces a graph-based semi-supervised joint model of Chinese word segmentation and part-of-speech tagging.
Introduction
As far as we know, however, these methods have not yet been applied to resolve the problem of joint Chinese word segmentation (CWS) and POS tagging.
Method
This study introduces a novel semi-supervised approach for joint Chinese word segmentation and POS tagging.
Chinese word segmentation is mentioned in 3 sentences in this paper.
Zhang, Meishan and Zhang, Yue and Che, Wanxiang and Liu, Ting
Related Work
Zhao (2009) studied character-level dependencies for Chinese word segmentation by formalizing the segmentation task in a dependency parsing framework.
Related Work
They use it as a joint framework to perform Chinese word segmentation, POS tagging and syntax parsing.
Related Work
They exploit a generative maximum entropy model for character-based constituent parsing, and find that POS information is very useful for Chinese word segmentation, but high-level syntactic information seems to have little effect on segmentation.
Chinese word segmentation is mentioned in 3 sentences in this paper.