Abstract | We test the efficacy of this method in the context of Chinese word segmentation and part-of-speech tagging, where no segmentation and POS tagging standards are widely accepted due to the lack of morphology in Chinese. |
Automatic Annotation Adaptation | Considering that word segmentation and Joint S&T can be conducted in the same character classification manner, we can design a unified standard adaptation framework for the two tasks by taking the source classifier's classification result as guide information for the target classifier's classification decision.
Introduction | Figure 1: Incompatible word segmentation and POS tagging standards between CTB (upper) and People's Daily (lower).
Introduction | To test the efficacy of our method we choose Chinese word segmentation and part-of-speech tagging, where the problem of incompatible annotation standards is one of the most evident: so far no segmentation standard is widely accepted due to the lack of a clear definition of Chinese words, and the (almost complete) lack of morphology results in much greater ambiguity and heated debate over tagging philosophies for Chinese parts-of-speech.
Introduction | In addition, the improved segmentation and tagging accuracies also lead to improved parsing accuracy on CTB, reducing the error propagated from word segmentation to parsing by 38%.
Segmentation and Tagging as Character Classification | Given a character sequence C_1 C_2 ... C_n, where C_i is a character, word segmentation aims to split the sequence into m (≤ n) words: C_{1:e_1} C_{e_1+1:e_2} ... C_{e_{m-1}+1:e_m}.
Segmentation and Tagging as Character Classification | Xue and Shen (2003) describe for the first time the character classification approach to Chinese word segmentation, where each character is given a boundary tag denoting its relative position in a word.
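The boundary-tag view above can be sketched concretely. The snippet below uses the common B/M/E/S tag set (begin, middle, end, single) as an illustration; the exact tag set varies across the papers cited, so treat this as a minimal sketch rather than any specific system's scheme.

```python
# Illustrative sketch of the character classification view of word
# segmentation: each character receives a boundary tag denoting its
# relative position in a word (B/M/E/S scheme assumed here).

def words_to_tags(words):
    """Map a segmented sentence (list of words) to per-character tags."""
    tags = []
    for w in words:
        if len(w) == 1:
            tags.append("S")                  # single-character word
        else:
            tags.append("B")                  # word-initial character
            tags.extend("M" * (len(w) - 2))   # word-internal characters
            tags.append("E")                  # word-final character
    return tags

def tags_to_words(chars, tags):
    """Recover a segmentation from characters and their boundary tags."""
    words, current = [], ""
    for c, t in zip(chars, tags):
        current += c
        if t in ("S", "E"):                   # this tag closes a word
            words.append(current)
            current = ""
    if current:                                # tolerate a malformed tail
        words.append(current)
    return words

words = ["中国", "人民", "很", "伟大"]
tags = words_to_tags(words)
print(tags)  # ['B', 'E', 'B', 'E', 'S', 'B', 'E']
print(tags_to_words("".join(words), tags))
```

Because the two mappings are inverses, a character classifier that predicts tags immediately yields a segmenter.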
Segmentation and Tagging as Character Classification | In addition, Ng and Low (2004) find that, compared with POS tagging after word segmentation, Joint S&T can achieve higher accuracy on both segmentation and POS tagging.
Abstract | In this paper, we present a discriminative word-character hybrid model for joint Chinese word segmentation and POS tagging. |
Background | In the joint word segmentation and POS tagging process, the task is to predict a path
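The "predict a path" view can be made concrete with a toy decoder: each character receives a combined tag such as "B-NN" (boundary label plus POS), and decoding selects the highest-scoring valid tag sequence. The tag set and scorer below are hypothetical stand-ins, and the brute-force enumeration is only for illustration; real systems use learned models with Viterbi or beam search.

```python
from itertools import product

# Toy sketch of joint Seg&Tag as path prediction over combined
# boundary+POS tags (hypothetical tag set; B/E/S boundary labels).

TAGS = ["S-NN", "B-NN", "E-NN"]

def valid(path):
    """Check boundary-tag consistency: B must be followed by E, and
    E/S may not be followed by E; paths start with B/S and end with E/S."""
    for a, b in zip(path, path[1:]):
        if a.startswith("B") and not b.startswith("E"):
            return False
        if a[0] in "ES" and b.startswith("E"):
            return False
    return path[0][0] in "BS" and path[-1][0] in "ES"

def decode(chars, score):
    """Brute-force argmax over all valid tag paths (short inputs only)."""
    return max((p for p in product(TAGS, repeat=len(chars)) if valid(p)),
               key=score)

# A placeholder scorer that prefers fewer (hence longer) words by
# penalizing each word-initial tag.
score = lambda path: -sum(t[0] in "BS" for t in path)
print(decode("中国", score))  # ('B-NN', 'E-NN')
```

Replacing the enumeration with dynamic programming over the same path space gives the standard Viterbi decoder used in practice.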
Experiments | Previous studies on joint Chinese word segmentation and POS tagging have used the Penn Chinese Treebank (CTB) (Xia et al., 2000) in experiments.
Experiments | We evaluated both word segmentation (Seg) and joint word segmentation and POS tagging (Seg & Tag). |
Experiments | (2008a; 2008b) on CTB 5.0 and Zhang and Clark (2008) on CTB 4.0 since they reported the best performances on joint word segmentation and POS tagging using the training materials only derived from the corpora. |
Introduction | In Chinese, word segmentation and part-of-speech (POS) tagging are indispensable steps for higher-level NLP tasks. |
Introduction | Word segmentation and POS tagging results are required as inputs to other NLP tasks, such as phrase chunking, dependency parsing, and machine translation. |
Introduction | Word segmentation and POS tagging in a joint process have received much attention in recent research and have shown improvements over the pipelined approach (Ng and Low, 2004; Nakagawa and Uchimoto, 2007; Zhang and Clark, 2008; Jiang et al., 2008a; Jiang et al., 2008b).
Policies for correct path selection | In our experiments, the optimal threshold value r is selected by evaluating the performance of joint word segmentation and POS tagging on the development set.
Related work | Maximum entropy models are widely used for word segmentation and POS tagging tasks (Uchimoto et al., 2001; Ng and Low, 2004; Nakagawa, 2004; Nakagawa and Uchimoto, 2007) since they require only moderate training time while providing reasonable performance.
Abstract | In this paper, we propose a new Bayesian model for fully unsupervised word segmentation and an efficient blocked Gibbs sampler combined with dynamic programming for inference. |
Abstract | We confirmed that it significantly outperforms previously reported results on both phonetic transcripts and standard datasets for Chinese and Japanese word segmentation.
Inference | To find the hidden word segmentation w of a string s = c_1 ... c_N, which is equivalent to the vector of binary hidden variables z = z_1 ... z_N, the simplest approach is to build a Gibbs sampler that randomly selects a character c_i, draws a binary decision z_i as to whether there is a word boundary, and then updates the language model according to the new segmentation (Goldwater et al., 2006; Xu et al., 2008).
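The character-wise sampler described above can be sketched in a few lines. This is a toy illustration of the baseline procedure, not the paper's blocked sampler: z_i = 1 means a word boundary falls after character c_i, and `log_prob` is a placeholder scorer (here it just prefers two-character words) standing in for the learned language model.

```python
import math
import random

# Minimal sketch of a character-wise Gibbs sampler over binary boundary
# variables z (placeholder scorer; the final position is always a boundary).

def log_prob(words):
    return sum(-abs(len(w) - 2) for w in words)  # toy stand-in scorer

def segment(chars, z):
    """Turn boundary variables z back into a list of words."""
    words, start = [], 0
    for i, boundary in enumerate(z):
        if boundary:
            words.append(chars[start:i + 1])
            start = i + 1
    return words

def gibbs(chars, iters=500, seed=0):
    rng = random.Random(seed)
    # Random initial boundaries; the last position must be a boundary.
    z = [rng.random() < 0.5 for _ in range(len(chars) - 1)] + [True]
    for _ in range(iters):
        i = rng.randrange(len(chars) - 1)  # never resample the final boundary
        scores = {}
        for val in (False, True):
            z[i] = val
            scores[val] = log_prob(segment(chars, z))
        # Sample z_i from its conditional distribution given all other z.
        p_true = 1.0 / (1.0 + math.exp(scores[False] - scores[True]))
        z[i] = rng.random() < p_true
    return segment(chars, z)

print(gibbs("abcdefgh"))
```

Because each step resamples a single z_i, neighboring boundaries are highly correlated and mixing is slow, which is exactly the weakness that motivates the blocked Gibbs sampler with dynamic programming mentioned in the abstract.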
Introduction | Asian languages such as Chinese and Japanese have no explicit word boundaries, thus word segmentation is a crucial first step when processing them. |
Introduction | In order to extract "words" from text streams, unsupervised word segmentation is an important research area because the criteria for creating supervised training data can be arbitrary and may be suboptimal for applications that rely on the segmentation.
Introduction | This maximizes the probability of the word segmentation w given a string s: ŵ = argmax_w p(w | s).
Nested Pitman-Yor Language Model | If a lexicon is finite, we can use a uniform prior G_0(w) = 1/|V| for every word w in lexicon V. However, with word segmentation every substring could be a word, thus the lexicon is not limited but countably infinite.
Nested Pitman-Yor Language Model | Building an accurate G_0 is crucial for word segmentation, since it determines what the possible words will look like.
Exploiting the Translated Treebank | However, Chinese involves a special primary processing task, namely word segmentation.
Exploiting the Translated Treebank | Note that CTB or any other Chinese treebank has its own word segmentation guideline. |
Exploiting the Translated Treebank | Because the English treebank is translated into Chinese word by word, the Chinese words in the translated text are exactly entries from the bilingual lexicon; they are actually irregular phrases, short sentences, or other units rather than words that follow any existing word segmentation convention.