Index of papers in Proc. ACL 2012 that mention

word segmentation

Seen in text as:

word segmentation (75)
Word Segmentation (5)

Seen in 84 sentences in 8 papers.

1. Incremental Joint Approach to Word Segmentation, POS Tagging, and Dependency Parsing in Chinese

Hatori, Jun and Matsuzaki, Takuya and Miyao, Yusuke and Tsujii, Jun'ichi

In Proc. ACL 2012, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	We propose the first joint model for word segmentation, POS tagging, and dependency parsing for Chinese.
Introduction	spaces) between words, word segmentation is the crucial first step that is necessary to perform virtually all NLP tasks.
Introduction	Because the tasks of word segmentation and POS tagging have strong interactions, many studies have been devoted to the task of joint word segmentation and POS tagging for languages such as Chinese (e.g.
Introduction	The joint approach to word segmentation and POS tagging has been reported to improve word segmentation and POS tagging accuracies by more than
Model	(2011), we build our joint model to solve word segmentation, POS tagging, and dependency parsing within a single framework.
Model	In our joint model, the early update is invoked by mistakes in any of word segmentation, POS tagging, or dependency parsing.
Related Works	In addition, the lattice does not include word segmentation ambiguities crossing boundaries of space-delimited tokens.
Related Works	However, because they regarded word segmentation as given, their model did not consider the

word segmentation is mentioned in 17 sentences in this paper.


2. Fast Online Training with Frequency-Adaptive Learning Rates for Chinese Word Segmentation and New Word Detection

Sun, Xu and Wang, Houfeng and Li, Wenjie

In Proc. ACL 2012, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	We present a joint model for Chinese word segmentation and new word detection.
Abstract	As we know, training a word segmentation system on large-scale datasets is already costly.
Introduction	The major problem of Chinese word segmentation is the ambiguity.
Introduction	In this paper, we present high dimensional new features, including word-based features and enriched edge (label-transition) features, for the joint modeling of Chinese word segmentation (CWS) and new word detection (NWD).
Introduction	As we know, training a word segmentation system on large-scale datasets is already costly.
Related Work	First, we review related work on word segmentation and new word detection.
Related Work	2.1 Word Segmentation and New Word Detection
Related Work	Conventional approaches to Chinese word segmentation treat the problem as a sequential labeling task (Xue, 2003; Peng et al., 2004; Tseng et al., 2005; Asahara et al., 2005; Zhao et al., 2010).

word segmentation is mentioned in 23 sentences in this paper.


3. Exploring Deterministic Constraints: from a Constrained English POS Tagger to an Efficient ILP Solution to Chinese Word Segmentation

Zhao, Qiuye and Marcus, Mitch

In Proc. ACL 2012, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	We show for both English POS tagging and Chinese word segmentation that with proper representation, large number of deterministic constraints can be learned from training examples, and these are useful in constraining probabilistic inference.
Abstract	In this work, we explore deterministic constraints for two fundamental NLP problems, English POS tagging and Chinese word segmentation.
Abstract	For Chinese word segmentation (CWS), which can be formulated as character tagging, analogous constraints can be learned with the same templates as English POS tagging.

word segmentation is mentioned in 15 sentences in this paper.

Topics mentioned in this paper:

POS tagging (31)
ILP (20)
Viterbi (18)

4. Reducing Approximation and Estimation Errors for Chinese Lexical Processing with Heterogeneous Annotations

Sun, Weiwei and Wan, Xiaojun

In Proc. ACL 2012, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

About Heterogeneous Annotations	For Chinese word segmentation and POS tagging, supervised learning has become a dominant paradigm.
About Heterogeneous Annotations	Take Chinese word segmentation for example.
Abstract	We address the issue of consuming heterogeneous annotation data for Chinese word segmentation and part-of-speech tagging.
Conclusion	Our theoretical and empirical analysis of two representative popular corpora highlights two essential characteristics of heterogeneous annotations which are explored to reduce approximation and estimation errors for Chinese word segmentation and POS tagging.
Experiments	Previous studies on joint Chinese word segmentation and POS tagging have used the CTB in experiments.
Introduction	This paper explores heterogeneous annotations to reduce both approximation and estimation errors for Chinese word segmentation and part-of-speech (POS) tagging, which are fundamental steps for more advanced Chinese language processing tasks.
Introduction	In particular, joint word segmentation and POS tagging is addressed as a two step process.
Joint Chinese Word Segmentation and POS Tagging	words, word segmentation and POS tagging are important initial steps for Chinese language processing.
Joint Chinese Word Segmentation and POS Tagging	Two kinds of approaches are popular for joint word segmentation and POS tagging.

word segmentation is mentioned in 9 sentences in this paper.


5. Character-Level Machine Translation Evaluation for Languages with Ambiguous Word Boundaries

Liu, Chang and Ng, Hwee Tou

In Proc. ACL 2012, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Conclusion	TESLA-CELAB does not have a segmentation step, hence it will not introduce word segmentation errors.
Discussion and Future Work	Chinese word segmentation.
Experiments	We use the Stanford Chinese word segmenter (Tseng et al., 2005) and POS tagger (Toutanova et al., 2003) for preprocessing and Cilin for synonym
Experiments	Note also that the word segmentations shown in these examples are for clarity only.
Introduction	The most obvious challenge for Chinese is that of word segmentation.
Introduction	However, many different segmentation standards exist for different purposes, such as Microsoft Research Asia (MSRA) for Named Entity Recognition (NER), Chinese Treebank (CTB) for parsing and part-of-speech (POS) tagging, and City University of Hong Kong (CITYU) and Academia Sinica (AS) for general word segmentation and POS tagging.
Introduction	The only prior work attempting to address the problem of word segmentation in automatic MT evaluation for Chinese that we are aware of is Li et
Motivation	Character-based metrics do not suffer from errors and differences in word segmentation, so and ¥_l—?fi_5lk would be judged exactly equal.

word segmentation is mentioned in 8 sentences in this paper.

Topics mentioned in this paper:

BLEU (19)
word-level (15)
n-grams (12)

6. Bootstrapping a Unified Model of Lexical and Phonetic Acquisition

Elsner, Micha and Goldwater, Sharon and Eisenstein, Jacob

In Proc. ACL 2012, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Experiments	This version, which contains 9790 utterances (33399 tokens, 1321 types), is now standard for word segmentation, but contains no phonetic variability.
Experiments	As a simple extension of our model to the case of unknown word boundaries, we interleave it with an existing model of word segmentation, dpseg (Gold-
Introduction	For example, many models of word segmentation implicitly or explicitly build a lexicon while segmenting the input stream of phonemes into word tokens; in nearly all cases the phonemic input is created from an orthographic transcription using a phonemic dictionary, thus abstracting away from any phonetic variability (Brent, 1999; Venkataraman, 2001; Swingley, 2005; Goldwater et al., 2009, among others).
Related work	A final line of related work is on word segmentation .
Related work	In addition to the models mentioned in Section 1, which use phonemic input, a few models of word segmentation have been tested using phonetic input (Fleck, 2008; Rytting, 2007; Daland and Pierrehumbert, 2010).

word segmentation is mentioned in 5 sentences in this paper.


7. Exploiting Multiple Treebanks for Parsing with Quasi-synchronous Grammars

Li, Zhenghua and Liu, Ting and Che, Wanxiang

In Proc. ACL 2012, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Experiments and Analysis	CDT and CTB5/6 adopt different POS tag sets, and converting from one tag set to another is difficult (Niu et al., 2009).5 To overcome this problem, we use the People’s Daily corpus (PD),6 a large-scale corpus annotated with word segmentation and POS tags, to train a statistical POS tagger.
Experiments and Analysis	5 The word segmentation standards of the two treebanks also slightly differs, which are not considered in this work.
Experiments and Analysis	Moreover, inferior results may be gained due to the differences between CTB5 and PD in word segmentation standards and text sources.
Related Work	(2009) improve the performance of word segmentation and part-of-speech (POS) tagging on CTB5 using another large-scale corpus of different annotation standards (People’s Daily).

word segmentation is mentioned in 4 sentences in this paper.


8. Capturing Paradigmatic and Syntagmatic Lexical Relations: Towards Accurate Chinese Part-of-Speech Tagging

Sun, Weiwei and Uszkoreit, Hans

In Proc. ACL 2012, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Capturing Paradigmatic Relations via Word Clustering	3.1.3 Preprocessing: Word Segmentation
Capturing Paradigmatic Relations via Word Clustering	In this table, the symbol “+” in the Features column means current configuration contains both the baseline features and new cluster-based features; the number is the total number of the clusters; the symbol “+” in the Data column means which portion of the Gigaword data is used to cluster words; the symbol “S” and “SS” in parentheses denote (s)upervised and (s)emi-(s)upervised word segmentation.
State-of-the-Art	Penn Chinese Treebank (CTB) (Xue et al., 2005) is a popular data set to evaluate a number of Chinese NLP tasks, including word segmentation (Sun and

word segmentation is mentioned in 3 sentences in this paper.
