Abstract | Current computational models of unsupervised word segmentation usually assume idealized input that is devoid of phonological variation.
Abstract | We extend a nonparametric model of word segmentation by adding phonological rules that map from underlying forms to surface forms to produce a mathematically well-defined joint model as a first step towards handling variation and segmentation in a single model. |
Abstract | We analyse how our model handles /t/-deletion on a large corpus of transcribed speech, and show that the joint model can perform word segmentation and recover underlying /t/s. |
Background and related work | This permits us to develop a joint generative model for both word segmentation and variation which we plan to extend to handle more phenomena in future work. |
Background and related work | They do not aim for a joint model that also handles word segmentation, however, and rather than training their model on an actual corpus, they evaluate on constructed lists of examples, mimicking frequencies of real data.
Experiments 4.1 The data | This allows us to investigate the strength of the statistical signal for the deletion rule without confounding it with word segmentation performance, and to see how the different contextual settings (uniform, right, and left-right) handle the data.
Experiments 4.1 The data | Table 5: /t/-recovery F-scores when performing joint word segmentation in the left-right setting, averaged over two runs (standard errors less than 2%).
Experiments 4.1 The data | Finally, we are also interested to learn how well we can do word segmentation and underlying /t/-recovery jointly. |
Introduction | Computational models of word segmentation try to solve one of the first problems language learners have to face: breaking an unsegmented stream of sound segments into individual words. |
The computational model | Figure 1: The graphical model for our joint model of word-final /t/-deletion and Bigram word segmentation.
The computational model | Bayesian word segmentation models try to compactly represent the observed data in terms of a small set of units (word types) and a short analysis (a small number of word tokens). |
Abstract | Structural information in web text provides natural annotations for NLP problems such as word segmentation and parsing. |
Abstract | With Chinese word segmentation as a case study, experiments show that the segmenter enhanced with the Chinese wikipedia achieves significant improvement on a series of testing sets from different domains, even with a single classifier and local features. |
Introduction | Problems related to information retrieval, machine translation and social computing need fast and accurate text processing, for example, word segmentation and parsing. |
Introduction | Taking Chinese word segmentation for example, the state-of-the-art models (Xue and Shen, 2003; Ng and Low, 2004; Gao et al., 2005; Nakagawa and Uchimoto, 2007; Zhao and Kit, 2008; Jiang et al., 2009; Zhang and Clark, 2010; Sun, 2011b; Li, 2011) are usually trained on human-annotated corpora such as the Penn Chinese Treebank (CTB) (Xue et al., 2005), and perform quite well on corresponding test sets.
Introduction | (b) Knowledge for word segmentation |
Character-based Chinese Parsing | To produce character-level trees for Chinese NLP tasks, we develop a character-based parsing model, which can jointly perform word segmentation, POS tagging and phrase-structure parsing.
Character-based Chinese Parsing | First, we split the original SHIFT action into SHIFT-SEPARATE(t) and SHIFT-APPEND, which jointly perform the word segmentation and POS tagging tasks.
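The effect of the two shift actions can be illustrated with a minimal sketch. This is not the paper's full transition system (no parse actions or features are shown); the characters, POS tags, and the helper `apply_actions` are invented for illustration:

```python
# Minimal sketch of how SHIFT-SEPARATE(t) and SHIFT-APPEND build tagged
# words from a character stream. SHIFT-SEPARATE(t) starts a new word with
# POS tag t; SHIFT-APPEND attaches the next character to the current word.

def apply_actions(chars, actions):
    """Replay shift actions over a character sequence, returning (word, tag) pairs."""
    words = []
    for i, act in enumerate(actions):
        if act[0] == "SHIFT-SEPARATE":   # start a new word with POS tag act[1]
            words.append([chars[i], act[1]])
        elif act[0] == "SHIFT-APPEND":   # append this character to the current word
            words[-1][0] += chars[i]
    return [(w, t) for w, t in words]

# "中国人民" -> 中国/NR 人民/NN (illustrative segmentation and tags)
actions = [("SHIFT-SEPARATE", "NR"), ("SHIFT-APPEND",),
           ("SHIFT-SEPARATE", "NN"), ("SHIFT-APPEND",)]
print(apply_actions("中国人民", actions))  # [('中国', 'NR'), ('人民', 'NN')]
```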
Character-based Chinese Parsing | The string features are used for word segmentation and POS tagging, and are adapted from a state-of-the-art joint segmentation and tagging model (Zhang and Clark, 2010). |
Experiments | Since our model can jointly process word segmentation, POS tagging and phrase-structure parsing, we evaluate it on each of the three tasks.
Experiments | For word segmentation and POS tagging, standard metrics of word precision, recall and F-score are used, where the tagging accuracy is the joint accuracy of word segmentation and POS tagging. |
Introduction | With regard to the task of parsing itself, an important advantage of the character-level syntax trees is that they allow word segmentation, part-of-speech (POS) tagging and parsing to be performed jointly, using an efficient CKY-style or shift-reduce algorithm.
Introduction | To analyze word structures in addition to phrase structures, our character-based parser naturally performs word segmentation, POS tagging and parsing jointly.
Introduction | We extend their shift-reduce framework, adding more transition actions for word segmentation and POS tagging, and defining novel features that capture character information. |
Word Structures and Syntax Trees | They made use of this information to help joint word segmentation and POS tagging. |
Word Structures and Syntax Trees | For leaf characters, we follow previous work on word segmentation (Xue, 2003; Ng and Low, 2004), and use “b” and “i” to indicate the beginning and non-beginning characters of a word, respectively. |
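The "b"/"i" labeling of leaf characters described above can be sketched directly. The segmented example words and the helper name are invented for illustration:

```python
# Convert a segmented sentence into per-character "b"/"i" labels:
# "b" marks a word-beginning character, "i" any non-beginning character.

def to_bi_labels(words):
    """Label each character of each word with its position tag."""
    labels = []
    for word in words:
        labels.append("b")                  # first character of the word
        labels.extend("i" * (len(word) - 1))  # remaining characters
    return labels

print(to_bi_labels(["中国", "人民", "银行"]))
# ['b', 'i', 'b', 'i', 'b', 'i']
```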
Abstract | Since no segmented corpus of micro-blogs is available to train a Chinese word segmentation model, existing Chinese word segmentation tools cannot perform as well on micro-blog texts as on ordinary news texts.
Abstract | In this paper we present an effective yet simple approach to Chinese word segmentation of micro-blog texts.
Experiment | We use the benchmark datasets provided by the Second International Chinese Word Segmentation Bakeoff as the labeled data.
Experiment | The first two are both well-known Chinese word segmentation tools: ICTCLAS and the Stanford Chinese word segmenter, which are widely used in NLP tasks involving word segmentation.
Experiment | The Stanford Chinese word segmenter is a CRF-based segmentation tool, and its segmentation standard is the PKU standard, which is the same as ours.
INTRODUCTION | These new features of micro-blogs make the Chinese Word Segmentation (CWS) models trained on the source domain, such as news corpus, fail to perform equally well when transferred to texts from micro-blogs. |
Our method | The Chinese word segmentation problem can be treated as a character labeling problem, in which each character receives a label indicating its position within a word.
Related Work | Recent studies show that character sequence labeling is an effective formulation of Chinese word segmentation (Low et al., 2005; Zhao et al., 2006a,b; Chen et al., 2006; Xue, 2003). |
Related Work | On the other hand, unsupervised word segmentation (Peng and Schuurmans (2001); Goldwater et al. (1998)) takes advantage of the huge amount of raw text to solve Chinese word segmentation problems.
Abstract | This paper introduces a graph-based semi-supervised joint model of Chinese word segmentation and part-of-speech tagging. |
Introduction | Word segmentation and part-of-speech (POS) tagging are two critical and necessary initial procedures with respect to the majority of high-level Chinese language processing tasks such as syntax parsing, information extraction and machine translation. |
Introduction | The joint approaches of word segmentation and POS tagging (joint S&T) are proposed to resolve these two tasks simultaneously. |
Introduction | As far as we know, however, these methods have not yet been applied to resolve the problem of joint Chinese word segmentation (CWS) and POS tagging. |
Method | The performance measures for word segmentation and POS tagging (joint S&T) are the balanced F-score, F = 2PR/(P+R), the harmonic mean of precision (P) and recall (R), and out-of-vocabulary recall (OOV-R).
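The balanced F-score is the harmonic mean of precision and recall and can be computed directly; the word-level counts below are invented for illustration:

```python
# Balanced F-score F = 2PR/(P+R) from word-level counts:
# `correct`   = number of correctly segmented words,
# `predicted` = number of words the system output,
# `gold`      = number of words in the gold segmentation.

def f_score(correct, predicted, gold):
    """Return precision, recall, and balanced F-score."""
    p = correct / predicted          # precision
    r = correct / gold               # recall
    return p, r, 2 * p * r / (p + r)

p, r, f = f_score(correct=90, predicted=100, gold=120)
print(round(p, 3), round(r, 3), round(f, 3))  # 0.9 0.75 0.818
```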
Method | This outcome verifies the commonly accepted fact that the joint model can substantially improve the pipeline one, since POS tags provide additional information to word segmentation (Ng and Low, 2004). |
Method | Overall, for word segmentation, it obtains average improvements of 1.43% and 8.09% in F-score and OOV-R over the others; for POS tagging, it achieves average improvements of 1.09% and 7.73%.
Abstract | We exploit this reliance as an opportunity: recognizing the relation between informal word recognition and Chinese word segmentation , we propose to model the two tasks jointly. |
Conclusion | There is a close dependency between Chinese word segmentation (CWS) and informal word recognition (IWR). |
Introduction | This example illustrates the mutual dependency between Chinese word segmentation (henceforth, CWS) and informal word recognition (IWR) that should be solved jointly. |
Methodology | Given an input Chinese microblog post, our method simultaneously segments the sentences into words (the Chinese Word Segmentation, CWS, task) and marks the component words as informal or formal (the Informal Word Recognition, IWR, task).
Methodology | Character-based sequence labeling is employed for word segmentation due to its simplicity and robustness to the unknown word problem (Xue, 2003). |
Related Work | Closely related to our work is the task of Chinese new word detection, normally treated as a separate process from word segmentation in most previous works (Chen and Bai, 1998; Wu and Jiang, 2000; Chen and Ma, 2002; Gao et al., 2005). |
Abstract | This paper presents a semi-supervised Chinese word segmentation (CWS) approach that co-regularizes character-based and word-based models. |
Experiment | Table 2 shows the F-score results of word segmentation on CTB-5, CTB-6 and CTB-7 testing sets. |
Experiment | It is a supervised joint model of word segmentation, POS tagging and dependency parsing.
Introduction | Chinese word segmentation (CWS) is a critical and necessary initial procedure for the majority of high-level Chinese language processing tasks, such as syntax parsing, information extraction and machine translation, since Chinese script is written in continuous characters without explicit word boundaries.
Segmentation Models | Character-based models treat word segmentation as a sequence labeling problem, assigning labels to the characters in a sentence indicating their positions in a word. |
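The decoding direction of this character-labeling view can also be sketched: given a predicted label sequence, the word boundaries are recovered deterministically. The BMES-style tags and the helper name here are a common illustrative choice, not a specific model's output:

```python
# Recover a word segmentation from per-character position labels:
# B = word-beginning, M = word-middle, E = word-end, S = single-character word.

def labels_to_words(chars, labels):
    """Group characters into words according to their position labels."""
    words, current = [], ""
    for ch, lab in zip(chars, labels):
        if lab in ("B", "S"):    # a new word starts at this character
            if current:
                words.append(current)
            current = ch
        else:                    # M or E: continue the current word
            current += ch
    if current:
        words.append(current)
    return words

print(labels_to_words("中国人民银行", "BEBEBE"))  # ['中国', '人民', '银行']
```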
Abstract | We study the mathematical properties of a recently proposed MDL-based unsupervised word segmentation algorithm, called regularized compression. |
Concluding Remarks | A natural extension of this work is to reproduce this result on some other word segmentation benchmarks, specifically those in other Asian languages (Emerson, 2005; Zhikov et al., 2010). |
Introduction | Hierarchical Bayesian methods have been mainstream in unsupervised word segmentation since the advent of the hierarchical Dirichlet process (Goldwater et al., 2009) and adaptor grammars (Johnson and Goldwater, 2009).
Abstract | Transliterated compound nouns not separated by whitespace pose difficulty for word segmentation. Offline approaches have been proposed to split them using word statistics, but they rely on a static lexicon, limiting their use.
Introduction | Accurate word segmentation (WS) is a key component of successful language processing.
Introduction | For example, when splitting the compound noun burakkisshureddo, a traditional word segmenter can easily segment it as “*blacki shred”, since shureddo “shred” is a known, frequent word.
Experiments | The first category is caused by incorrect word segmentation (40.85%). |
Experiments | The result of word segmentation directly determines the performance of extraction, so it causes most of the errors.
Experiments | In the future, we can improve the performance of WikiCiKE by improving the word segmentation results.