Index of papers in Proc. ACL 2014 that mention

word segmentation

Seen in text as:

word segmentation (106)
Word Segmentation (14)
word segmenters (3)

Seen in 121 sentences in 9 papers.

1. Modelling function words improves unsupervised word segmentation

Johnson, Mark and Christophe, Anne and Dupoux, Emmanuel and Demuth, Katherine

In Proc. ACL 2014, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	Inspired by experimental psychological findings suggesting that function words play a special role in word learning, we make a simple modification to an Adaptor Grammar based Bayesian word segmentation model to allow it to learn sequences of monosyllabic “function words” at the beginnings and endings of collocations of (possibly multisyllabic) words.
Abstract	This modification improves unsupervised word segmentation on the standard Bernstein-Ratner (1987) corpus of child-directed English by more than 4% token f-score compared to a model identical except that it does not special-case “function words”, setting a new state-of-the-art of 92.4% token f-score.
Introduction	We do this by comparing two computational models of word segmentation which differ solely in the way that they model function words.
Introduction	(1996) and Brent (1999) our word segmentation models identify word boundaries from unsegmented sequences of phonemes corresponding to utterances, effectively performing unsupervised learning of a lexicon.
Introduction	a word segmentation model should segment this as ju want tu si 69 buk, which is the IPA representation of “you want to see the book”.
Word segmentation with Adaptor Grammars	Perhaps the simplest word segmentation model is the unigram model, where utterances are modeled as sequences of words, and where each word is a sequence of segments (Brent, 1999; Goldwater et al., 2009).

word segmentation is mentioned in 24 sentences in this paper.

Topics mentioned in this paper:

2. Max-Margin Tensor Neural Network for Chinese Word Segmentation

Pei, Wenzhe and Ge, Tao and Chang, Baobao

In Proc. ACL 2014, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	In this paper, we propose a novel neural network model for Chinese word segmentation called Max-Margin Tensor Neural Network (MMTNN).
Abstract	Despite Chinese word segmentation being a specific case, MMTNN can be easily generalized and applied to other sequence labeling tasks.
Conventional Neural Network	Formally, in the Chinese word segmentation task, we have a character dictionary D of size Unless otherwise specified, the character dictionary is extracted from the training set and unknown characters are mapped to a special symbol that is not used elsewhere.
Conventional Neural Network	In Chinese word segmentation , the most prevalent tag set T is BMES tag set, which uses 4 tags to carry word boundary information.
Conventional Neural Network	(2013) modeled Chinese word segmentation as a series of
Introduction	Therefore, word segmentation is a preliminary and important pre-process for Chinese language processing.
Introduction	(2011) to Chinese word segmentation and POS tagging and proposed a perceptron-style algorithm to speed up the training process with negligible loss in performance.
Introduction	We evaluate the performance of Chinese word segmentation on the PKU and MSRA benchmark datasets in the second International Chinese Word Segmentation Bakeoff (Emerson, 2005) which are commonly used for evaluation of Chinese word segmentation .

word segmentation is mentioned in 19 sentences in this paper.

Topics mentioned in this paper:

3. Chinese Morphological Analysis with Character-level POS Tagging

Shen, Mo and Liu, Hongxiao and Kawahara, Daisuke and Kurohashi, Sadao

In Proc. ACL 2014, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	The focus of recent studies on Chinese word segmentation , part-of-speech (POS) tagging and parsing has been shifting from words to characters.
Abstract	We propose a method that performs character-level POS tagging jointly with word segmentation and word-level POS tagging.
Chinese Morphological Analysis with Character-level POS	Previous studies have shown that jointly processing word segmentation and POS tagging is preferable to pipeline processing, which can propagate errors (Nakagawa and Uchimoto, 2007; Kruengkrai et a1., 2009).
Conclusion	A Cascaded Linear Model for Joint Chinese Word Segmentation and Part-of-speech Tagging.
Conclusion	Word Lattice Reranking for Chinese Word Segmentation
Evaluation	To evaluate our proposed method, we have conducted two sets of experiments on CTB5: word segmentation, and joint word segmentation and word-level POS tagging.
Evaluation	The results of the word segmentation experiment and the joint experiment of segmentation and POS tagging are shown in Table 5(a) and Table 5(b), respectively.
Evaluation	The results show that, while the differences between the baseline model and the proposed model in word segmentation accuracies are small, the proposed model achieves significant improvement in the experiment of joint segmentati-
Introduction	In recent years, the focus of research on Chinese word segmentation , part-of-speech (POS) tagging and parsing has been shifting from words toward characters.
Introduction	We propose a method that performs character-level POS tagging jointly with word segmentation and word-level POS tagging.

word segmentation is mentioned in 20 sentences in this paper.

Topics mentioned in this paper:

4. Character-Level Chinese Dependency Parsing

Zhang, Meishan and Zhang, Yue and Che, Wanxiang and Liu, Ting

In Proc. ACL 2014, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	Character-level information can benefit downstream applications by offering flexible granularities for word segmentation while improving word-level dependency parsing accuracies.
Character-Level Dependency Tree	We differentiate intra-word dependencies and inter-word dependencies by the arc type, so that our work can be compared with conventional word segmentation , POS-tagging and dependency parsing pipelines under a canonical segmentation standard.
Character-Level Dependency Tree	The character-level dependency trees hold to a specific word segmentation standard, but are not limited to it.
Character-Level Dependency Tree	A transition-based framework with global learning and beam search decoding (Zhang and Clark, 2011) has been applied to a number of natural language processing tasks, including word segmentation , PCS-tagging and syntactic parsing (Zhang and Clark, 2010; Huang and Sagae, 2010; Bohnet and Nivre, 2012; Zhang et al., 2013).
Introduction	Chinese dependency trees were conventionally defined over words (Chang et al., 2009; Li et al., 2012), requiring word segmentation and POS-tagging as preprocessing steps.
Introduction	First, character-level trees circumvent the issue that no universal standard exists for Chinese word segmentation .
Introduction	In the well-known Chinese word segmentation bakeoff tasks, for example, different segmentation standards have been used by different data sets (Emerson, 2005).

word segmentation is mentioned in 24 sentences in this paper.

Topics mentioned in this paper:

5. Toward Better Chinese Word Segmentation for SMT via Bilingual Constraints

Zeng, Xiaodong and Chao, Lidia S. and Wong, Derek F. and Trancoso, Isabel and Tian, Liang

In Proc. ACL 2014, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	This study investigates on building a better Chinese word segmentation model for statistical machine translation.
Experiments	The influence of the word segmentation on the final translation is our main investigation.
Experiments	Firstly, as expected, having word segmentation does help Chinese-to-English MT.
Experiments	This section aims to further analyze the three primary observations concluded in Section 4.3: 2') word segmentation is useful to SMT; ii) the treebank and the bilingual segmentation knowledge are helpful, performing segmentation of different nature; and iii) the bilingual constraints lead to learn segmentations better tailored for SMT.
Introduction	Word segmentation is regarded as a critical procedure for high-level Chinese language processing tasks, since Chinese scripts are written in continuous characters without explicit word boundaries (e.g., space in English).
Introduction	The empirical works show that word segmentation can be beneficial to Chinese-to-English statistical machine translation (SMT) (Xu et al., 2005; Chang et al., 2008; Zhao et al., 2013).
Introduction	The practice in state-of-the-art MT systems is that Chinese sentences are tokenized by a monolingual supervised word segmentation model trained on the hand-annotated treebank data, e.g., Chinese treebank

word segmentation is mentioned in 9 sentences in this paper.

Topics mentioned in this paper:

CRFs (30)
segmentations (18)
treebank (16)

6. New Word Detection for Sentiment Analysis

Huang, Minlie and Ye, Borui and Wang, Yichen and Chen, Haiqiang and Cheng, Junjun and Zhu, Xiaoyan

In Proc. ACL 2014, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	Automatic extraction of new words is an indispensable precursor to many NLP tasks such as Chinese word segmentation , named entity extraction, and sentiment analysis.
Experiment	The posts were then part-of-speech tagged using a Chinese word segmentation tool named ICTCLAS (Zhang et al., 2003).
Introduction	Automatic extraction of new words is indispensable to many tasks such as Chinese word segmentation , machine translation, named entity extraction, question answering, and sentiment analysis.
Introduction	New word detection is one of the most critical issues in Chinese word segmentation .
Introduction	Recent studies (Sproat and Emerson, 2003) (Chen, 2003) have shown that more than 60% of word segmentation errors result from new words.
Methodology	Obviously, in order to obtain the value of 3(wi), some particular Chinese word segmentation tool is required.
Related Work	New word detection has been usually interweaved with word segmentation , particularly in Chinese NLP.
Related Work	In these works, new word detection is considered as an integral part of segmentation, where new words are identified as the most probable segments inferred by the probabilistic models; and the detected new word can be further used to improve word segmentation .

word segmentation is mentioned in 8 sentences in this paper.

Topics mentioned in this paper:

7. A Joint Graph Model for Pinyin-to-Chinese Conversion with Typo Correction

Jia, Zhongye and Zhao, Hai

In Proc. ACL 2014, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Experiments	Maximum matching word segmentation is used with a large word vocabulary V extracted from web data provided by (Wang et al., 2013b).
Pinyin Input Method Model	Without word delimiters, linguists have argued on what a Chinese word really is for a long time and that is why there is always a primary word segmentation treatment in most Chinese language processing tasks (Zhao et al., 2006; Huang and Zhao, 2007; Zhao and Kit, 2008; Zhao et al., 2010; Zhao and Kit, 2011; Zhao et al., 2013).
Pinyin Input Method Model	A Chinese word may contain from 1 to over 10 characters due to different word segmentation conventions.
Pinyin Input Method Model	Nevertheless, pinyin syllable segmentation is a much easier problem compared to Chinese word segmentation .

word segmentation is mentioned in 6 sentences in this paper.

Topics mentioned in this paper:

8. Empirical Study of Unsupervised Chinese Word Segmentation Methods for SMT on Large-scale Corpora

Wang, Xiaolin and Utiyama, Masao and Finch, Andrew and Sumita, Eiichiro

In Proc. ACL 2014, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	Unsupervised word segmentation (UWS) can provide domain-adaptive segmentation for statistical machine translation (SMT) without annotated data, and bilingual UWS can even optimize segmentation for alignment.
Complexity Analysis	The proposed method does not require any annotated data, but the SMT system with it can achieve comparable performance compared to state-of-the-art supervised word segmenters trained on precious annotated data.
Complexity Analysis	Moreover, the proposed method yields 0.96 BLEU improvement relative to supervised word segmenters on an out-of-domain corpus.
Complexity Analysis	Thus, we believe that the proposed method would benefit SMT related to low-resource languages where annotated data are scare, and would also find application in domains that differ too greatly from the domains on which supervised word segmenters were trained.
Introduction	Many languages, especially Asian languages such as Chinese, Japanese and Myanmar, have no explicit word boundaries, thus word segmentation (WS), that is, segmenting the continuous texts of these languages into isolated words, is a prerequisite for many natural language processing applications including SMT.
Introduction	o improvement of BLEU scores compared to supervised Stanford Chinese word segmenter .

word segmentation is mentioned in 6 sentences in this paper.

Topics mentioned in this paper:

segmenters (11)
BLEU (9)
bigram (7)

9. Omni-word Feature and Soft Constraint for Chinese Relation Extraction

Chen, Yanping and Zheng, Qinghua and Zhang, Wei

In Proc. ACL 2014, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	The Omni-word feature uses every potential word in a sentence as lexicon feature, reducing errors caused by word segmentation .
Abstract	Both Omni-word feature and soft constraint make a better use of sentence information and minimize the influences caused by Chinese word segmentation and parsing.
Feature Construction	On the other hand, the Omni-word can avoid these problems and take advantages of Chinese characteristics (the word-formation and the ambiguity of word segmentation ).
Introduction	Lacking of orthographic word makes Chinese word segmentation difficult.
Related Work	(2008; 2010) also pointed out that, due to the inaccuracy of Chinese word segmentation and parsing, the tree kernel based approach is inappropriate for Chinese relation extraction.

word segmentation is mentioned in 5 sentences in this paper.

Topics mentioned in this paper: