Index of papers in Proc. ACL that mention

Chinese word

Seen in text as:

Chinese word (132)
Chinese words (34)
Chinese Word (16)

Seen in 173 sentences in 26 papers.

1. Max-Margin Tensor Neural Network for Chinese Word Segmentation

Pei, Wenzhe and Ge, Tao and Chang, Baobao

In Proc. ACL 2014, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	In this paper, we propose a novel neural network model for Chinese word segmentation called Max-Margin Tensor Neural Network (MMTNN).
Abstract	Despite Chinese word segmentation being a specific case, MMTNN can be easily generalized and applied to other sequence labeling tasks.
Conventional Neural Network	Formally, in the Chinese word segmentation task, we have a character dictionary D of size Unless otherwise specified, the character dictionary is extracted from the training set and unknown characters are mapped to a special symbol that is not used elsewhere.
Conventional Neural Network	In Chinese word segmentation, the most prevalent tag set T is BMES tag set, which uses 4 tags to carry word boundary information.
Conventional Neural Network	(2013) modeled Chinese word segmentation as a series of
Introduction	(2011) to Chinese word segmentation and POS tagging and proposed a perceptron-style algorithm to speed up the training process with negligible loss in performance.
Introduction	We evaluate the performance of Chinese word segmentation on the PKU and MSRA benchmark datasets in the second International Chinese Word Segmentation Bakeoff (Emerson, 2005) which are commonly used for evaluation of Chinese word segmentation.
Introduction	0 We propose a Max-Margin Tensor Neural Network for Chinese word segmentation without feature engineering.
Max-Margin Tensor Neural Network	In Chinese word segmentation, a proper modeling of the tag-tag interaction, tag-character interaction and character-character interaction is very important.

Chinese word is mentioned in 16 sentences in this paper.

Topics mentioned in this paper:

2. Improving Chinese Word Segmentation on Micro-blog Using Rich Punctuations

Zhang, Longkai and Li, Li and He, Zhengyan and Wang, Houfeng and Sun, Ni

In Proc. ACL 2013, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	While no segmented corpus of micro-blogs is available to train Chinese word segmentation model, existing Chinese word segmentation tools cannot perform equally well as in ordinary news texts.
Abstract	In this paper we present an effective yet simple approach to Chinese word segmentation of micro-blog.
Experiment	We use the benchmark datasets provided by the second International Chinese Word Segmentation Bakeoff2 as the labeled data.
Experiment	The first two are both famous Chinese word segmentation tools: ICTCLAS3 and Stanford Chinese word segmenter4, which are widely used in NLP related to word segmentation.
Experiment	Stanford Chinese word segmenter is a CRF-based segmentation tool and its segmentation standard is chosen as the PKU standard, which is the same to ours.
INTRODUCTION	These new features of micro-blogs make the Chinese Word Segmentation (CWS) models trained on the source domain, such as news corpus, fail to perform equally well when transferred to texts from micro-blogs.
Our method	Chinese word segmentation problem might be treated as a character labeling problem which gives each character a label indicating its position in one word.
Related Work	Recent studies show that character sequence labeling is an effective formulation of Chinese word segmentation (Low et al., 2005; Zhao et al., 2006a,b; Chen et al., 2006; Xue, 2003).
Related Work	(1998) takes advantage of the huge amount of raw text to solve Chinese word segmentation problems.
Related Work	Besides, Sun and Xu (2011) uses a sequence labeling framework, while unsupervised statistics are used as discrete features in their model, which prove to be effective in Chinese word segmentation.

Chinese word is mentioned in 12 sentences in this paper.

Topics mentioned in this paper:

3. Chinese Parsing Exploiting Characters

Zhang, Meishan and Zhang, Yue and Che, Wanxiang and Liu, Ting

In Proc. ACL 2013, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Character-based Chinese Parsing	Trained using annotated word structures, our parser also analyzes the internal structures of Chinese words .
Introduction	Frequently-occurring character sequences that express certain meanings can be treated as words, while most Chinese words have syntactic structures.
Related Work	Zhao (2009) studied character-level dependencies for Chinese word segmentation by formalizing segmentsion task in a dependency parsing framework.
Related Work	They use it as a joint framework to perform Chinese word segmentation, POS tagging and syntax parsing.
Related Work	They exploit a generative maximum entropy model for character-based constituent parsing, and find that POS information is very useful for Chinese word segmentation, but high-level syntactic information seems to have little effect on segmentation.
Word Structures and Syntax Trees	Unlike alphabetical languages, Chinese characters convey meanings, and the meaning of most Chinese words takes roots in their character.
Word Structures and Syntax Trees	Chinese words have internal structures (Xue, 2001; Ma et al., 2012).
Word Structures and Syntax Trees	Zhang and Clark (2010) found that the first character in a Chinese word is a useful indicator of the word’s POS.

Chinese word is mentioned in 13 sentences in this paper.

Topics mentioned in this paper:

4. A Joint Graph Model for Pinyin-to-Chinese Conversion with Typo Correction

Jia, Zhongye and Zhao, Hai

In Proc. ACL 2014, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Experiments	We will also report the conversion error rate (ConvER) proposed by (Zheng et al., 2011a), which is the ratio of the number of mistyped pinyin word that is not converted to the right Chinese word over the total number of mistyped pinyin words3.
Experiments	According to our empirical observation, emission probabilities are mostly 1 since most Chinese words have unique pronunciation.
Experiments	, 201 1a) performed an experiment that 2,000 sentences of 11,968 Chinese words were entered by 5 native speakers.
Introduction	However, every Chinese word inputted into computer or cellphone cannot be typed through one-to-one mapping of key-to-letter inputting directly, but has to go through an IME as there are thousands of Chinese characters for inputting while only 26 letter keys are available in the keyboard.
Pinyin Input Method Model	Without word delimiters, linguists have argued on what a Chinese word really is for a long time and that is why there is always a primary word segmentation treatment in most Chinese language processing tasks (Zhao et al., 2006; Huang and Zhao, 2007; Zhao and Kit, 2008; Zhao et al., 2010; Zhao and Kit, 2011; Zhao et al., 2013).
Pinyin Input Method Model	A Chinese word may contain from 1 to over 10 characters due to different word segmentation conventions.
Pinyin Input Method Model	Nevertheless, pinyin syllable segmentation is a much easier problem compared to Chinese word segmentation.

Chinese word is mentioned in 10 sentences in this paper.

Topics mentioned in this paper:

5. Fast Online Training with Frequency-Adaptive Learning Rates for Chinese Word Segmentation and New Word Detection

Sun, Xu and Wang, Houfeng and Li, Wenjie

In Proc. ACL 2012, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	We present a joint model for Chinese word segmentation and new word detection.
Introduction	The major problem of Chinese word segmentation is the ambiguity.
Introduction	In this paper, we present high dimensional new features, including word-based features and enriched edge (label-transition) features, for the joint modeling of Chinese word segmentation (CWS) and new word detection (NWD).
Introduction	0 We propose a joint model for Chinese word segmentation and new word detection.
Related Work	Conventional approaches to Chinese word segmentation treat the problem as a sequential labeling task (Xue, 2003; Peng et al., 2004; Tseng et al., 2005; Asahara et al., 2005; Zhao et al., 2010).
System Architecture	This phenomenon will also undermine the performance of Chinese word segmentation.
System Architecture	The B, I, E labels have been widely used in previous work of Chinese word segmentation (Sun et al., 2009b).
System Architecture	_ We used benchmark datasets provided by the second International Chinese Word Segmentation Bakeoff to test our proposals.

Chinese word is mentioned in 10 sentences in this paper.

Topics mentioned in this paper:

6. Discriminative Learning with Natural Annotations: Word Segmentation as a Case Study

Jiang, Wenbin and Sun, Meng and Lü, Yajuan and Yang, Yating and Liu, Qun

In Proc. ACL 2013, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	With Chinese word segmentation as a case study, experiments show that the segmenter enhanced with the Chinese wikipedia achieves significant improvement on a series of testing sets from different domains, even with a single classifier and local features.
Conclusion and Future Work	Experiments on Chinese word segmentation show that, the enhanced word segmenter achieves significant improvement on testing sets of different domains, although using a single classifier with only local features.
Experiments	We use the Penn Chinese Treebank 5.0 (CTB) (Xue et al., 2005) as the existing annotated corpus for Chinese word segmentation.
Experiments	Table 4: Comparison with state-of-the-art work in Chinese word segmentation.
Experiments	Table 4 shows the comparison with other work in Chinese word segmentation.
Introduction	Taking Chinese word segmentation for example, the state-of-the-art models (Xue and Shen, 2003; Ng and Low, 2004; Gao et al., 2005; Nakagawa and Uchimoto, 2007; Zhao and Kit, 2008; J iang et al., 2009; Zhang and Clark, 2010; Sun, 2011b; Li, 2011) are usually trained on human-annotated corpora such as the Penn Chinese Treebank (CTB) (Xue et al., 2005), and perform quite well on corresponding test sets.
Introduction	In the rest of the paper, we first briefly introduce the problems of Chinese word segmentation and the character classification model in section
Related Work	Li and Sun (2009) extracted character classification instances from raw text for Chinese word segmentation, resorting to the indication of punctuation marks between characters.
Related Work	Sun and Xu (Sun and Xu, 2011) utilized the features derived from large-scaled unlabeled text to improve Chinese word segmentation.

Chinese word is mentioned in 10 sentences in this paper.

Topics mentioned in this paper:

7. Chinese Morphological Analysis with Character-level POS Tagging

Shen, Mo and Liu, Hongxiao and Kawahara, Daisuke and Kurohashi, Sadao

In Proc. ACL 2014, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	The focus of recent studies on Chinese word segmentation, part-of-speech (POS) tagging and parsing has been shifting from words to characters.
Conclusion	A Cascaded Linear Model for Joint Chinese Word Segmentation and Part-of-speech Tagging.
Conclusion	Word Lattice Reranking for Chinese Word Segmentation
Conclusion	An Error—Driven Word—Character Hybird Model for Joint Chinese Word Segmentation and POS Tagging.
Introduction	In recent years, the focus of research on Chinese word segmentation, part-of-speech (POS) tagging and parsing has been shifting from words toward characters.

Chinese word is mentioned in 9 sentences in this paper.

Topics mentioned in this paper:

8. Exploring Deterministic Constraints: from a Constrained English POS Tagger to an Efficient ILP Solution to Chinese Word Segmentation

Zhao, Qiuye and Marcus, Mitch

In Proc. ACL 2012, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	We show for both English POS tagging and Chinese word segmentation that with proper representation, large number of deterministic constraints can be learned from training examples, and these are useful in constraining probabilistic inference.
Abstract	In this work, we explore deterministic constraints for two fundamental NLP problems, English POS tagging and Chinese word segmentation.
Abstract	For Chinese word segmentation (CWS), which can be formulated as character tagging, analogous constraints can be learned with the same templates as English POS tagging.

Chinese word is mentioned in 9 sentences in this paper.

Topics mentioned in this paper:

POS tagging (31)
ILP (20)
Viterbi (18)

9. Cross Language Dependency Parsing using a Bilingual Lexicon

Zhao, Hai and Song, Yan and Kit, Chunyu and Zhou, Guodong

In Proc. ACL 2009, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Dependency Parsing: Baseline	In addition, there is often a close connection between the meaning of a Chinese word and its first or last character.
Exploiting the Translated Treebank	Chinese word should be strictly segmented according to the guideline before POS tags and dependency relations are annotated.
Exploiting the Translated Treebank	English treebank is translated into Chinese word by word, Chinese words in the translated text are exactly some entries from the bilingual lexicon, they are actually irregular phrases, short sentences or something else rather than words that follows any existing word segmentation convention.
Treebank Translation and Dependency Transformation	Translate the PTB text into Chinese word by word.
Treebank Translation and Dependency Transformation	After the target sentence is generated, the attached POS tags and dependency information of each English word will also be transferred to each corresponding Chinese word .
Treebank Translation and Dependency Transformation	Although we try to perform an exact word-by-word translation, this aim cannot be fully reached in fact, as the following case is frequently encountered, multiple English words have to be translated into one Chinese word .

Chinese word is mentioned in 7 sentences in this paper.

Topics mentioned in this paper:

10. New Word Detection for Sentiment Analysis

Huang, Minlie and Ye, Borui and Wang, Yichen and Chen, Haiqiang and Cheng, Junjun and Zhu, Xiaoyan

In Proc. ACL 2014, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	Automatic extraction of new words is an indispensable precursor to many NLP tasks such as Chinese word segmentation, named entity extraction, and sentiment analysis.
Experiment	The posts were then part-of-speech tagged using a Chinese word segmentation tool named ICTCLAS (Zhang et al., 2003).
Introduction	Automatic extraction of new words is indispensable to many tasks such as Chinese word segmentation, machine translation, named entity extraction, question answering, and sentiment analysis.
Introduction	New word detection is one of the most critical issues in Chinese word segmentation.
Introduction	Statistics show that more than 1000 new Chinese words appear every
Methodology	Obviously, in order to obtain the value of 3(wi), some particular Chinese word segmentation tool is required.

Chinese word is mentioned in 6 sentences in this paper.

Topics mentioned in this paper:

11. Omni-word Feature and Soft Constraint for Chinese Relation Extraction

Chen, Yanping and Zheng, Qinghua and Zhang, Wei

In Proc. ACL 2014, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	Both Omni-word feature and soft constraint make a better use of sentence information and minimize the influences caused by Chinese word segmentation and parsing.
Feature Construction	Furthermore, for a single Chinese word , occurrences of 4 characters are frequent.
Feature Construction	First, the specificity of Chinese word-formation indicates that the subphrases of Chinese word (or phrase) are also informative.
Introduction	The difficulty of Chinese IE is that Chinese words are written next to each other without delimiter in between.
Introduction	Lacking of orthographic word makes Chinese word segmentation difficult.
Related Work	(2008; 2010) also pointed out that, due to the inaccuracy of Chinese word segmentation and parsing, the tree kernel based approach is inappropriate for Chinese relation extraction.

Chinese word is mentioned in 6 sentences in this paper.

Topics mentioned in this paper:

12. Reducing Approximation and Estimation Errors for Chinese Lexical Processing with Heterogeneous Annotations

Sun, Weiwei and Wan, Xiaojun

In Proc. ACL 2012, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

About Heterogeneous Annotations	For Chinese word segmentation and POS tagging, supervised learning has become a dominant paradigm.
About Heterogeneous Annotations	Take Chinese word segmentation for example.
Abstract	We address the issue of consuming heterogeneous annotation data for Chinese word segmentation and part-of-speech tagging.
Conclusion	Our theoretical and empirical analysis of two representative popular corpora highlights two essential characteristics of heterogeneous annotations which are eXplored to reduce approximation and estimation errors for Chinese word segmentation and POS tagging.
Experiments	Previous studies on joint Chinese word segmentation and POS tagging have used the CTB in experiments.
Introduction	This paper explores heterogeneous annotations to reduce both approximation and estimation errors for Chinese word segmentation and part-of-speech (POS) tagging, which are fundamental steps for more advanced Chinese language processing tasks.

Chinese word is mentioned in 6 sentences in this paper.

Topics mentioned in this paper:

13. Argument Inference from Relevant Event Mentions in Chinese Argument Extraction

Li, Peifeng and Zhu, Qiaoming and Zhou, Guodong

In Proc. ACL 2013, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Experimentation	Finally, all the sentences in the corpus are divided into words using a Chinese word segmentation tool (ICTCLAS)1 with all entities annotated in the corpus kept.
Inferring Inter-Sentence Arguments on Relevant Event Mentions	The second issue is that the Chinese word order in a sentence is rather agile for the open
Inferring Inter-Sentence Arguments on Relevant Event Mentions	(2012a) find out that sometimes two trigger mentions are within a Chinese word whose morphological structure is Coordination.
Inferring Inter-Sentence Arguments on Relevant Event Mentions	The relation between those event mentions whose triggers merge a Chinese word or share the subject and the object are Parallel.

Chinese word is mentioned in 6 sentences in this paper.

Topics mentioned in this paper:

14. Mining Informal Language from Chinese Microtext: Joint Word Recognition and Segmentation

Wang, Aobo and Kan, Min-Yen

In Proc. ACL 2013, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	We exploit this reliance as an opportunity: recognizing the relation between informal word recognition and Chinese word segmentation, we propose to model the two tasks jointly.
Conclusion	There is a close dependency between Chinese word segmentation (CWS) and informal word recognition (IWR).
Introduction	This example illustrates the mutual dependency between Chinese word segmentation (henceforth, CWS) and informal word recognition (IWR) that should be solved jointly.
Methodology	Given an input Chinese microblog post, our method simultaneously segments the sentences into words (the Chinese Word Segmentation, CWS, task), and marks the component words as informal or formal ones (the Informal Word Re-congition, IWR, task).
Methodology	0 (*)IkaCk+1(i—4 < k < i+4)isnota Chinese word recorded in dictionaries: CPMI—N@k+i; CPMI—M@k+i; CDifl@k+i; PYPMI—N@k+i; PYPMI—M@k+i; PYD-ifi”@k+i

Chinese word is mentioned in 5 sentences in this paper.

Topics mentioned in this paper:

SVM (13)
baseline systems (11)
CRF (11)

15. Character-Level Chinese Dependency Parsing

Zhang, Meishan and Zhang, Yue and Che, Wanxiang and Liu, Ting

In Proc. ACL 2014, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Character-Level Dependency Tree	The results demonstrate that the structures of Chinese words are not difficult to predict, and confirm the fact that Chinese word structures have some common syntactic patterns.
Character-Level Dependency Tree	Zhao (2009) was the first to study character-level dependencies; they argue that since no consistent word boundaries exist over Chinese word segmentation, dependency-based representations of word structures serve as a good alternative for Chinese word segmentation.
Character-Level Dependency Tree	(2012) proposed a joint model for Chinese word segmentation, POS-tagging and dependency parsing, studying the influence of joint model and character features for parsing, Their model is extended from the arc-standard transition-based model, and can be regarded as an alternative to the arc-standard model of our work when pseudo intra-word dependencies are used.
Introduction	First, character-level trees circumvent the issue that no universal standard exists for Chinese word segmentation.
Introduction	In the well-known Chinese word segmentation bakeoff tasks, for example, different segmentation standards have been used by different data sets (Emerson, 2005).

Chinese word is mentioned in 5 sentences in this paper.

Topics mentioned in this paper:

16. Word Alignment Modeling with Context Dependent Deep Neural Network

Yang, Nan and Liu, Shujie and Li, Mu and Zhou, Ming and Yu, Nenghai

In Proc. ACL 2013, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Experiments and Results	By analyzing the results, we found out that for both baseline and our model, a large part of missing alignment links involves stop words like English words “the”, “a”, “it” and Chinese words “de”.
Experiments and Results	As Chinese language lacks morphology, the single form and plural form of a noun in English often correspond to the same Chinese word , thus it is desirable that the two English words should have similar word embeddings.
Introduction	As shown in example (a) of Figure 1, in word pair {“juda” =>“mammot ”}, the Chinese word “juda” is a common word, but
Introduction	For example (b) in Figure l, for the word pair {“yibula” => “Yibula”}, both the Chinese word “yibula” and English word “Yibula” are rare name entities, but the words around them are very common, which are {“nongmin”, “shuo”} for Chinese side and {“farmer”, “said”} for the English side.
Training	For example, many Chinese words can act as a verb, noun and adjective without any change, while their English counter parts are distinct words with quite different word embeddings due to their different syntactic roles.

Chinese word is mentioned in 5 sentences in this paper.

Topics mentioned in this paper:

17. Multilingual Harvesting of Cross-Cultural Stereotypes

Veale, Tony and Hao, Yanfen and Li, Guofu

In Proc. ACL 2008, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Harvesting Knowledge from Similes: English and Chinese	HowNet is a bilingual lexical ontology that associates English and Chinese word labels with an underlying set of approximately 100,000 lexical concepts.
Tagging and Mapping of Similes	Thus, the Chinese word “% 4?.” can translate as “celebrated”, “famous”, “well-known” and “reputable”, but all four of these possible senses, given by celebrated\|%%, famous\|§ 4%, well-known\|% 4?.
Tagging and Mapping of Similes	For while the words used in any given simile are likely to be ambiguous (in the case of one-character Chinese words , highly so), it would seem unlikely that an incorrect translation of a web simile would also be found on the web.
Tagging and Mapping of Similes	One significant reason for this kind of omission is not cultural difference, but obviousness: many Chinese words are multi-character gestalts of different ideas (see Packard, 2000), so that these ideas form an explicit part of the orthography of a lexical concept.

Chinese word is mentioned in 5 sentences in this paper.

Topics mentioned in this paper:

18. Pseudo-Word for Phrase-Based Machine Translation

Duan, Xiangyu and Zhang, Min and Li, Haizhou

In Proc. ACL 2010, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Experiments and Results	It seems that Chinese word prefers to have English pseudo-word equivalence which has more than or equal to one word.
Experiments and Results	Similar to performances on small corpus, wdpwen always performs better than the other two cases, which indicates that Chinese word prefers to have English pseudo-word equivalence which has more than or equal to one word.
Introduction	In Chinese-to-English translation task Where Chinese word boundaries are not marked, Xu et al.
Introduction	(2008) used a Bayesian semi-supervised method that combines Chinese word segmentation model and Chinese-to-English translation model to derive a Chinese segmentation suitable for machine translation.
Introduction	2009), only focusing on Chinese word as basic translational unit is not adequate to model I-to-n translations.

Chinese word is mentioned in 5 sentences in this paper.

Topics mentioned in this paper:

19. Automatic Adaptation of Annotation Standards: Chinese Word Segmentation and POS Tagging -- A Case Study

Jiang, Wenbin and Huang, Liang and Liu, Qun

In Proc. ACL 2009, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	We test the efficacy of this method in the context of Chinese word segmentation and part-of-speech tagging, where no segmentation and POS tagging standards are widely accepted due to the lack of morphology in Chinese.
Introduction	To test the efficacy of our method we choose Chinese word segmentation and part-of-speech tagging, where the problem of incompatible annotation standards is one of the most evident: so far no segmentation standard is widely accepted due to the lack of a clear definition of Chinese words , and the (almost complete) lack of morphology results in much bigger ambiguities and heavy debates in tagging philosophies for Chinese parts-of-speech.
Segmentation and Tagging as Character Classification	where each subsequence Cm- indicates a Chinese word spanning from characters 0,- to Cj (both in-
Segmentation and Tagging as Character Classification	Xue and Shen (2003) describe for the first time the character classification approach for Chinese word segmentation, Where each character is given a boundary tag denoting its relative position in a word.
Segmentation and Tagging as Character Classification	It is an online training algorithm and has been successfully used in many NLP tasks, such as POS tagging (Collins, 2002), parsing (Collins and Roark, 2004), Chinese word segmentation (Zhang and Clark, 2007; J iang et al., 2008), and so on.

Chinese word is mentioned in 5 sentences in this paper.

Topics mentioned in this paper:

20. An Error-Driven Word-Character Hybrid Model for Joint Chinese Word Segmentation and POS Tagging

Kruengkrai, Canasai and Uchimoto, Kiyotaka and Kazama, Jun'ichi and Wang, Yiou and Torisawa, Kentaro and Isahara, Hitoshi

In Proc. ACL 2009, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	In this paper, we present a discriminative word-character hybrid model for joint Chinese word segmentation and POS tagging.
Conclusion	In this paper, we presented a discriminative word-character hybrid model for joint Chinese word segmentation and POS tagging.
Experiments	Previous studies on joint Chinese word segmentation and POS tagging have used Penn Chinese Treebank (CTB) (Xia et al., 2000) in experiments.
Related work	For example, a perceptron algorithm is used for joint Chinese word segmentation and POS tagging (Zhang and Clark, 2008; Jiang et al., 2008a; Jiang et al., 2008b).

Chinese word is mentioned in 4 sentences in this paper.

Topics mentioned in this paper:

21. Toward Better Chinese Word Segmentation for SMT via Bilingual Constraints

Zeng, Xiaodong and Chao, Lidia S. and Wong, Derek F. and Trancoso, Isabel and Tian, Liang

In Proc. ACL 2014, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	This study investigates on building a better Chinese word segmentation model for statistical machine translation.
Experiments	All other nine CWS models outperforms the CS baseline which does not try to identify Chinese words at all.
Introduction	They leverage such mappings to either constitute a Chinese word dictionary for maximum-matching segmentation (Xu et al., 2004), or form labeled data for training a sequence labeling model (Paul et al., 2011).
Introduction	This paper proposes an alternative Chinese Word Segmentation (CWS) model adapted to the SMT task, which seeks not only to maintain the advantages of a monolingual supervised model, having hand-annotated linguistic knowledge, but also to assimilate the relevant bilingual segmenta-

Chinese word is mentioned in 4 sentences in this paper.

Topics mentioned in this paper:

CRFs (30)
segmentations (18)
treebank (16)

22. Graph-based Semi-Supervised Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging

Zeng, Xiaodong and Wong, Derek F. and Chao, Lidia S. and Trancoso, Isabel

In Proc. ACL 2013, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	This paper introduces a graph-based semi-supervised joint model of Chinese word segmentation and part-of-speech tagging.
Introduction	As far as we know, however, these methods have not yet been applied to resolve the problem of joint Chinese word segmentation (CWS) and POS tagging.
Method	This study introduces a novel semi-supervised approach for joint Chinese word segmentation and POS tagging.

Chinese word is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

23. Enlisting the Ghost: Modeling Empty Categories for Machine Translation

Xiang, Bing and Luo, Xiaoqiang and Zhou, Bowen

In Proc. ACL 2013, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Experimental Results	We first predict pro and PRO with our annotation model for all Chinese sentences in the parallel training data, with pro and PRO inserted between the original Chinese words .
Integrating Empty Categories in Machine Translation	One of the other frequent ECs, OP , appears in the Chinese relative clauses, which usually have a Chinese word “De” aligned to the target side “that” or “which”.
Introduction	Consequently, “that” is incorrectly aligned to the second to the last Chinese word “De”, due to their high co-occurrence frequency in the training data.

Chinese word is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

24. Character-Level Machine Translation Evaluation for Languages with Ambiguous Word Boundaries

Liu, Chang and Ng, Hwee Tou

In Proc. ACL 2012, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Discussion and Future Work	Chinese word segmentation.
Experiments	We use the Stanford Chinese word segmenter (Tseng et al., 2005) and POS tagger (Toutanova et al., 2003) for preprocessing and Cilin for synonym
Experiments	In all our experiments here we use TESLA-CELAB with n- grams for 77. up to four, since the vast majority of Chinese words , and therefore synonyms, are at most four characters long.

Chinese word is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

BLEU (19)
word-level (15)
n-grams (12)

25. A Chinese-English Organization Name Translation System Using Heuristic Web Mining and Asymmetric Alignment

Yang, Fan and Zhao, Jun and Liu, Kang

In Proc. ACL 2009, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Asymmetric Alignment Method for Equivalent Extraction	A Chinese NE C0={CW1, CW2, ..., CWn} is a sequence of Chinese words CW, and the English
Introduction	The selection of the Chinese words to be translated will take into consideration both the translation confidence of the words and the information contents that they contain for the whole ON.
The Chunking-based Segmentation for Chinese ONs	When Chinese words are aligned with English words, the mistakes made in Chinese segmentation may result in wrong alignment results.

Chinese word is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

26. Quadratic-Time Dependency Parsing for Machine Translation

Galley, Michel and Manning, Christopher D.

In Proc. ACL 2009, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Conclusion and future work	Since parsing of the source is relatively inexpensive compared to the target side, it would be relatively easy to condition head-modifier dependencies not only on the two target words, but also on their corresponding Chinese words and their relative positions in the Chinese tree.
Machine translation experiments	Chinese words drawn from various news parallel corpora distributed by the Linguistic Data Consortium (LDC).
Machine translation experiments	Chinese words were automatically segmented with a conditional random field (CRF) classifier (Chang et al., 2008) that conforms to the Chinese Treebank (CTB) standard.

Chinese word is mentioned in 3 sentences in this paper.

Topics mentioned in this paper: