Index of papers in Proc. ACL that mention
  • Chinese-English
Cai, Jingsheng and Utiyama, Masao and Sumita, Eiichiro and Zhang, Yujie
Abstract
This paper introduces a novel pre-ordering approach based on dependency parsing for Chinese-English SMT.
Conclusion
In this paper, we introduced a novel pre-ordering approach based on dependency parsing for a Chinese-English PBSMT system.
Conclusion
These results indicated that dependency parsing is more effective for conducting pre-ordering for Chinese-English PBSMT.
Experiments
Our development set was the official NIST MT evaluation data from 2002 to 2005, consisting of 4476 Chinese-English sentence pairs.
Introduction
Syntax-based pre-ordering by employing constituent parsing has demonstrated effectiveness in many language pairs, such as English-French (Xia and McCord, 2004), German-English (Collins et al., 2005), Chinese-English (Wang et al., 2007; Zhang et al., 2008), and English-Japanese (Lee et al., 2010).
Introduction
The purpose of this paper is to introduce a novel dependency-based pre-ordering approach through creating a pre-ordering rule set and applying it to the Chinese-English PBSMT system.
Introduction
To our knowledge, our manually created pre-ordering rule set is the first Chinese-English dependency-based pre-ordering rule set.
Chinese-English is mentioned in 11 sentences in this paper.
Nguyen, ThuyLinh and Vogel, Stephan
Abstract
We achieve significant improvement over both Hiero and phrase-based baselines for Arabic-English, Chinese-English and German-English translation.
Experiment Results
We will report the impact of integrating phrase-based features into Hiero systems for three language pairs: Arabic-English, Chinese-English and German-English.
Experiment Results
4.3 Chinese-English Results
Experiment Results
The Chinese-English system was trained on FBIS corpora of 384K sentence pairs; the English corpus is lower case.
Introduction
In our Chinese-English experiment, the Hiero system still outperforms the discontinuous phrase-based system.
Introduction
(2008) added structure distortion features into their decoder and showed improvements in their Chinese-English experiment.
Phrasal-Hiero Model
In the experiment section, we will discuss the impact of removing rules with nonaligned sub-phrases in our German-English and Chinese-English experiments.
Chinese-English is mentioned in 15 sentences in this paper.
He, Wei and Wu, Hua and Wang, Haifeng and Liu, Ting
Discussion
We make a detailed analysis on the Chinese-English translation results that are affected by our paraphrase rules.
Discussion
The analysis is carried out on the IWSLT 2007 Chinese-English test set; 84 out of 489 input sentences have been affected by paraphrases, and the statistics of human evaluation are shown in Table 8.
Experiments
The experiments were conducted in both Chinese-English and English-Chinese directions for the oral group, and Chinese-English direction for the news group.
Experiments
corpora, including the Chinese-English Sentence Aligned Bilingual Corpus (CLDC-LAC-2003-004) and the Chinese-English Parallel Corpora (CLDC-LAC-2003-006).
Experiments
For testing and developing, we used six Chinese-English development corpora of IWSLT 2008.
Chinese-English is mentioned in 9 sentences in this paper.
Devlin, Jacob and Zbib, Rabih and Huang, Zhongqiang and Lamar, Thomas and Schwartz, Richard and Makhoul, John
Introduction
We also show strong improvements on the NIST OpenMT12 Chinese-English task, as well as the DARPA BOLT (Broad Operational Language Translation) Arabic-English and Chinese-English conditions.
Model Variations
We present MT primary results on Arabic-English and Chinese-English for the NIST OpenMT12 and DARPA BOLT conditions.
Model Variations
Table 3: Primary results on Arabic-English and Chinese-English NIST MT12 Test Set.
Model Variations
For the Chinese-English condition, there is an improvement of +0.8 BLEU from the primary NNJM and +1.3 BLEU overall.
Neural Network Joint Model (NNJM)
An example of the NNJM context model for a Chinese-English parallel sentence is given in Figure 1.
Chinese-English is mentioned in 9 sentences in this paper.
Eidelman, Vladimir and Marton, Yuval and Resnik, Philip
Abstract
We evaluate our optimizer on Chinese-English and Arabic-English translation tasks, each with small and large feature sets, and show that our learner is able to achieve significant improvements of 1.2-2 BLEU and 1.7-4.3 TER on average over state-of-the-art optimizers with the large feature set.
Additional Experiments
In both Arabic-English feature sets, MIRA seems to take the second place, while RAMPION lags behind, unlike in Chinese-English (§4).
Discussion
Spread analysis: For RM, the average spread of the projected data in the Chinese-English small feature set was 0.9±3.6 for all tuning iterations, and 0.7±2.9 for the iteration with the highest decoder performance.
Experiments
To evaluate the advantage of explicitly accounting for the spread of the data, we conducted several experiments on two Chinese-English translation test sets, using two different feature sets in each.
Experiments
As can be seen from the results in Table 3, our RM method was the best performer in all Chinese-English tests according to all measures (up to 1.9 BLEU and 6.6 TER over MIRA), even though we only optimized for BLEU. Surprisingly, it seems that MIRA did not benefit as much from the sparse features as RM.
Experiments
In preliminary experiments with a smaller trigram LM, our RM method consistently yielded the highest scores in all Chinese-English tests — up to 1.6 BLEU and 6.4 TER from MIRA, the second best performer.
Introduction
Chinese-English translation experiments show that our algorithm, RM, significantly outperforms strong state-of-the-art optimizers, in both a basic feature setting and high-dimensional (sparse) feature space (§4).
Chinese-English is mentioned in 7 sentences in this paper.
Huang, Fei
Abstract
Additionally, we remove low confidence alignment links from the word alignment of a bilingual training corpus, which increases the alignment F-score, improves Chinese-English and Arabic-English translation quality and significantly reduces the phrase translation table size.
Conclusion
selected among multiple alignments and it obtained 0.8 F-measure improvement over the single best Chinese-English aligner.
Conclusion
When we removed low confidence links from the MaxEnt aligner, we reduced the Chinese-English alignment error by 5% and the Arabic-English alignment error by 10%.
Improved MaxEnt Aligner with Confidence-based Link Filtering
We applied the confidence-based link filtering on Chinese-English and Arabic-English word alignment.
Sentence Alignment Confidence Measure
We randomly selected 512 Chinese-English (CE) sentence pairs and generated word alignment using the MaxEnt aligner (Ittycheriah and Roukos, 2005).
Translation
We evaluate the improved alignment on several Chinese-English and Arabic-English machine translation tasks.
Translation
In the Chinese-English MT experiment, we selected 40 NW documents, 41 WE documents as the test set, which includes 623 sentences with 16667 words.
Chinese-English is mentioned in 7 sentences in this paper.
Chen, Boxing and Kuhn, Roland and Larkin, Samuel
BLEU and PORT
For our experiments, we tuned α on Chinese-English data, setting it to 0.25 and keeping this value for the other language pairs.
Experiments
The large data condition uses training data from NIST 2009 (Chinese-English track).
Experiments
We are currently investigating why PORT tuning gives higher BLEU scores than BLEU tuning for Chinese-English and German-English.
Experiments
PORT outperforms Qmean on seven of the eight automatic scores shown for small and large Chinese-English.
Chinese-English is mentioned in 7 sentences in this paper.
Li, Haibo and Zheng, Jing and Ji, Heng and Li, Qi and Wang, Wen
Abstract
Experiments on Chinese-English translation demonstrated the effectiveness of our approach on enhancing the quality of overall translation, name translation and word alignment over a high-quality MT baseline.
Baseline MT
As our baseline, we apply a high-performing Chinese-English MT system (Zheng, 2008; Zheng et al., 2009) based on hierarchical phrase-based translation framework (Chiang, 2005).
Conclusions and Future Work
Experiments on Chinese-English translation demonstrated the effectiveness of our approach over a high-quality MT baseline in both overall translation and name translation, especially for formal genres.
Experiments
We used a large Chinese-English MT training corpus from various sources and genres (including newswire, web text, broadcast news and broadcast conversations) for our experiments.
Experiments
target side of Chinese-English and Egyptian Arabic-English parallel text, English monolingual discussion forums data R1-R4 released in BOLT Phase 1 (LDC2012E04, LDC2012E16, LDC2012E21, LDC2012E54), and English Gigaword Fifth Edition (LDC2011T07).
Experiments
We conducted the experiment on the Chinese-English Parallel Treebank (Li et al., 2010) with ground-truth word alignment.
Chinese-English is mentioned in 6 sentences in this paper.
Vaswani, Ashish and Huang, Liang and Chiang, David
Experiments
To demonstrate the effect of the ℓ0-norm on the IBM models, we performed experiments on four translation tasks: Arabic-English, Chinese-English, and Urdu-English from the NIST Open MT Evaluation, and the Czech-English translation from the Workshop on Machine Translation (WMT) shared task.
Experiments
• Chinese-English: selected data from the constrained task of the NIST 2009 Open MT Evaluation.
Experiments
For Arabic-English and Chinese-English, we used 346 and 184 hand-aligned sentences from LDC2006E86 and LDC2006E93.
Chinese-English is mentioned in 6 sentences in this paper.
Ling, Wang and Xiang, Guang and Dyer, Chris and Black, Alan and Trancoso, Isabel
Abstract
We have been able to extract over 1M Chinese-English parallel segments from Sina Weibo (the Chinese counterpart of Twitter) using only their public APIs.
Experiments
For the news test, we created a new test set from a crawl of the Chinese-English documents on the Project Syndicate website, which contains news commentary articles.
Experiments
Second, we use the full 2012 NIST Chinese-English dataset (approximately 8M sentence pairs, including FBIS).
Introduction
Section 5 describes the data we gathered from both Sina Weibo (Chinese-English) and Twitter (Chinese-English and Arabic-English).
Parallel Data Extraction
The target domains in this work are Twitter and Sina Weibo, and the main language pair is Chinese-English.
Parallel Data Extraction
This means that for the Chinese-English language pair, we only keep tweets with more than 3 Mandarin characters and 3 Latin words.
Chinese-English is mentioned in 6 sentences in this paper.
DeNero, John and Chiang, David and Knight, Kevin
Experimental Results
Speed Ratio 29 142 Chinese-English Objective Hiero SBMT
Experimental Results
We evaluated on both Chinese-English and Arabic-English translation tasks.
Experimental Results
For the Chinese-English experiments, we used 260 million words of word-aligned parallel text; the hierarchical system used all of this data, and the syntax-based system used a 65-million word subset.
Chinese-English is mentioned in 6 sentences in this paper.
Liu, Yang
Abstract
As our approach combines the merits of phrase-based and string-to-dependency models, it achieves significant improvements over the two baselines on the NIST Chinese-English datasets.
Introduction
We evaluate our method on the NIST Chinese-English translation datasets.
Introduction
Empirically, we find that the average number of stacks for J words is about 1.5 × J on the Chinese-English data.
Introduction
We evaluated our phrase-based string-to-dependency translation system on Chinese-English translation.
Chinese-English is mentioned in 5 sentences in this paper.
Yang, Fan and Zhao, Jun and Zou, Bo and Liu, Kang and Liu, Feifan
Introduction
We can’t get Chinese-English bilingual pages when the input is a Chinese query.
Statistical Transliteration Model
We use syllables as translation units to build a statistical Chinese-English backward transliteration model in our system.
Statistical Transliteration Model
Based on the above alignment method, we can get our statistical Chinese-English backward transliteration model as,
Statistical Transliteration Model
Chinese-English backward transliteration has some differences from traditional translation.
Chinese-English is mentioned in 5 sentences in this paper.
Feng, Yang and Cohn, Trevor
Experiments
The first experiments are on the IWSLT data set for Chinese-English translation.
Experiments
Table 3: Machine translation performance in BLEU % on the IWSLT 2005 Chinese-English test set.
Experiments
To test whether our improvements carry over to larger datasets, we assess the performance of our model on the FBIS Chinese-English data set.
Introduction
We demonstrate our model on Chinese-English and Arabic-English translation datasets.
Chinese-English is mentioned in 5 sentences in this paper.
Lu, Shixiang and Chen, Zhenbiao and Xu, Bo
Abstract
On two Chinese-English tasks, our semi-supervised DAE features obtain statistically significant improvements of 1.34/2.45 (IWSLT) and 0.82/1.52 (NIST) BLEU points over the unsupervised DBN features and the baseline features, respectively.
Experiments and Results
We now test our DAE features on the following two Chinese-English translation tasks.
Experiments and Results
The bilingual corpus is the Chinese-English part of Basic Traveling Expression corpus (BTEC) and China-Japan-Korea (CJK) corpus (0.38M sentence pairs with 3.5/3.8M Chinese/English words).
Introduction
Finally, we conduct large-scale experiments on IWSLT and NIST Chinese-English translation tasks, respectively, and the results demonstrate that our solutions solve the two aforementioned shortcomings successfully.
Chinese-English is mentioned in 4 sentences in this paper.
Peng, Nanyun and Wang, Yiming and Dredze, Mark
Code-Switching
In this work we consider two types of code-switched documents: single messages and conversations, and two language pairs: Chinese-English and Spanish-English.
Code-Switching
An example of a Chinese-English code-switched message is given by Ling et al.
Code-Switching
We used two datasets: a Sina Weibo Chinese-English corpus (Ling et al., 2013) and a Spanish-English Twitter corpus.
Chinese-English is mentioned in 4 sentences in this paper.
Xiao, Tong and Zhu, Jingbo and Zhang, Chunliang
Abstract
We apply our approach to a state-of-the-art phrase-based system and demonstrate very promising BLEU improvements and TER reductions on the NIST Chinese-English MT evaluation data.
Evaluation
We experimented with our approach on Chinese-English translation using the NiuTrans open-source MT toolkit (Xiao et al., 2012).
Evaluation
It contains the annotation of sentence skeleton on the Chinese-language side of the Penn Parallel Chinese-English Treebank (LDC2003E07).
Introduction
• We apply the proposed model to Chinese-English phrase-based MT and demonstrate promising BLEU improvements and TER reductions on the NIST evaluation data.
Chinese-English is mentioned in 4 sentences in this paper.
DeNero, John and Macherey, Klaus
Conclusion
The resulting predictions improve the precision and recall of both alignment links and extracted phrase pairs in Chinese-English experiments.
Experimental Results
We evaluated alignment quality on a hand-aligned portion of the NIST 2002 Chinese-English test set (Ayan and Dorr, 2006).
Experimental Results
AER results for Chinese-English are reported in Table 2.
Introduction
Our model-based approach to aligner combination yields improvements in alignment quality and phrase extraction quality in Chinese-English experiments, relative to typical heuristic combination methods applied to the predictions of independent directional models.
Chinese-English is mentioned in 4 sentences in this paper.
Zhang, Hui and Zhang, Min and Li, Haizhou and Aw, Aiti and Tan, Chew Lim
Abstract
Experimental results on the NIST MT-2003 Chinese-English translation task show that our method statistically significantly outperforms the four baseline systems.
Conclusion
Finally, we examine our methods on the FBIS corpus and the NIST MT-2003 Chinese-English translation task.
Experiment
We evaluate our method on the Chinese-English translation task.
Introduction
We evaluate our method on the NIST MT-2003 Chinese-English translation tasks.
Chinese-English is mentioned in 4 sentences in this paper.
Yang, Fan and Zhao, Jun and Liu, Kang
Conclusion
It proves that our system can work well on the Chinese-English ON translation task.
Experiments
Compared with the statistical ON translation model, we can see that the performance is improved from 18.29% to 48.71% (the bold data shown in column 1 and column 3 of Table 5) by using our Chinese-English ON translation system.
Heuristic Query Construction
In order to use the web information to assist Chinese-English ON translation, we must firstly retrieve the bilingual web pages effectively.
Introduction
For solving these two problems, we propose a Chinese-English organization name translation system using heuristic web mining and asymmetric alignment, which has three innovations.
Chinese-English is mentioned in 4 sentences in this paper.
Wu, Hua and Wang, Haifeng
Experiments
Table 2 describes the data used for model training in this paper, including the BTEC (Basic Travel Expression Corpus) Chinese-English (CE) corpus and the BTEC English-Spanish (ES) corpus provided by IWSLT 2008 organizers, the HIT olympic CE corpus (2004-863-008) and the Europarl ES corpus.
Experiments
For Chinese-English translation, we mainly used BTEC CE1 corpus.
Experiments
We used two commercial RBMT systems in our experiments: System A for Chinese-English bidirectional translation and System B for English-Chinese and English-Spanish translation.
Chinese-English is mentioned in 4 sentences in this paper.
Shen, Libin and Xu, Jinxi and Weischedel, Ralph
Conclusions and Future Work
Our string-to-dependency system generates 80% fewer rules, and achieves 1.48 point improvement in BLEU and 2.53 point improvement in TER on the decoding output on the NIST 04 Chinese-English evaluation set.
Experiments
We used part of the NIST 2006 Chinese-English large track data as well as some LDC corpora collected for the DARPA GALE program (LDC2005E83, LDC2006E34 and LDC2006G05) as our bilingual training data.
Introduction
For example, Chiang (2007) showed that the Hiero system achieved about 1 to 3 point improvement in BLEU on the NIST 03/04/05 Chinese-English evaluation sets compared to a state-of-the-art phrasal system.
Introduction
Our string-to-dependency decoder shows 1.48 point improvement in BLEU and 2.53 point improvement in TER on the NIST 04 Chinese-English MT evaluation set.
Chinese-English is mentioned in 4 sentences in this paper.
Green, Spence and Wang, Sida and Cer, Daniel and Manning, Christopher D.
Abstract
Large-scale experiments on Arabic-English and Chinese-English show that our method produces significant translation quality gains by exploiting sparse features.
Experiments
We built Arabic-English and Chinese-English MT systems with Phrasal (Cer et al., 2010), a phrase-based system based on alignment templates (Och and Ney, 2004).
Introduction
We conduct large-scale translation quality experiments on Arabic-English and Chinese-English.
Chinese-English is mentioned in 3 sentences in this paper.
Goto, Isao and Utiyama, Masao and Sumita, Eiichiro and Tamura, Akihiro and Kurohashi, Sadao
Abstract
In our experiments, our model improved 2.9 BLEU points for Japanese-English and 2.6 BLEU points for Chinese-English translation compared to the lexical reordering models.
Experiment
Japanese-English Chinese-English HIER 30.47 32.66
Introduction
Experiments confirmed the effectiveness of our method for Japanese-English and Chinese-English translation, using NTCIR-9 Patent Machine Translation Task data sets (Goto et al., 2011).
Chinese-English is mentioned in 3 sentences in this paper.
Li, Junhui and Tu, Zhaopeng and Zhou, Guodong and van Genabith, Josef
Conclusion
Experimental results on Chinese-English translation across four test sets demonstrate significant improvements of the HD-HPB model over both Chiang’s HPB and a source-side SAMT-style refined version of HPB.
Introduction
Figure 1: An example word alignment for a Chinese-English sentence pair with the dependency parse tree for the Chinese sentence.
Introduction
Experiments on Chinese-English translation using four NIST MT test sets show that our HD-HPB model significantly outperforms Chiang’s HPB as well as a SAMT-style refined version of HPB.
Chinese-English is mentioned in 3 sentences in this paper.
Setiawan, Hendra and Zhou, Bowen and Xiang, Bing and Shen, Libin
Experiments
The system is trained on 10 million parallel sentences that are available to the Phase 1 of the DARPA BOLT Chinese-English MT task.
Training
For our Chinese-English experiments, we use a simple heuristic that equates as anchors, single-word chunks whose corresponding word class belongs to closed-word classes, bearing a close resemblance to (Setiawan et al., 2007).
Two-Neighbor Orientation Model
Figure 1: An aligned Chinese-English sentence pair.
Chinese-English is mentioned in 3 sentences in this paper.
Wang, Zhigang and Li, Zhixing and Li, Juanzi and Tang, Jie and Z. Pan, Jeff
Conclusion and Future Work
Chinese-English experimental results on four typical attributes showed that WikiCiKE significantly outperforms both the current translation based methods and the monolingual extraction methods.
Experiments
In this section, we present our experiments to evaluate the effectiveness of WikiCiKE, where we focus on the Chinese-English case; in other words, the target language is Chinese and the source language is English.
Introduction
Chinese-English experiments for four typical attributes demonstrate that WikiCiKE outperforms both the monolingual extraction method and current translation-based method.
Chinese-English is mentioned in 3 sentences in this paper.
Li, Junhui and Marton, Yuval and Resnik, Philip and Daumé III, Hal
Abstract
Experiments on Chinese-English translation show that the reordering approach can significantly improve a state-of-the-art hierarchical phrase-based translation system.
Conclusion and Future Work
Experiments on Chinese-English translation show that the reordering approach can significantly improve a state-of-the-art hierarchical phrase-based translation system.
Experiments
In this section, we test its effectiveness in Chinese-English translation.
Chinese-English is mentioned in 3 sentences in this paper.
Zhang, Min and Jiang, Hongfei and Aw, Aiti and Li, Haizhou and Tan, Chew Lim and Li, Sheng
Abstract
Experimental results on the NIST MT-2005 Chinese-English translation task show that our method statistically significantly outperforms the baseline systems.
Conclusions and Future Work
The experimental results on the NIST MT-2005 Chinese-English translation task demonstrate the effectiveness of the proposed model.
Introduction
Experiment results on the NIST MT-2005 Chinese-English translation task show that our method significantly outperforms Moses (Koehn et al., 2007), a state-of-the-art phrase-based SMT system, and other linguistically syntax-based methods, such as SCFG-based and STSG-based methods (Zhang et al., 2007).
Chinese-English is mentioned in 3 sentences in this paper.
Deng, Yonggang and Xu, Jia and Gao, Yuqing
Conclusions
Our experimental results on IWSLT Chinese-English corpus have demonstrated consistent and significant improvement over the widely used word alignment matrix based extraction method.
Experimental Results
We do experiments on IWSLT (Paul, 2006) 2006 Chinese-English corpus.
Experimental Results
The training corpus consists of 40K Chinese-English parallel sentences in travel domain with to-
Chinese-English is mentioned in 3 sentences in this paper.