Abstract | This paper introduces a novel pre-ordering approach based on dependency parsing for Chinese-English SMT. |
Conclusion | In this paper, we introduced a novel pre-ordering approach based on dependency parsing for a Chinese-English PBSMT system. |
Conclusion | These results indicated that dependency parsing is more effective for conducting pre-ordering for Chinese-English PBSMT. |
Experiments | Our development set was the official NIST MT evaluation data from 2002 to 2005, consisting of 4476 Chinese-English sentence pairs. |
Introduction | Syntax-based pre-ordering that employs constituent parsing has demonstrated effectiveness in many language pairs, such as English-French (Xia and McCord, 2004), German-English (Collins et al., 2005), Chinese-English (Wang et al., 2007; Zhang et al., 2008), and English-Japanese (Lee et al., 2010). |
Introduction | The purpose of this paper is to introduce a novel dependency-based pre-ordering approach through creating a pre-ordering rule set and applying it to the Chinese-English PBSMT system. |
Introduction | To our knowledge, our manually created pre-ordering rule set is the first Chinese-English dependency-based pre-ordering rule set. |
Abstract | We achieve significant improvement over both Hiero and phrase-based baselines for Arabic-English, Chinese-English and German-English translation. |
Experiment Results | We will report the impact of integrating phrase-based features into Hiero systems for three language pairs: Arabic-English, Chinese-English and German-English. |
Experiment Results | 4.3 Chinese-English Results |
Experiment Results | The Chinese-English system was trained on FBIS corpora of 384K sentence pairs; the English corpus is lowercased. |
Introduction | In our Chinese-English experiment, the Hiero system still outperforms the discontinuous phrase-based system. |
Introduction | (2008) added structure distortion features into their decoder and showed improvements in their Chinese-English experiment. |
Phrasal-Hiero Model | In the experiment section, we will discuss the impact of removing rules with nonaligned sub-phrases in our German-English and Chinese-English experiments. |
Discussion | We make a detailed analysis on the Chinese-English translation results that are affected by our paraphrase rules. |
Discussion | The analysis is carried out on the IWSLT 2007 Chinese-English test set: 84 out of 489 input sentences have been affected by paraphrases, and the statistics of the human evaluation are shown in Table 8. |
Experiments | The experiments were conducted in both Chinese-English and English-Chinese directions for the oral group, and Chinese-English direction for the news group. |
Experiments | corpora, including the Chinese-English Sentence Aligned Bilingual Corpus (CLDC-LAC-2003-004) and the Chinese-English Parallel Corpora (CLDC-LAC-2003-006). |
Experiments | For testing and developing, we used six Chinese-English development corpora of IWSLT 2008. |
Introduction | We also show strong improvements on the NIST OpenMT12 Chinese-English task, as well as the DARPA BOLT (Broad Operational Language Translation) Arabic-English and Chinese-English conditions. |
Model Variations | We present MT primary results on Arabic-English and Chinese-English for the NIST OpenMT12 and DARPA BOLT conditions. |
Model Variations | Table 3: Primary results on Arabic-English and Chinese-English NIST MT12 Test Set. |
Model Variations | For the Chinese-English condition, there is an improvement of +0.8 BLEU from the primary NNJM and +1.3 BLEU overall. |
Neural Network Joint Model (NNJM) | An example of the NNJM context model for a Chinese-English parallel sentence is given in Figure 1. |
Abstract | We evaluate our optimizer on Chinese-English and Arabic-English translation tasks, each with small and large feature sets, and show that our learner is able to achieve significant improvements of 1.2-2 BLEU and 1.7-4.3 TER on average over state-of-the-art optimizers with the large feature set. |
Additional Experiments | In both Arabic-English feature sets, MIRA seems to take the second place, while RAMPION lags behind, unlike in Chinese-English (§4). |
Discussion | Spread analysis: For RM, the average spread of the projected data in the Chinese-English small feature set was 0.9±3.6 for all tuning iterations, and 0.7±2.9 for the iteration with the highest decoder performance. |
Experiments | To evaluate the advantage of explicitly accounting for the spread of the data, we conducted several experiments on two Chinese-English translation test sets, using two different feature sets in each. |
Experiments | As can be seen from the results in Table 3, our RM method was the best performer in all Chinese-English tests according to all measures (up to 1.9 BLEU and 6.6 TER over MIRA), even though we only optimized for BLEU. Surprisingly, it seems that MIRA did not benefit as much from the sparse features as RM. |
Experiments | In preliminary experiments with a smaller trigram LM, our RM method consistently yielded the highest scores in all Chinese-English tests — up to 1.6 BLEU and 6.4 TER from MIRA, the second best performer. |
Introduction | Chinese-English translation experiments show that our algorithm, RM, significantly outperforms strong state-of-the-art optimizers, in both a basic feature setting and high-dimensional (sparse) feature space (§4). |
Abstract | Additionally, we remove low confidence alignment links from the word alignment of a bilingual training corpus, which increases the alignment F-score, improves Chinese-English and Arabic-English translation quality and significantly reduces the phrase translation table size. |
Conclusion | lected among multiple alignments and it obtained 0.8 F-measure improvement over the single best Chinese-English aligner. |
Conclusion | When we removed low confidence links from the MaxEnt aligner, we reduced the Chinese-English alignment error by 5% and the Arabic-English alignment error by 10%. |
Improved MaxEnt Aligner with Confidence-based Link Filtering | We applied the confidence-based link filtering on Chinese-English and Arabic-English word alignment. |
Sentence Alignment Confidence Measure | We randomly selected 512 Chinese-English (CE) sentence pairs and generated word alignment using the MaxEnt aligner (Ittycheriah and Roukos, 2005). |
Translation | We evaluate the improved alignment on several Chinese-English and Arabic-English machine translation tasks. |
Translation | In the Chinese-English MT experiment, we selected 40 NW documents, 41 WE documents as the test set, which includes 623 sentences with 16667 words. |
BLEU and PORT | For our experiments, we tuned α on Chinese-English data, setting it to 0.25 and keeping this value for the other language pairs. |
Experiments | The large data condition uses training data from NIST 2009 (Chinese-English track). |
Experiments | We are currently investigating why PORT tuning gives higher BLEU scores than BLEU tuning for Chinese-English and German-English. |
Experiments | PORT outperforms Qmean on seven of the eight automatic scores shown for small and large Chinese-English. |
Abstract | Experiments on Chinese-English translation demonstrated the effectiveness of our approach on enhancing the quality of overall translation, name translation and word alignment over a high-quality MT baseline. |
Baseline MT | As our baseline, we apply a high-performing Chinese-English MT system (Zheng, 2008; Zheng et al., 2009) based on hierarchical phrase-based translation framework (Chiang, 2005). |
Conclusions and Future Work | Experiments on Chinese-English translation demonstrated the effectiveness of our approach over a high-quality MT baseline in both overall translation and name translation, especially for formal genres. |
Experiments | We used a large Chinese-English MT training corpus from various sources and genres (including newswire, web text, broadcast news and broadcast conversations) for our experiments. |
Experiments | get side of Chinese-English and Egyptian Arabic-English parallel text, English monolingual discussion forums data R1-R4 released in BOLT Phase 1 (LDC2012E04, LDC2012E16, LDC2012E21, LDC2012E54), and English Gigaword Fifth Edition (LDC2011T07). |
Experiments | We conducted the experiment on the Chinese-English Parallel Treebank (Li et al., 2010) with ground-truth word alignment. |
Experiments | To demonstrate the effect of the ℓ0-norm on the IBM models, we performed experiments on four translation tasks: Arabic-English, Chinese-English, and Urdu-English from the NIST Open MT Evaluation, and the Czech-English translation from the Workshop on Machine Translation (WMT) shared task. |
Experiments | Chinese-English: selected data from the constrained task of the NIST 2009 Open MT Evaluation. |
Experiments | For Arabic-English and Chinese-English, we used 346 and 184 hand-aligned sentences from LDC2006E86 and LDC2006E93. |
Abstract | We have been able to extract over 1M Chinese-English parallel segments from Sina Weibo (the Chinese counterpart of Twitter) using only their public APIs. |
Experiments | For the news test, we created a new test set from a crawl of the Chinese-English documents on the Project Syndicate website, which contains news commentary articles. |
Experiments | Second, we use the full 2012 NIST Chinese-English dataset (approximately 8M sentence pairs, including FBIS). |
Introduction | Section 5 describes the data we gathered from both Sina Weibo (Chinese-English) and Twitter (Chinese-English and Arabic-English). |
Parallel Data Extraction | The target domains in this work are Twitter and Sina Weibo, and the main language pair is Chinese-English. |
Parallel Data Extraction | This means that for the Chinese-English language pair, we only keep tweets with more than 3 Mandarin characters and 3 Latin words. |
Experimental Results | Speed Ratio 29 142 Chinese-English Objective Hiero SBMT |
Experimental Results | We evaluated on both Chinese-English and Arabic-English translation tasks. |
Experimental Results | For the Chinese-English experiments, we used 260 million words of word-aligned parallel text; the hierarchical system used all of this data, and the syntax-based system used a 65-million word subset. |
Abstract | As our approach combines the merits of phrase-based and string-to-dependency models, it achieves significant improvements over the two baselines on the NIST Chinese-English datasets. |
Introduction | We evaluate our method on the NIST Chinese-English translation datasets. |
Introduction | Empirically, we find that the average number of stacks for J words is about 1.5 × J on the Chinese-English data. |
Introduction | We evaluated our phrase-based string-to-dependency translation system on Chinese-English translation. |
Introduction* | We can’t get Chinese-English bilingual pages when the input is a Chinese query. |
Statistical Transliteration Model | We use syllables as translation units to build a statistical Chinese-English backward transliteration model in our system. |
Statistical Transliteration Model | Based on the above alignment method, we can get our statistical Chinese-English backward transliteration model as, |
Statistical Transliteration Model | Chinese-English backward transliteration has some differences from traditional translation. |
Experiments | The first experiments are on the IWSLT data set for Chinese-English translation. |
Experiments | Table 3: Machine translation performance in BLEU % on the IWSLT 2005 Chinese-English test set. |
Experiments | To test whether our improvements carry over to larger datasets, we assess the performance of our model on the FBIS Chinese-English data set. |
Introduction | We demonstrate our model on Chinese-English and Arabic-English translation datasets. |
Abstract | On two Chinese-English tasks, our semi-supervised DAE features obtain statistically significant improvements of 1.34/2.45 (IWSLT) and 0.82/1.52 (NIST) BLEU points over the unsupervised DBN features and the baseline features, respectively. |
Experiments and Results | We now test our DAE features on the following two Chinese-English translation tasks. |
Experiments and Results | The bilingual corpus is the Chinese-English part of Basic Traveling Expression corpus (BTEC) and China-Japan-Korea (CJK) corpus (0.38M sentence pairs with 3.5/3.8M Chinese/English words). |
Introduction | Finally, we conduct large-scale experiments on IWSLT and NIST Chinese-English translation tasks, respectively, and the results demonstrate that our solutions solve the two aforementioned shortcomings successfully. |
Code-Switching | In this work we consider two types of code-switched documents: single messages and conversations, and two language pairs: Chinese-English and Spanish-English. |
Code-Switching | An example of a Chinese-English code-switched message is given by Ling et al. |
Code-Switching | We used two datasets: a Sina Weibo Chinese-English corpus (Ling et al., 2013) and a Spanish-English Twitter corpus. |
Abstract | We apply our approach to a state-of-the-art phrase-based system and demonstrate very promising BLEU improvements and TER reductions on the NIST Chinese-English MT evaluation data. |
Evaluation | We experimented with our approach on Chinese-English translation using the NiuTrans open-source MT toolkit (Xiao et al., 2012). |
Evaluation | It contains the annotation of sentence skeleton on the Chinese-language side of the Penn Parallel Chinese-English Treebank (LDC2003E07). |
Introduction | We apply the proposed model to Chinese-English phrase-based MT and demonstrate promising BLEU improvements and TER reductions on the NIST evaluation data. |
Conclusion | The resulting predictions improve the precision and recall of both alignment links and extracted phrase pairs in Chinese-English experiments. |
Experimental Results | We evaluated alignment quality on a hand-aligned portion of the NIST 2002 Chinese-English test set (Ayan and Dorr, 2006). |
Experimental Results | AER results for Chinese-English are reported in Table 2. |
Introduction | Our model-based approach to aligner combination yields improvements in alignment quality and phrase extraction quality in Chinese-English experiments, relative to typical heuristic combination methods applied to the predictions of independent directional models. |
Abstract | Experimental results on the NIST MT-2003 Chinese-English translation task show that our method statistically significantly outperforms the four baseline systems. |
Conclusion | Finally, we examine our methods on the FBIS corpus and the NIST MT-2003 Chinese-English translation task. |
Experiment | We evaluate our method on a Chinese-English translation task. |
Introduction | We evaluate our method on the NIST MT-2003 Chinese-English translation tasks. |
Conclusion | It proves that our system can work well on the Chinese-English ON translation task. |
Experiments | Compared with the statistical ON translation model, we can see that the performance is improved from 18.29% to 48.71% (the bold data shown in column 1 and column 3 of Table 5) by using our Chinese-English ON translation system. |
Heuristic Query Construction | In order to use the web information to assist Chinese-English ON translation, we must first retrieve the bilingual web pages effectively. |
Introduction | For solving these two problems, we propose a Chinese-English organization name translation system using heuristic web mining and asymmetric alignment, which has three innovations. |
Experiments | Table 2 describes the data used for model training in this paper, including the BTEC (Basic Travel Expression Corpus) Chinese-English (CE) corpus and the BTEC English-Spanish (ES) corpus provided by IWSLT 2008 organizers, the HIT Olympic CE corpus (2004-863-008) and the Europarl ES corpus. |
Experiments | For Chinese-English translation, we mainly used BTEC CE1 corpus. |
Experiments | We used two commercial RBMT systems in our experiments: System A for Chinese-English bidirectional translation and System B for English-Chinese and English-Spanish translation. |
Conclusions and Future Work | Our string-to-dependency system generates 80% fewer rules, and achieves 1.48 point improvement in BLEU and 2.53 point improvement in TER on the decoding output on the NIST 04 Chinese-English evaluation set. |
Experiments | We used part of the NIST 2006 Chinese-English large track data as well as some LDC corpora collected for the DARPA GALE program (LDC2005E83, LDC2006E34 and LDC2006G05) as our bilingual training data. |
Introduction | For example, Chiang (2007) showed that the Hiero system achieved about 1 to 3 point improvement in BLEU on the NIST 03/04/05 Chinese-English evaluation sets compared to a state-of-the-art phrasal system. |
Introduction | Our string-to-dependency decoder shows 1.48 point improvement in BLEU and 2.53 point improvement in TER on the NIST 04 Chinese-English MT evaluation set. |
Abstract | Large-scale experiments on Arabic-English and Chinese-English show that our method produces significant translation quality gains by exploiting sparse features. |
Experiments | We built Arabic-English and Chinese-English MT systems with Phrasal (Cer et al., 2010), a phrase-based system based on alignment templates (Och and Ney, 2004). |
Introduction | We conduct large-scale translation quality experiments on Arabic-English and Chinese-English. |
Abstract | In our experiments, our model improved 2.9 BLEU points for Japanese-English and 2.6 BLEU points for Chinese-English translation compared to the lexical reordering models. |
Experiment | Japanese-English Chinese-English HIER 30.47 32.66 |
Introduction | Experiments confirmed the effectiveness of our method for Japanese-English and Chinese-English translation, using NTCIR-9 Patent Machine Translation Task data sets (Goto et al., 2011). |
Conclusion | Experimental results on Chinese-English translation across four test sets demonstrate significant improvements of the HD-HPB model over both Chiang’s HPB and a source-side SAMT-style refined version of HPB. |
Introduction | Figure 1: An example word alignment for a Chinese-English sentence pair with the dependency parse tree for the Chinese sentence. |
Introduction | Experiments on Chinese-English translation using four NIST MT test sets show that our HD-HPB model significantly outperforms Chiang’s HPB as well as a SAMT-style refined version of HPB. |
Experiments | The system is trained on 10 million parallel sentences that are available to the Phase 1 of the DARPA BOLT Chinese-English MT task. |
Training | For our Chinese-English experiments, we use a simple heuristic that equates as anchors, single-word chunks whose corresponding word class belongs to closed-word classes, bearing a close resemblance to (Setiawan et al., 2007). |
Two-Neighbor Orientation Model | Figure 1: An aligned Chinese-English sentence pair. |
Conclusion and Future Work | Chinese-English experimental results on four typical attributes showed that WikiCiKE significantly outperforms both the current translation based methods and the monolingual extraction methods. |
Experiments | In this section, we present our experiments to evaluate the effectiveness of WikiCiKE, where we focus on the Chinese-English case; in other words, the target language is Chinese and the source language is English. |
Introduction | Chinese-English experiments for four typical attributes demonstrate that WikiCiKE outperforms both the monolingual extraction method and current translation-based method. |
Abstract | Experiments on Chinese-English translation show that the reordering approach can significantly improve a state-of-the-art hierarchical phrase-based translation system. |
Conclusion and Future Work | Experiments on Chinese-English translation show that the reordering approach can significantly improve a state-of-the-art hierarchical phrase-based translation system. |
Experiments | In this section, we test its effectiveness in Chinese-English translation. |
Abstract | Experimental results on the NIST MT-2005 Chinese-English translation task show that our method statistically significantly outperforms the baseline systems. |
Conclusions and Future Work | The experimental results on the NIST MT-2005 Chinese-English translation task demonstrate the effectiveness of the proposed model. |
Introduction | Experiment results on the NIST MT-2005 Chinese-English translation task show that our method significantly outperforms Moses (Koehn et al., 2007), a state-of-the-art phrase-based SMT system, and other linguistically syntax-based methods, such as SCFG-based and STSG-based methods (Zhang et al., 2007). |
Conclusions | Our experimental results on the IWSLT Chinese-English corpus have demonstrated consistent and significant improvement over the widely used word alignment matrix based extraction method. |
Experimental Results | We conduct experiments on the IWSLT 2006 (Paul, 2006) Chinese-English corpus. |
Experimental Results | The training corpus consists of 40K Chinese-English parallel sentences in travel domain with to- |