Abstract | Experimental results show that our method significantly improves machine translation performance on both the IWSLT and NIST data, compared with a state-of-the-art baseline. |
Conclusion and Future Work | We conduct experiments on the IWSLT and NIST data, and our method improves performance significantly.
Experiments and Results | We test our method in two data settings: one is the IWSLT data set, and the other is the NIST data set.
Experiments and Results | For the NIST data set, the bilingual training data we used is the NIST 2008 training set, excluding the Hong Kong Law and Hong Kong Hansard corpora.
Experiments and Results | The baseline results on NIST data are shown in Table 2. |
Introduction | We conduct experiments with the IWSLT and NIST data, and experimental results show that our method
Experiments | To demonstrate the effect of the ℓ0-norm on the IBM models, we performed experiments on four translation tasks: Arabic-English, Chinese-English, and Urdu-English from the NIST Open MT Evaluation, and Czech-English from the Workshop on Machine Translation (WMT) shared task.
Experiments | • Chinese-English: selected data from the constrained task of the NIST 2009 Open MT Evaluation.
Experiments | • Arabic-English: all available data for the constrained track of NIST 2009, excluding United Nations proceedings (LDC2004E13), ISI Automatically Extracted Parallel Text (LDC2007E08), and Ummah newswire text (LDC2004T18), for a total of 5.4+4.3 million words.
Experiments | (2) The NIST task is Chinese-to-English translation with OpenMT08 training data and MT06 as devset. |
Experiments | Corpus   Train  Devset  #Feat  Metrics
PubMed   0.2M   2k      14     BLEU, RIBES
NIST     7M     1.6k    8      BLEU, TER
Experiments | Our MT models are trained with the standard phrase-based Moses toolkit (Koehn et al., 2007), with IBM Model 4 alignments, a 4-gram SRILM language model, lexicalized reordering for PubMed, and distance-based reordering for the NIST system.
Introduction | Experiments on NIST Chinese-English and PubMed English-Japanese translation using BLEU, TER, and RIBES are presented in Section 4. |
Experiments | The second setting uses the non-UN and non-HK Hansards portions of the NIST training corpora with LTM only. |
Experiments | Corpus  Sent. pairs  En     Zh
FBIS    269K         10.3M  7.9M
NIST    1.6M         44.4M  40.4M
Experiments | 2010) as our decoder, and tuned the parameters of the system to optimize BLEU (Papineni et al., 2002) on the NIST MT06 tuning corpus using the Margin Infused Relaxed Algorithm (MIRA) (Crammer et al., 2006; Eidelman, 2012). |
Experiments | Table 1: Inter-judge Kappa for the NIST 2008 English-Chinese task
Experiments | 4.2 NIST 2008 English-Chinese MT Task |
Experiments | The NIST 2008 English-Chinese MT task consists of 127 documents with 1,830 segments, each with four reference translations and eleven automatic MT system translations. |
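Inter-judge agreement of the kind reported in Table 1 is conventionally measured with Cohen's Kappa, κ = (P(A) − P(E)) / (1 − P(E)), where P(A) is observed agreement and P(E) is chance agreement from the judges' marginal label distributions. A minimal sketch of the standard formula (the function name and toy data are illustrative, not from the paper):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa between two judges' label sequences."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement P(A): fraction of items the judges label identically
    p_a = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement P(E) from each judge's marginal label distribution
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (p_a - p_e) / (1 - p_e)

# Two judges rating 6 segments on a 1-5 adequacy scale (toy data)
judge1 = [5, 4, 4, 2, 1, 3]
judge2 = [5, 4, 3, 2, 1, 3]
print(round(cohens_kappa(judge1, judge2), 3))  # → 0.793
```

Kappa discounts the agreement the judges would reach by labeling at random, which is why it is preferred over raw agreement for tasks like this one.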
Introduction | The work compared various MT evaluation metrics (BLEU, NIST, METEOR, GTM, 1 − TER) with different segmentation schemes, and found that treating every single character as a token (character-level MT evaluation) gives the best correlation with human judgments.
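Character-level evaluation amounts to treating each character as its own token before the metric is computed, which removes word-segmentation differences between system output and references. A minimal sketch (the function name is illustrative):

```python
def char_tokenize(text):
    """Treat every non-space character as its own token (character-level evaluation)."""
    return [ch for ch in text if not ch.isspace()]

# Two segmentations of the same Chinese string become identical token sequences
hyp = "今天天气很好"          # unsegmented output
ref = "今天 天气 很 好"       # word-segmented reference
print(char_tokenize(hyp) == char_tokenize(ref))  # → True
```

Any n-gram metric (BLEU, NIST, GTM) can then be run unchanged on the character token sequences.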
Abstract | Our experiments on the NIST 2008 test data, with automatic evaluation as well as human judgments, suggest that the proposed method is able to enhance paraphrase quality by balancing semantic equivalency against surface dissimilarity.
Experiments and Results | We use the 2003 NIST Open Machine Translation Evaluation data (NIST 2003) as development data (919 sentences) for MERT and test performance on the NIST 2008 data set (1,357 sentences).
Experiments and Results | The NIST Chinese-to-English evaluation data provides four English reference translations for every Chinese sentence.
Experiments and Results | Table 1: iBLEU score results (NIST 2008)
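iBLEU (Sun and Zhou, 2012) scores a paraphrase candidate by rewarding BLEU against the references while penalizing BLEU against the source sentence: iBLEU = α · BLEU(c, refs) − (1 − α) · BLEU(c, src). A higher α favors semantic equivalency; a lower α favors surface dissimilarity. A minimal sketch (the α value here is illustrative, not necessarily the setting used in these experiments):

```python
def ibleu(bleu_vs_refs, bleu_vs_source, alpha=0.9):
    """iBLEU: reward similarity to references, penalize copying the source.
    alpha=0.9 is an illustrative weight, not necessarily the paper's setting."""
    return alpha * bleu_vs_refs - (1 - alpha) * bleu_vs_source

# A candidate scoring 0.40 BLEU against references but 0.80 against the source
# is penalized for being too close to a trivial copy of the input.
print(round(ibleu(0.40, 0.80), 3))  # → 0.28
```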
Introduction | We test our method on NIST 2008 testing data. |
Experiments | The dev set comprised mainly data from the NIST 2005 test set, and also some balanced-genre web text from NIST.
Experiments | Evaluation was performed on NIST 2006 and 2008. |
Experiments | Table 10: Ordering scores (p, I and v) for the NIST test sets
Introduction | • BLEU (Papineni et al., 2002), NIST (Doddington, 2002), WER, PER, TER (Snover et al., 2006), and LRscore (Birch and Osborne, 2011) do not use external linguistic
Abstract | Experiments on Chinese-English translation on four NIST MT test sets show that the HD-HPB model significantly outperforms Chiang's model with average gains of 1.91 points absolute in BLEU.
Experiments | We train our model on a dataset with ~1.5M sentence pairs from the LDC dataset. We use the 2002 NIST MT evaluation test data (878 sentence pairs) as the development data, and the 2003, 2004, 2005, and 2006-news NIST MT evaluation test data (919, 1788, 1082, and 616 sentence pairs, respectively) as the test data.
Experiments | For evaluation, the NIST BLEU script (version 12) with the default settings is used to calculate the BLEU scores. |
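The score the NIST script reports is corpus-level BLEU: a brevity penalty times the geometric mean of clipped n-gram precisions up to 4-grams, with counts pooled over the whole test set. A minimal single-reference sketch of that computation (not the official mteval implementation, which also handles multiple references and its own tokenization):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU with brevity penalty (one reference per segment)."""
    match = [0] * max_n   # clipped n-gram matches, pooled over the corpus
    total = [0] * max_n   # hypothesis n-gram counts, pooled over the corpus
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            h, r = ngrams(hyp, n), ngrams(ref, n)
            match[n - 1] += sum(min(c, r[g]) for g, c in h.items())  # clipping
            total[n - 1] += max(len(hyp) - n + 1, 0)
    if 0 in match:
        return 0.0  # geometric mean is zero if any precision is zero
    log_prec = sum(math.log(m / t) for m, t in zip(match, total)) / max_n
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_prec)

hyp = "the cat sat on the mat".split()
ref = "the cat sat on the mat".split()
print(corpus_bleu([hyp], [ref]))  # identical hypothesis and reference → 1.0
```

Pooling counts over the corpus before taking the geometric mean is what distinguishes corpus BLEU from averaging sentence-level scores.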
Introduction | Experiments on Chinese-English translation using four NIST MT test sets show that our HD-HPB model significantly outperforms Chiang's HPB as well as a SAMT-style refined version of HPB.
Abstract | We show that our model significantly improves the translation performance over the baseline on NIST Chinese-to-English translation experiments. |
Experiments | We present our experiments on the NIST Chinese-English translation tasks. |
Experiments | We used the NIST evaluation set of 2005 (MT05) as our development set, and sets of MT06/MT08 as test sets. |
Experiments | Case-insensitive NIST BLEU (Papineni et al., 2002) was used to measure
Experiments | Role     Test set   #Sent  #Refs
develop  NIST 2002    878     10
         NIST 2005  1,082      4
         NIST 2004  1,788      5
test     NIST 2006  1,664      4
         NIST 2008  1,357      4
Experiments | The system was tested using the Chinese-English MT evaluation sets of NIST 2004, NIST 2006 and NIST 2008. |
Experiments | For development, we used the Chinese-English MT evaluation sets of NIST 2002 and NIST 2005. |
Experiments | For Chinese-to-English translation, we use the parallel data from NIST Open Machine Translation Evaluation tasks. |
Experiments | The NIST 2003 and 2005 test data are taken as the development and test sets, respectively.
Introduction | (2008) and achieved state-of-the-art results as reported in the NIST 2008 Open MT Evaluation workshop and the NTCIR-9 Chinese-to-English patent translation task (Goto et al., 2011; Ma and Matsoukas, 2011). |