Experimental Evaluation | For the OPUS and VERBMOBIL corpora, we evaluate the results using BLEU (Papineni et al., 2002) and TER (Snover et al., 2006) against reference translations.
Experimental Evaluation | For BLEU, higher values are better; for TER, lower values are better.
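As a concrete illustration of this setup, below is a minimal scoring sketch using the sacrebleu library; the library choice and the toy sentences are assumptions for illustration, not the paper's own tooling.

```python
# Minimal sketch: corpus-level BLEU and TER with sacrebleu
# (assumed tooling, not the paper's own implementation).
from sacrebleu.metrics import BLEU, TER

hypotheses = ["the cat sat on the mat"]           # system outputs (toy data)
references = [["the cat is sitting on the mat"]]  # one list per reference stream

print(BLEU().corpus_score(hypotheses, references))  # higher is better
print(TER().corpus_score(hypotheses, references))   # lower is better (edit rate)
```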
Experimental Evaluation | Figure 3 and Figure 4 show the evolution of the BLEU and TER scores when applying our method with a 2-gram and a 3-gram LM.
Abstract | Conditioning lexical probabilities on the topic biases translations toward topic-relevant output, resulting in significant improvements of up to 1 BLEU and 3 TER on Chinese-to-English translation over a strong baseline.
Experiments | On FBIS, we can see that both models achieve moderate but consistent gains over the baseline on both BLEU and TER.
Experiments | The best model, LTM-10, achieves gains of about 0.5 and 0.6 BLEU and 2 TER.
Experiments | Although the BLEU performance of both 20-topic models, LTM-20 and GTM-20, is suboptimal, their TER improvements are larger.
Introduction | Incorporating these features into our hierarchical phrase-based translation system significantly improved translation performance, by up to 1 BLEU and 3 TER over a strong Chinese-to-English baseline.
Experiments | We employed BLEU4, METEOR (v1.0), TER (v0.7.25), and the new metric PORT.
Experiments | In the table, TER scores are presented as 1 − TER so that, for all metrics, higher scores mean higher quality.
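For example, a TER of 0.45 would appear in the table as 1 − TER = 0.55. A minimal sketch of this normalization (the numbers are hypothetical):

```python
# Map TER onto a higher-is-better scale, matching the table's convention.
def one_minus_ter(ter: float) -> float:
    return 1.0 - ter

print(one_minus_ter(0.45))  # 0.55: a lower edit rate yields a higher score
```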
Introduction | BLEU (Papineni et al., 2002), NIST (Doddington, 2002), WER, PER, TER (Snover et al., 2006), and LRscore (Birch and Osborne, 2011) do not use external linguistic information; they are fast to compute (except TER).
Introduction | (2010) showed that BLEU tuning is more robust than tuning with other metrics (METEOR, TER, etc.).
Abstract | Different evaluation metrics (e.g., BLEU, TER) focus on different aspects of translation quality; our multi-objective approach leverages these diverse aspects to improve overall quality.
Introduction | TER (Snover et al., 2006) allows arbitrary chunk movements, while permutation metrics like RIBES (Isozaki et al., 2010; Birch et al., 2010) measure deviation in word order. |
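To make the word-order notion concrete, the sketch below computes Kendall's tau distance over the reference positions of the hypothesis words. RIBES itself combines a rank-correlation term with unigram precision and a brevity penalty, so this shows only the order-sensitive core, and the one-to-one word correspondence is an assumption for illustration.

```python
from itertools import combinations

def kendall_tau_distance(ranks):
    """Fraction of discordant pairs: 0.0 = identical order, 1.0 = fully reversed.
    ranks[i] is the reference position of the i-th hypothesis word, assuming
    a one-to-one word correspondence (illustration only)."""
    pairs = list(combinations(range(len(ranks)), 2))
    discordant = sum(1 for i, j in pairs if ranks[i] > ranks[j])
    return discordant / len(pairs) if pairs else 0.0

# Reference "A B C D" rendered as "A C B D" -> positions [0, 2, 1, 3]
print(kendall_tau_distance([0, 2, 1, 3]))  # 0.1667: 1 of 6 pairs is discordant
```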
Introduction | Experiments on NIST Chinese-English and PubMed English-Japanese translation using BLEU, TER, and RIBES are presented in Section 4.
Multi-objective Algorithms | Not all points encountered are necessarily Pareto-optimal.
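Since not every point encountered during search is Pareto-optimal, candidates must be filtered by dominance. Below is a minimal sketch of such a filter, assuming every objective is oriented so that higher is better (e.g., BLEU and 1 − TER); the function names and score vectors are illustrative, not from the paper.

```python
def dominates(a, b):
    """True if score vector a Pareto-dominates b (all objectives
    higher-is-better): a >= b everywhere and a > b somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_frontier(points):
    """Keep only the points not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]

# Toy score vectors: (BLEU, 1 - TER)
points = [(0.30, 0.55), (0.28, 0.60), (0.29, 0.54)]
print(pareto_frontier(points))  # drops (0.29, 0.54), dominated by (0.30, 0.55)
```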