A Class-based Model of Agreement | Segmentation is typically applied as a bitext preprocessing step, and there is a rich literature on the effect of different segmentation schemata on translation quality (Koehn and Knight, 2003; Habash and Sadat, 2006; El Kholy and Habash, 2012). |
Conclusion and Outlook | Our class-based agreement model improves translation quality by promoting local agreement, but with a minimal increase in decoding time and no additional storage requirements for the phrase table. |
Discussion of Translation Results | 2 shows translation quality results on newswire, while Tbl. |
Experiments | We first evaluate the Arabic segmenter and tagger components independently, then provide English-Arabic translation quality results. |
Experiments | 5.2 Translation Quality |
Experiments | We evaluated translation quality with BLEU-4 (Papineni et al., 2002) and computed statistical significance with the approximate randomization method of Riezler and Maxwell (2005).
Introduction | However, using lexical coverage experiments, we show that there is ample room for translation quality improvements through better selection of forms that already exist in the translation model. |
Abstract | We explain how to implement this extension efficiently for large-scale data (also released as a modification to GIZA++) and demonstrate, in experiments on Czech, Arabic, Chinese, and Urdu to English translation, significant improvements over IBM Model 4 in both word alignment (up to +6.7 F1) and translation quality (up to +1.4 BLEU).
Conclusion | The method is implemented as a modification to the open-source toolkit GIZA++, and we have shown that it significantly improves translation quality across four different language pairs. |
Experiments | As we will see below, we still obtained strong improvements in translation quality when hand-aligned data was unavailable. |
Experiments | We then tested the effect of word alignments on translation quality using the hierarchical phrase-based translation system Hiero (Chiang, 2007). |
Experiments | We ran some contrastive experiments to investigate the impact of hyperparameter tuning on translation quality.
Introduction | In this paper, we propose a simple extension to the IBM/HMM models that is unsupervised like the IBM models, is as scalable as GIZA++ because it is implemented on top of GIZA++, and provides significant improvements in both alignment and translation quality.
Introduction | Experiments on Czech-, Arabic-, Chinese- and Urdu-English translation (Section 3) demonstrate consistent significant improvements over IBM Model 4 in both word alignment (up to +6.7 F1) and translation quality (up to +1.4 BLEU).
Abstract | Our results show that language model based pre-sorting yields a small improvement in translation quality and a speedup by a factor of 2. |
Abstract | We compare our approach with Moses and observe the same performance, but a substantially better tradeoff between translation quality and speed. |
Conclusions | We compare our decoder to Moses, reaching a similar highest BLEU score, but clearly outperforming it in terms of scalability with respect to the tradeoff ratio between translation quality and speed. |
Experimental Evaluation | It yields nearly the same top performance with an even better tradeoff between translation quality and speed. |
Introduction | phrase translation candidates has a positive effect on both translation quality and speed. |
Search Algorithm Extensions | A better pre-selection can be expected to improve translation quality.
Abstract | BLEU, TER) focus on different aspects of translation quality; our multi-objective approach leverages these diverse aspects to improve overall quality.
Introduction | These methods are effective because they tune the system to maximize an automatic evaluation metric such as BLEU, which serves as a surrogate objective for translation quality.
Introduction | Ideally, we want to tune towards an automatic metric that has perfect correlation with human judgments of translation quality.
Introduction | Different evaluation metrics focus on different aspects of translation quality.
Abstract | These rules are employed to enrich the SMT inputs for translation quality improvement. |
Conclusion | The manual investigation of oral translation results indicates that the paraphrase rules capture four kinds of MT-favored transformations that ensure translation quality improvement.
Discussion | investigate what kinds of transformations finally lead to the translation quality improvement.
Forward-Translation vs. Back-Translation | Finally the translation quality of Back-Translation is evaluated by using the original source texts as references. |
Introduction | The translation quality of the SMT system is highly related to the coverage of translation models. |
Experiments | Is our topic similarity model able to improve translation quality in terms of BLEU? |
Experiments | This verifies that the topic similarity model can significantly improve translation quality.
Introduction | To exploit topic information for statistical machine translation (SMT), researchers have proposed various topic-specific lexicon translation models (Zhao and Xing, 2006; Zhao and Xing, 2007; Tam et al., 2007) to improve translation quality.
Introduction | We further show that both the source-side and target-side topic distributions improve translation quality and their improvements are complementary to each other. |
Experiments | Table 4 shows translation quality for BLEU- and PORT-tuned systems, as assessed by automatic metrics. |
Introduction | Many of the metrics correlate better with human judgments of translation quality than BLEU, as shown in recent WMT Evaluation Task reports (Callison-Burch et |
Introduction | Results given below show that PORT correlates better with human judgments of translation quality than BLEU does, and sometimes outperforms METEOR in this respect, based on data from WMT (2008–2010).
Abstract | Therefore, it is desirable to train all these parameters to directly maximize an objective that directly links to translation quality.
Abstract | The training objective is an expected BLEU score, which is closely linked to translation quality . |
Abstract | The expected BLEU score is closely linked to translation quality and the regularization is essential when many parameters are trained at scale. |
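
The expected BLEU objective named in the last two excerpts can be illustrated with a minimal sketch. It assumes an n-best list in which each hypothesis has a model score (treated here as a log-linear score) and a precomputed sentence-level BLEU value; the expectation is the BLEU average weighted by the softmax-normalized scores. The function name `expected_bleu` and the example numbers are hypothetical and are not taken from the excerpted paper, which also adds a regularization term that is not shown here.

```python
import math


def expected_bleu(model_scores, bleu_scores):
    """Return the posterior-weighted average BLEU over an n-best list.

    model_scores: unnormalized log scores, one per hypothesis (assumed).
    bleu_scores:  sentence-level BLEU of each hypothesis against the reference.
    """
    if not model_scores or len(model_scores) != len(bleu_scores):
        raise ValueError("need two non-empty lists of equal length")

    # Subtract the maximum score before exponentiating for numerical stability.
    top = max(model_scores)
    weights = [math.exp(s - top) for s in model_scores]
    normalizer = sum(weights)

    # Sum of P(hypothesis) * BLEU(hypothesis).
    return sum(w / normalizer * b for w, b in zip(weights, bleu_scores))


if __name__ == "__main__":
    # Hypothetical 3-best list: scores and per-hypothesis BLEU values.
    scores = [-1.0, -1.5, -3.0]
    bleus = [0.42, 0.35, 0.10]
    print(round(expected_bleu(scores, bleus), 4))
```

Because the expectation is a smooth function of the model scores, it can be optimized with gradient-based methods, unlike the BLEU of the single best hypothesis.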