Abstract | In this paper we show how to train statistical machine translation systems on real-life tasks using only nonparallel monolingual data from two languages. |
Conclusion | This work serves as a big step towards large-scale unsupervised training for statistical machine translation systems . |
Experimental Evaluation | Och (2002) reports results of 48.2 BLEU for a single-word based translation system and 56.1 BLEU using the alignment template approach, both trained on parallel data. |
Related Work | Unsupervised training of statistical translations systems without parallel data and related problems have been addressed before. |
Experiments | (2004) and Cherry and Quirk (2008) both use the l-best output of a machine translation system . |
Experiments | Cherry and Quirk (2008) report an accuracy of 71.9% on a similar experiment with German a source language, though the translation system and training data were different so the numbers are not comparable. |
Experiments | (2004) and Cherry and Quirk (2008) in evaluating our language models on their ability to distinguish the l-best output of a machine translation system from a reference translation in a pairwise fashion. |
Introduction | N -gram language models are a central component of all speech recognition and machine translation systems , and a great deal of research centers around refining models (Chen and Goodman, 1998), efficient storage (Pauls and Klein, 2011; Heafield, 2011), and integration into decoders (Koehn, 2004; Chiang, 2005). |
Abstract | We propose a novel approach, ensemble decoding, which combines a number of translation systems dynamically at the decoding step. |
Ensemble Decoding | The current implementation is able to combine hierarchical phrase-based systems (Chiang, 2005) as well as phrase-based translation systems (Koehn et al., 2003). |
Ensemble Decoding | However, the method can be easily extended to support combining a number of heterogeneous translation systems e.g. |
Introduction | We have modified Kriya (Sankaran et al., 2012), an in-house implementation of hierarchical phrase-based translation system (Chiang, 2005), to implement ensemble decoding using multiple translation models. |
Abstract | Two decades after their invention, the IBM word-based translation models, widely available in the GIZA++ toolkit, remain the dominant approach to word alignment and an integral part of many statistical translation systems . |
Conclusion | We hope that our method, due to its simplicity, generality, and effectiveness, will find wide application for training better statistical translation systems . |
Experiments | We then tested the effect of word alignments on translation quality using the hierarchical phrase-based translation system Hiero (Chiang, 2007). |