Abstract | The model leverages the strengths of both phrase-based and linguistically syntax-based methods.
Introduction | The phrase-based modeling method (Koehn et al., 2003; Och and Ney, 2004a) is a simple but powerful approach to machine translation, since it models local reorderings and translations of multi-word expressions well.
Introduction | It is designed to combine the strengths of phrase-based and syntax-based methods. |
Introduction | Experiment results on the NIST MT-2005 Chinese-English translation task show that our method significantly outperforms Moses (Koehn et al., 2007), a state-of-the-art phrase-based SMT system, and other linguistically syntax-based methods, such as SCFG-based and STSG-based methods (Zhang et al., 2007). |
Related Work | However, most of them fail to utilize the non-syntactic phrases that have proven useful in phrase-based methods (Koehn et al., 2003).
Related Work | Chiang's (2005) hierarchical phrase-based model achieves a significant performance improvement.
Related Work | In the last two years, many research efforts have been devoted to integrating the strengths of phrase-based and syntax-based methods.
Tree Sequence Alignment Model | We use seven basic features that are analogous to the commonly used features in phrase-based systems (Koehn, 2004): 1) bidirectional rule mapping probabilities; 2) bidirectional lexical rule translation probabilities; 3) the target language model; 4) the number of rules used; and 5) the number of target words.
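These features are typically combined in a log-linear model: the log-probability features are weighted and summed, and the count features (rules used, target words) enter as weighted penalties. The sketch below is a minimal, generic illustration of that combination; the feature names and weight values are invented for this example and are not taken from the paper (real systems tune the weights, e.g. with MERT).

```python
import math

# Hedged sketch of a log-linear scoring function over features like those
# listed above. Names and weights are illustrative assumptions only.
def loglinear_score(log_probs, counts, w_prob, w_count):
    """Weighted sum of log-probability features plus weighted count features."""
    score = sum(w_prob[k] * lp for k, lp in log_probs.items())
    score += sum(w_count[k] * c for k, c in counts.items())
    return score

log_probs = {
    "rule_fwd": math.log(0.5),  # forward rule mapping probability
    "rule_bwd": math.log(0.4),  # backward rule mapping probability
    "lm": math.log(0.1),        # target language model probability
}
counts = {"rules": 3, "tgt_words": 7}
w_prob = {"rule_fwd": 1.0, "rule_bwd": 1.0, "lm": 1.0}
w_count = {"rules": -0.1, "tgt_words": 0.05}  # count penalties/bonuses
score = loglinear_score(log_probs, counts, w_prob, w_count)
```

Because the weights multiply log-probabilities, raising a weight makes the decoder trust that feature more relative to the others.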
Abstract | Phrase-based decoding produces state-of-the-art translations with no regard for syntax. |
Abstract | The resulting cohesive, phrase-based decoder is shown to produce translations that are preferred over noncohesive output in both automatic and human evaluations. |
Cohesive Decoding | This section describes a modification to standard phrase-based decoding, so that the system is constrained to produce only cohesive output. |
Conclusion | We have presented a definition of syntactic cohesion that is applicable to phrase-based SMT. |
Conclusion | This suggests that cohesion could be a strong feature in estimating the confidence of phrase-based translations. |
Experiments | We have adapted the notion of syntactic cohesion so that it is applicable to phrase-based decoding. |
Experiments | The lexical reordering model provides a good comparison point as a non-syntactic, and potentially orthogonal, improvement to phrase-based movement modeling. |
Experiments | This indicates that the cohesive subset is easier to translate with a phrase-based system. |
Introduction | We attempt to use this strong, but imperfect, characterization of movement to assist a non-syntactic translation method: phrase-based SMT. |
Introduction | Phrase-based decoding (Koehn et al., 2003) is a dominant formalism in statistical machine translation. |
Experiments | The Czech noun cases which appear only in prepositional phrases were ignored, since they are covered by the phrase-based model. |
Introduction | Our method is based on factored phrase-based statistical machine translation models. |
Introduction | 1.1 Morphology in Phrase-based SMT |
Introduction | Meanwhile, in phrase-based SMT models, words are mapped in chunks.
Bootstrapping Phrasal ITG from Word-based ITG | This section introduces a technique that bootstraps candidate phrase pairs for phrase-based ITG from word-based ITG Viterbi alignments. |
Experiments | Finally we ran 10 iterations of phrase-based ITG over the residual charts, using EM or VB, and extracted the Viterbi alignments. |
Summary of the Pipeline | From this alignment, phrase pairs are extracted in the usual manner, and a phrase-based translation system is trained. |
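The "usual manner" of phrase-pair extraction keeps every source/target span pair that is consistent with the word alignment: no alignment link may connect a word inside the pair to a word outside it. The sketch below is a simplified version of that standard heuristic (it does not extend pairs over unaligned boundary words, as full implementations do); the function name and interface are assumptions for illustration.

```python
def extract_phrases(src_len, alignment, max_len=4):
    """Extract phrase pairs consistent with a word alignment.

    alignment: list of (src_index, tgt_index) links.
    Simplified sketch: omits extension over unaligned boundary words.
    """
    phrases = []
    for i1 in range(src_len):
        for i2 in range(i1, min(src_len, i1 + max_len)):
            # target positions linked to the source span [i1, i2]
            tgt = [t for s, t in alignment if i1 <= s <= i2]
            if not tgt:
                continue
            j1, j2 = min(tgt), max(tgt)
            if j2 - j1 + 1 > max_len:
                continue
            # consistency: no link may enter [j1, j2] from outside [i1, i2]
            if any(j1 <= t <= j2 and not (i1 <= s <= i2)
                   for s, t in alignment):
                continue
            phrases.append(((i1, i2), (j1, j2)))
    return phrases

# A monotone two-word alignment yields both single-word pairs plus the
# combined two-word pair.
pairs = extract_phrases(2, [(0, 0), (1, 1)])
```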
Variational Bayes for ITG | The drawback of maximum likelihood is obvious for phrase-based models. |
Abstract | We propose and extensively evaluate a simple method for using alignment models to produce alignments better suited for phrase-based MT systems, and show significant gains (as measured by BLEU score) in end-to-end translation systems for six language pairs used in recent MT competitions.
Phrase-based machine translation | The baseline system uses GIZA model 4 alignments and the open-source Moses phrase-based machine translation toolkit, and performed close to the best at the competition last year.
Word alignment results | Unfortunately, as was shown by Fraser and Marcu (2007), AER can have weak correlation with translation performance as measured by BLEU score (Papineni et al., 2002) when the alignments are used to train a phrase-based translation system.
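For reference, AER compares a hypothesis alignment A against gold "sure" links S and "possible" links P (with S a subset of P): AER = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|). A minimal sketch of that formula, with an invented function name:

```python
def aer(sure, possible, hypothesis):
    """Alignment Error Rate (Och & Ney); lower is better.

    sure links must be a subset of possible links, enforced below.
    """
    a, s = set(hypothesis), set(sure)
    p = set(possible) | s  # by convention, S is a subset of P
    return 1.0 - (len(a & s) + len(a & p)) / (len(a) + len(s))

# One sure link recovered, one possible link used instead of the other sure
# link: AER = 1 - (1 + 2) / (2 + 2) = 0.25.
err = aer(sure=[(0, 0), (1, 1)],
          possible=[(0, 0), (1, 1), (1, 2)],
          hypothesis=[(0, 0), (1, 2)])
```

Because possible links are "free" matches, a hypothesis can score a low AER while still omitting links that matter for phrase extraction, which is one reason AER correlates weakly with BLEU.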
Conclusions | We integrate our method into a state-of-the-art phrase-based baseline translation system, i.e., Moses (Koehn et al., 2007), and show that the integrated system consistently improves the performance of the baseline system on various NIST machine translation test sets. |
Experimental Results | Using the toolkit Moses (Koehn et al., 2007), we built a phrase-based baseline system by following |
Experimental Results | This is analogous to the concept of “phrase” in phrase-based MT. |
Integration of inflection models with MT systems | For the phrase-based system, we generated the needed annotations by first parsing the source sentence e, aligning the source and candidate translations with the word-alignment model used in training, and projecting the dependency tree to the target side using the algorithm of Quirk et al. (2005).
Machine translation systems and data | We integrated the inflection prediction model with two types of machine translation systems: systems that make use of syntax and surface phrase-based systems. |
Related work | In recent work, Koehn and Hoang (2007) proposed a general framework, implemented in the Moses system, for including morphological features in phrase-based SMT: the representation of words is factored into a vector of morphological features, and the phrase-based MT system can work on any of the factored representations.