Models 2.1 Baseline Models | Our second baseline is a factored translation model (Koehn and Hoang, 2007) (called Factored), which used as factors the word, “stem” and suffix. |
Models 2.1 Baseline Models | performance of unsupervised segmentation for translation, our third baseline is a segmented translation model based on a supervised segmentation model (called Sup), using the hand-built Omorfi morphological analyzer (Pirinen and Listenmaa, 2007), which provided slightly higher BLEU scores than the word-based baseline. |
Models 2.1 Baseline Models | For segmented translation models, it cannot be taken for granted that greater linguistic accuracy in segmentation yields improved translation (Chang et al., 2008). |
Translation and Morphology | Our main contributions are: 1) the introduction of the notion of segmented translation where we explicitly allow phrase pairs that can end with a dangling morpheme, which can connect with other morphemes as part of the translation process, and 2) the use of a fully segmented translation model in combination with a postprocessing morpheme prediction system, using unsupervised morphology induction. |
Translation and Morphology | Morphology can express both content and function categories, and our experiments show that it is important to use morphology both within the translation model (for morphology with content) and outside it (for morphology contributing to fluency). |
Abstract | We frame the MT problem as a decipherment task, treating the foreign text as a cipher for English and present novel methods for training translation models from nonparallel text. |
Introduction | From these corpora, we estimate translation model parameters: word-to-word translation tables, fertilities, distortion parameters, phrase tables, syntactic transformations, etc. |
Introduction | In this paper, we address the problem of learning a full translation model from nonparallel data, and we use the |
Introduction | How can we learn a translation model from nonparallel data? |
Machine Translation as a Decipherment Task | Probabilistic decipherment: Unlike parallel training, here we have to estimate the translation model Pθ(f|e) parameters using only monolingual data. |
Machine Translation as a Decipherment Task | We then estimate parameters of the translation model Pθ(f|e) during training. |
Machine Translation as a Decipherment Task | EM Decipherment: We propose a new translation model for MT decipherment which can be efficiently trained using the EM algorithm. |
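The EM decipherment idea can be illustrated on a toy 1:1 word-substitution cipher with a fixed unigram plaintext model. This is a minimal sketch under strong simplifying assumptions (all data, vocabularies, and the unigram channel here are illustrative, not the paper's actual model, which is richer and trained at scale):

```python
import random
from collections import Counter

# Toy setup: each cipher token f is assumed generated from a hidden
# plaintext token e via P(f) = sum_e P(e) * P(f|e), with P(e) fixed
# from a monolingual corpus. EM estimates the channel P(f|e).
plain_corpus = "the cat saw the dog the dog saw the cat".split()
cipher_text = "x y z x w x w z x y".split()  # hypothetical cipher tokens

e_vocab = sorted(set(plain_corpus))
f_vocab = sorted(set(cipher_text))

# Fixed plaintext unigram probabilities P(e) from the monolingual corpus.
e_counts = Counter(plain_corpus)
p_e = {e: e_counts[e] / len(plain_corpus) for e in e_vocab}

# Random (normalised) initialisation of the channel P(f|e).
random.seed(0)
p_f_given_e = {e: {f: random.random() for f in f_vocab} for e in e_vocab}
for e in e_vocab:
    norm = sum(p_f_given_e[e].values())
    for f in f_vocab:
        p_f_given_e[e][f] /= norm

f_counts = Counter(cipher_text)
for _ in range(50):
    # E-step: expected counts via the posterior P(e|f) for each cipher type.
    expected = {e: Counter() for e in e_vocab}
    for f, n in f_counts.items():
        z = sum(p_e[e] * p_f_given_e[e][f] for e in e_vocab)
        for e in e_vocab:
            expected[e][f] += n * p_e[e] * p_f_given_e[e][f] / z
    # M-step: renormalise expected counts into new channel probabilities.
    for e in e_vocab:
        total = sum(expected[e].values())
        for f in f_vocab:
            p_f_given_e[e][f] = expected[e][f] / total

# Decipher each token by its most probable plaintext source.
decipher = {f: max(e_vocab, key=lambda e: p_e[e] * p_f_given_e[e][f])
            for f in f_vocab}
```

Note that a unigram channel can only match token frequencies, so the recovered mapping is not fully identifiable; real decipherment models use stronger (e.g. n-gram) plaintext language models to break such ties.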
Introduction | Early work in statistical machine translation viewed translation as a noisy channel process comprised of a translation model, which functioned to posit adequate translations of source language words, and a target language model, which guided the fluency of generated target language strings (Brown et al., |
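The noisy-channel decomposition referred to here is the standard one, obtained by Bayes' rule and dropping the constant denominator:

```latex
\hat{e} \;=\; \arg\max_{e} P(e \mid f)
        \;=\; \arg\max_{e} \frac{P(e)\,P(f \mid e)}{P(f)}
        \;=\; \arg\max_{e} P(e)\,P(f \mid e)
```

where $P(f \mid e)$ is the translation model and $P(e)$ the target language model.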
Related Work | The translation models in these techniques define phrases as contiguous word sequences (with gaps allowed in the case of hierarchical phrases) which may or may not correspond to any linguistic constituent. |
Related Work | Early work in statistical phrase-based translation considered whether restricting translation models to use only syntactically well-formed constituents might improve translation quality (Koehn et al., 2003) but found such restrictions failed to improve translation quality. |
Related Work | Significant research has examined the extent to which syntax can be usefully incorporated into statistical tree-based translation models: string-to-tree (Yamada and Knight, 2001; Gildea, 2003; Imamura et al., 2004; Galley et al., 2004; Graehl and Knight, 2004; Melamed, 2004; Galley et al., 2006; Huang et al., 2006; Shen et al., 2008), tree-to-string (Liu et al., 2006; Liu et al., 2007; Mi et al., 2008; Mi and Huang, 2008; Huang and Mi, 2010), tree-to-tree (Abeille et al., 1990; Shieber and Schabes, 1990; Poutsma, 1998; Eisner, 2003; Shieber, 2004; Cowan et al., 2006; Nesson et al., 2006; Zhang et al., 2007; DeNeefe et al., 2007; DeNeefe and Knight, 2009; Liu et al., 2009; Chiang, 2010), and treelet (Ding and Palmer, 2005; Quirk et al., 2005) techniques use syntactic information to inform the translation model. |
Introduction | A (formal) translation model is at the core of every machine translation system. |
Introduction | (1990) discuss automatically trainable translation models in their seminal paper. |
Introduction | In contrast, in the field of syntax-based machine translation, the translation models have full access to the syntax of the sentences and can base their decisions on it. |
Preservation of regularity | Preservation of regularity is an important property for a number of translation model manipulations. |
Experiments | The induced joint translation model can be used to recover arg max_e p(e|f), as it is equal to arg max_e p(e, f). We employ the induced probabilistic HR-SCFG G as the backbone of a log-linear, feature-based translation model, with the derivation probability p(D) under the grammar estimate being |
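The equivalence invoked here holds because the marginal $p(f)$ is constant with respect to $e$:

```latex
\arg\max_{e}\, p(e \mid f)
  \;=\; \arg\max_{e}\, \frac{p(e, f)}{p(f)}
  \;=\; \arg\max_{e}\, p(e, f)
```

so decoding with the joint model and decoding with the conditional model select the same translation.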
Introduction | Nevertheless, the successful employment of SCFGs for phrase-based SMT brought translation models assuming latent syntactic structure into the spotlight. |
Introduction | Section 2 discusses the weak independence assumptions of SCFGs and introduces a joint translation model which addresses these issues and separates hierarchical translation structure from phrase-pair emission. |
Joint Translation Model | The rest of the (sometimes thousands of) rule-specific features usually added to SCFG translation models do not directly help either, leaving reordering decisions disconnected from the rest of the derivation. |
Related Work | Most of the aforementioned work concentrates on learning hierarchical, linguistically motivated translation models. |
Experiments | Accordingly, we use Chiang’s hierarchical phrase-based translation model (Chiang, 2007) as a baseline, and the syntax-augmented MT model (Zollmann and Venugopal, 2006) as a ‘target line’, a model that would not be applicable for language pairs without linguistic resources. |
Hard rule labeling from word classes | Our approach instead uses distinct grammar rules and labels to discriminate phrase size, with the advantage of enabling all translation models to estimate distinct weights for distinct size classes and avoiding the need for additional models in the log-linear framework; however, the increase in the number of labels and thus grammar rules decreases the reliability of estimated models for rare events due to increased data sparseness. |
Introduction | Labels on these nonterminal symbols are often used to enforce syntactic constraints in the generation of bilingual sentences and imply conditional independence assumptions in the translation model . |
Related work | (2009) present a nonparametric PSCFG translation model that directly induces a grammar from parallel sentences without the use of or constraints from a word-alignment model, and |
Conclusion | We have described a Lagrangian relaxation algorithm for exact decoding of syntactic translation models, and shown that it is significantly more efficient than other exact algorithms for decoding tree- |
Conclusion | Our experiments have focused on tree-to-string models, but the method should also apply to Hiero-style syntactic translation models (Chiang, 2007). |
Experiments | We use an identical model, and identical development and test data, to that used by Huang and Mi. The translation model is trained on 1.5M sentence pairs of Chinese-English data; a trigram language model is used. |