Conclusion & Future Work | We will also add the capability of supporting syntax-based ensemble decoding and experiment with how a phrase-based system can benefit from the syntax information present in a syntax-aware MT system. |
Ensemble Decoding | The current implementation is able to combine hierarchical phrase-based systems (Chiang, 2005) as well as phrase-based translation systems (Koehn et al., 2003). |
Ensemble Decoding | phrase-based, hierarchical phrase-based, and/or syntax-based systems.
Experiments & Results 4.1 Experimental Setup | For the mixture baselines, we used a standard one-pass phrase-based system (Koehn et al., 2003), Portage (Sadat et al., 2005), with the following 7 features: relative-frequency and lexical translation model (TM) probabilities in both directions; word-displacement distortion model; language model (LM) and word count. |
Experiments & Results 4.1 Experimental Setup | For ensemble decoding, we modified an in-house implementation of a hierarchical phrase-based system, Kriya (Sankaran et al., 2012), which uses the same features mentioned in (Chiang, 2005): forward and backward relative-frequency and lexical TM probabilities; LM; and word, phrase, and glue-rule penalties.
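Both systems score translation hypotheses with a log-linear combination of such features. As a minimal sketch of that scoring step (the feature names, values, and weights below are illustrative assumptions, not the actual Portage or Kriya configurations):

```python
import math

def loglinear_score(features, weights):
    """Score a hypothesis as a weighted sum of log-domain feature values,
    i.e. the standard log-linear model used in phrase-based SMT."""
    return sum(weights[name] * value for name, value in features.items())

# Hypothetical feature values (log-probabilities and penalties), for illustration only.
features = {
    "tm_fwd": math.log(0.4),    # forward relative-frequency TM probability
    "tm_bwd": math.log(0.3),    # backward TM probability
    "lm": math.log(0.05),       # language model probability
    "word_penalty": -3.0,       # one unit per target word
}
weights = {"tm_fwd": 0.2, "tm_bwd": 0.2, "lm": 0.5, "word_penalty": 0.1}

print(loglinear_score(features, weights))
```

In practice the weights are tuned on a development set (e.g. with MERT) rather than set by hand as above.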
Experiments & Results 4.1 Experimental Setup | The first group contains the baseline results of the phrase-based system discussed in Section 2, and the second group contains those of our hierarchical MT system.
Introduction | We have modified Kriya (Sankaran et al., 2012), an in-house implementation of a hierarchical phrase-based translation system (Chiang, 2005), to implement ensemble decoding using multiple translation models.
Related Work 5.1 Domain Adaptation | Among these approaches are sentence-based, phrase-based and word-based output combination methods. |
Related Work 5.1 Domain Adaptation | Firstly, unlike the multi-table support of Moses, which only supports combining translation tables for phrase-based systems, our approach supports ensembles of both hierarchical and phrase-based systems.
Abstract | We evaluated our approach on large-scale Japanese-English and English-Japanese machine translation tasks, and show that it can significantly outperform the baseline phrase-based SMT system.
Experiments | We use a BTG phrase-based system with a MaxEnt-based lexicalized reordering model (Wu, 1997; Xiong et al., 2006) as our baseline system for
Integration into SMT system | There are two ways to integrate the ranking reordering model into a phrase-based SMT system: the pre-reorder method, and the decoding time constraint method. |
Integration into SMT system | Reordered sentences can go through the normal pipeline of a phrase-based decoder. |
Integration into SMT system | The above three penalties are added as additional features into the log-linear model of the phrase-based system. |
Introduction | In phrase-based models (Och, 2002; Koehn et al., 2003), phrases are introduced to serve as the fundamental translation elements and to handle local reordering, while a distance-based distortion model is used to coarsely depict the exponentially decayed word-movement probabilities in language translation.
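The distance-based distortion model mentioned here is typically realized as an exponential decay in the jump distance between consecutive source phrases. A hedged sketch (the decay base `alpha` and the function name are illustrative, not values from the cited papers):

```python
def distortion_cost(prev_end, next_start, alpha=0.9):
    """Exponentially decayed distortion score alpha ** |distance|, where
    distance is the jump between the end of the previously translated
    source phrase and the start of the next one."""
    distance = abs(next_start - prev_end - 1)
    return alpha ** distance

# Monotone translation (no jump) incurs no penalty:
print(distortion_cost(2, 3))  # distance 0 -> 1.0
# A long jump is penalized exponentially:
print(distortion_cost(2, 8))  # distance 5 -> 0.9 ** 5
```

This coarse decay is exactly why long-distance reordering is hard for such models: a large jump is always heavily penalized, regardless of whether it is linguistically justified.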
Introduction | Long-distance word reordering between language pairs with substantial word-order differences, such as Japanese with Subject-Object-Verb (SOV) structure and English with Subject-Verb-Object (SVO) structure, is generally viewed as beyond the scope of the phrase-based systems discussed above, because of either distortion limits or a lack of discriminative features for modeling.
Introduction | This is usually done in a preprocessing step, and then followed by a standard phrase-based SMT system that takes the reordered source sentence as input to finish the translation. |
A Class-based Model of Agreement | We chose a bigram model due to the aggressive recombination strategy in our phrase-based decoder. |
Abstract | The model does not require bitext or phrase table annotations and can be easily implemented as a feature in many phrase-based decoders. |
Discussion of Translation Results | Phrase Table Coverage In a standard phrase-based system, effective translation into a highly inflected target language requires that the phrase table contain the inflected word forms necessary to construct an output with correct agreement. |
Discussion of Translation Results | This large gap between the unigram recall of the actual translation output (top) and the lexical coverage of the phrase-based model (bottom) indicates that translation performance can be improved dramatically by altering the translation model through features such as ours, without expanding the search space of the decoder. |
Experiments | Experimental Setup Our decoder is based on the phrase-based approach to translation (Och and Ney, 2004) and contains various feature functions including phrase relative frequency, word-level alignment statistics, and lexicalized reordering models (Tillmann, 2004; Och et al., 2004). |
Inference during Translation Decoding | 3.1 Phrase-based Translation Decoding |
Inference during Translation Decoding | We consider the standard phrase-based approach to MT (Och and Ney, 2004). |
Introduction | Agreement relations that cross statistical phrase boundaries are not explicitly modeled in most phrase-based MT systems (Avramidis and Koehn, 2008). |
Introduction | The model can be implemented using the feature APIs of popular phrase-based decoders such as Moses (Koehn et al., 2007) and Phrasal (Cer et al., 2010). |
Related Work | Subotin (2011) recently extended factored translation models to hierarchical phrase-based translation and developed a discriminative model for predicting target-side morphology in English-Czech. |
Abstract | Our work is based on a phrase-based SMT system. |
Abstract | Phrase-based Translation System |
Abstract | The translation process of phrase-based SMT can be briefly described in three steps: segment the source sentence into a sequence of phrases, translate each
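The segment-and-translate steps can be illustrated with a toy sketch, assuming a greedy longest-match segmentation and a hypothetical phrase table; this monotone sketch omits the reordering and scoring a real decoder performs:

```python
# Toy sketch of the phrase-based pipeline: greedy longest-match segmentation
# followed by phrase-by-phrase lookup. The phrase table below is hypothetical.
phrase_table = {
    ("machine", "translation"): "traduction automatique",
    ("is",): "est",
    ("fun",): "amusante",
}

def segment_and_translate(words, table, max_len=3):
    out = []
    i = 0
    while i < len(words):
        # Prefer the longest source phrase known to the table.
        for length in range(min(max_len, len(words) - i), 0, -1):
            src = tuple(words[i:i + length])
            if src in table:
                out.append(table[src])
                i += length
                break
        else:
            out.append(words[i])  # pass unknown words through unchanged
            i += 1
    return " ".join(out)

print(segment_and_translate("machine translation is fun".split(), phrase_table))
# -> traduction automatique est amusante
```

A real decoder explores many alternative segmentations and translations and scores them jointly, rather than committing greedily as above.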
Abstract | The two models are integrated into a state-of-the-art phrase-based machine translation system and evaluated on Chinese-to-English translation tasks with large-scale training data. |
Conclusions and Future Work | The two models have been integrated into a phrase-based SMT system and evaluated on Chinese-to-English translation tasks using large-scale training data. |
Integrating the Two Models into SMT | In this section, we elaborate on how to integrate the two models into phrase-based SMT.
Integrating the Two Models into SMT | In particular, we integrate the models into a phrase-based system which uses bracketing transduction grammars (BTG) (Wu, 1997) for phrasal translation (Xiong et al., 2006). |
Integrating the Two Models into SMT | It is straightforward to integrate the predicate translation model into phrase-based SMT (Koehn et al., |
Introduction | We integrate these two discriminative models into a state-of-the-art phrase-based system. |
Abstract | We therefore propose a topic similarity model to exploit topic information at the synchronous rule level for hierarchical phrase-based translation. |
Conclusion and Future Work | We have presented a topic similarity model which incorporates the rule-topic distributions on both the source and target sides into a traditional hierarchical phrase-based system.
Experiments | We divide the rules into three types: phrase rules, which only contain terminals and are the same as the phrase pairs in a phrase-based system; monotone rules, which contain non-terminals and produce monotone translations; and reordering rules, which also contain non-terminals but change the order of translations.
Introduction | Consequently, we propose a topic similarity model for hierarchical phrase-based translation (Chiang, 2007), where each synchronous rule is associated with a topic distribution. |
Introduction | We augment the hierarchical phrase-based system by integrating the proposed topic similarity model as a new feature (Section 3.1). |
Introduction | Experiments on Chinese-English translation tasks (Section 6) show that our method outperforms the baseline hierarchical phrase-based system by +0.9 BLEU points.
Abstract | This paper presents an extension of Chiang’s hierarchical phrase-based (HPB) model, called Head-Driven HPB (HD-HPB), which incorporates head information in translation rules to better capture syntax-driven information, as well as improved reordering between any two neighboring non-terminals at any stage of a derivation to explore a larger reordering search space.
Conclusion | We present a head-driven hierarchical phrase-based (HD-HPB) translation model, which adopts head information (derived through unlabeled dependency analysis) in the definition of non-terminals to better differentiate among translation rules. |
Head-Driven HPB Translation Model | For rule extraction, we first identify initial phrase pairs on word-aligned sentence pairs by using the same criterion as most phrase-based translation models (Och and Ney, 2004) and Chiang’s HPB model (Chiang, 2005; Chiang, 2007). |
Introduction | Chiang’s hierarchical phrase-based (HPB) translation model utilizes synchronous context free grammar (SCFG) for translation derivation (Chiang, 2005; Chiang, 2007) and has been widely adopted in statistical machine translation (SMT). |
Introduction | Phrase-based Translation
Abstract | In this work we present two extensions to the well-known dynamic programming beam search in phrase-based statistical machine translation (SMT), aiming at increased efficiency of decoding by minimizing the number of language model computations and hypothesis expansions. |
Conclusions | This work introduces two extensions to the well-known beam search algorithm for phrase-based machine translation. |
Introduction | Research efforts to increase search efficiency for phrase-based MT (Koehn et al., 2003) have explored several directions, ranging from generalizing the stack decoding algorithm (Ortiz et al., 2006) to additional early pruning techniques (Delaney et al., 2006), (Moore and Quirk, 2007) and more efficient language model (LM) querying (Heafield, 2011). |
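The beam (stack) search that these efficiency extensions target can be sketched schematically: hypotheses are grouped into stacks by the number of covered source words, and each stack is pruned to its best few hypotheses (histogram pruning) before expansion. The tiny phrase table and scores below are illustrative, and the sketch is monotone for brevity:

```python
import heapq
from collections import namedtuple

# Schematic stack decoding with histogram pruning; not the decoder of any
# particular cited system. Hyp fields: covered source words, output, score.
Hyp = namedtuple("Hyp", ["covered", "output", "score"])

def beam_search(src, table, beam_size=3):
    """Monotone stack decoding: stacks[i] holds hypotheses covering the
    first i source words; only the best beam_size per stack are expanded."""
    stacks = [[] for _ in range(len(src) + 1)]
    stacks[0].append(Hyp(0, (), 0.0))
    for i in range(len(src)):
        # Histogram pruning: expand only the top beam_size hypotheses.
        for hyp in heapq.nlargest(beam_size, stacks[i], key=lambda h: h.score):
            for length in range(1, len(src) - i + 1):
                phrase = tuple(src[i:i + length])
                if phrase in table:
                    tgt, cost = table[phrase]
                    stacks[i + length].append(
                        Hyp(i + length, hyp.output + (tgt,), hyp.score + cost))
    best = max(stacks[len(src)], key=lambda h: h.score)
    return " ".join(best.output)

# Hypothetical phrase table with log-domain costs, for illustration only.
table = {
    ("das", "haus"): ("the house", -1.0),
    ("das",): ("the", -0.6),
    ("haus",): ("house", -0.7),
}
print(beam_search(["das", "haus"], table))  # -> the house
```

The extensions discussed in the paper reduce the work done inside this loop, chiefly the number of LM queries and hypothesis expansions per stack.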