A Skeleton-based Approach to MT 2.1 Skeleton Identification | As is standard in SMT, we further assume that 1) the translation process can be decomposed into a derivation of phrase-pairs (for phrase-based models) or translation rules (for syntax-based models); and 2) a linear function is used to assign a model score to each derivation. |
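The linear scoring assumption above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the derivation structure, feature names, and weights are invented for the example.

```python
# Hypothetical sketch: scoring a derivation of phrase-pairs with a linear model.
# The score is the weighted sum of feature values over all phrase-pairs.

def score_derivation(derivation, weights):
    """Linear model: sum of weight * feature over every phrase-pair."""
    total = 0.0
    for phrase_pair in derivation:
        for name, value in phrase_pair["features"].items():
            total += weights.get(name, 0.0) * value
    return total

# Toy derivation of two phrase-pairs with invented feature values.
derivation = [
    {"src": "ta",     "tgt": "he",    "features": {"p_tm": -0.2, "p_lm": -0.5}},
    {"src": "xihuan", "tgt": "likes", "features": {"p_tm": -0.7, "p_lm": -0.9}},
]
weights = {"p_tm": 1.0, "p_lm": 0.6}
print(score_derivation(derivation, weights))
```

Because the model is linear in its features, the skeleton and full-sentence scores gskel(d) and gfull(d) can reuse this same form with different feature sets and weights.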
A Skeleton-based Approach to MT 2.1 Skeleton Identification | See Figure 1 for an example of applying the above model to phrase-based MT. |
A Skeleton-based Approach to MT 2.1 Skeleton Identification | While we will restrict ourselves to phrase-based translation in the following description and experiments, we can choose different models/features for gskel(d) and gfull(d). |
Abstract | We apply our approach to a state-of-the-art phrase-based system and demonstrate very promising BLEU improvements and TER reductions on the NIST Chinese-English MT evaluation data. |
Introduction | The simplest of these is the phrase-based approach (Och et al., 1999; Koehn et al., 2003) which employs a global model to process any sub-strings of the input sentence. |
Introduction | However, these approaches suffer from the same problem as their phrase-based counterpart and use a single global model to handle different translation units, regardless of whether they come from the skeleton of the input tree/sentence or from other, less important substructures.
Introduction | • We apply the proposed model to Chinese-English phrase-based MT and demonstrate promising BLEU improvements and TER reductions on the NIST evaluation data.
Abstract | Our best results improve a phrase-based statistical machine translation system trained on WMT 2012 French-English data by up to 2.0 BLEU, and the expected BLEU objective improves over a cross-entropy trained model by up to 0.6 BLEU in a single reference setup. |
Conclusion and Future Work | Our best result improves the output of a phrase-based decoder by up to 2.0 BLEU on French-English translation, outperforming n-best rescoring by up to 1.1 BLEU and lattice rescoring by up to 0.4 BLEU. |
Decoder Integration | Typically, phrase-based decoders maintain a set of states representing partial and complete translation hypotheses that are scored by a set of features.
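A hypothesis state of this kind can be sketched as follows. The fields and the extension operation are simplified assumptions for illustration, not the decoder described in the paper: a real decoder also tracks language-model context, recombination signatures, and future-cost estimates.

```python
# Illustrative sketch of the hypothesis states a phrase-based decoder maintains.
from dataclasses import dataclass

@dataclass(frozen=True)
class Hypothesis:
    covered: frozenset   # source positions already translated
    target: tuple        # target words produced so far
    score: float         # model score of this partial translation

def extend(hyp, src_span, tgt_words, cost):
    """Append the translation of one source phrase to a partial hypothesis."""
    return Hypothesis(
        covered=hyp.covered | frozenset(src_span),
        target=hyp.target + tuple(tgt_words),
        score=hyp.score + cost,
    )

empty = Hypothesis(frozenset(), (), 0.0)
h1 = extend(empty, range(0, 2), ["he", "likes"], -1.2)
h2 = extend(h1, range(2, 3), ["music"], -0.4)
print(h2.target, h2.score)
```

Each `extend` step mirrors the left-to-right construction described later: one source phrase is marked covered and its translation is appended to the growing target hypothesis.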
Expected BLEU Training | Formally, our phrase-based model is parameterized by M parameters Λ, where each λm ∈ Λ, m = 1, …, M.
Experiments | We use a phrase-based system similar to Moses (Koehn et al., 2007) based on a set of common features including maximum likelihood estimates pML(e|f) and pML(f|e), lexically weighted estimates pLW(e|f) and pLW(f|e), word and phrase penalties, a hierarchical reordering model (Galley and Manning, 2008), a linear distortion feature, and a modified Kneser-Ney language model trained on the target side of the parallel data.
Experiments | either lattices or the unique 100-best output of the phrase-based decoder, and reestimate the log-linear weights by running a further iteration of MERT on the n-best list of the development set, augmented by scores corresponding to the neural network models.
Experiments | We use the same data both for training the phrase-based system as well as the language model but find that the resulting bias did not hurt end-to-end accuracy (Yu et al., 2013). |
Discussions | As the semantic phrase embedding can fully represent the phrase, we can go a step further in phrase-based SMT and feed the semantic phrase embeddings to a DNN in order to model the whole translation process (e.g.
Experiments | With the semantic phrase embeddings and the vector space transformation function, we apply the BRAE to measure the semantic similarity between a source phrase and its translation candidates in the phrase-based SMT. |
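The similarity computation described above can be sketched as a cosine between a transformed source-phrase embedding and a candidate-translation embedding. The transformation matrix `W` below is a stand-in assumption for the learned BRAE transformation function, and the vectors are invented:

```python
# Minimal sketch: semantic similarity between a source phrase embedding
# (mapped into the target space) and a translation candidate's embedding.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def transform(W, x):
    """Linearly map a source-space embedding into the target embedding space."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

W = [[1.0, 0.0], [0.0, 1.0]]   # identity stand-in for the learned map
src_embedding = [0.6, 0.8]
cand_embedding = [0.6, 0.8]
print(round(cosine(transform(W, src_embedding), cand_embedding), 3))
```

The resulting similarity score can then be added as an extra feature when ranking translation candidates in the phrase table.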
Experiments | We have implemented a phrase-based translation system with a maximum entropy based reordering model using the bracketing transduction grammar (Wu, 1997; Xiong et al., 2006). |
Experiments | Typically, four translation probabilities are adopted in phrase-based SMT, including the phrase translation probabilities and lexical weights in both directions.
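The two direct phrase translation probabilities among these four scores are conventionally estimated by relative frequency over extracted phrase-pair counts. The sketch below illustrates that estimate with invented counts; the lexical weights, which additionally use word-alignment information, are omitted for brevity:

```python
# Hedged sketch: relative-frequency estimates p(e|f) and p(f|e) from
# phrase-pair extraction counts. Counts are invented for illustration.
from collections import Counter

pair_counts = Counter({("xihuan", "likes"): 8, ("xihuan", "like"): 2})
src_counts, tgt_counts = Counter(), Counter()
for (f, e), c in pair_counts.items():
    src_counts[f] += c
    tgt_counts[e] += c

def p_e_given_f(e, f):
    """Direct phrase translation probability p(e|f)."""
    return pair_counts[(f, e)] / src_counts[f]

def p_f_given_e(f, e):
    """Inverse phrase translation probability p(f|e)."""
    return pair_counts[(f, e)] / tgt_counts[e]

print(p_e_given_f("likes", "xihuan"))  # 8/10 = 0.8
print(p_f_given_e("xihuan", "likes"))  # 8/8 = 1.0
```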
Introduction | However, in conventional (phrase-based) SMT, phrases are the basic translation units.
Introduction | Instead, we focus on learning phrase embeddings from the view of semantic meaning, so that our phrase embedding can fully represent the phrase and best fit the phrase-based SMT. |
Abstract | This paper explores a simple and effective unified framework for incorporating soft linguistic reordering constraints into a hierarchical phrase-based translation system: 1) a syntactic reordering model that explores reorderings for context-free grammar rules; and 2) a semantic reordering model that focuses on the reordering of predicate-argument structures.
Abstract | Experiments on Chinese-English translation show that the reordering approach can significantly improve a state-of-the-art hierarchical phrase-based translation system. |
Conclusion and Future Work | Experiments on Chinese-English translation show that the reordering approach can significantly improve a state-of-the-art hierarchical phrase-based translation system. |
Introduction | The popular distortion or lexicalized reordering models in phrase-based SMT make good local predictions by focusing on reordering on word level, while the synchronous context free grammars in hierarchical phrase-based (HPB) translation models are capable of handling nonlocal reordering on the translation phrase level. |
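The word-level distortion model mentioned above has a particularly simple form. As a hedged illustration (the exact parameterization varies across systems), a linear distortion penalty charges the size of the jump between consecutively translated source phrases:

```python
# Illustrative linear distortion cost for phrase-based reordering:
# zero for a monotone continuation, larger for longer jumps.
def distortion_cost(prev_end, next_start):
    """Distance between the end of the previous source phrase and the
    start of the next one; monotone translation costs nothing."""
    return abs(next_start - prev_end - 1)

print(distortion_cost(1, 2))  # monotone continuation: 0
print(distortion_cost(1, 5))  # jump ahead by three positions: 3
```

This locality is exactly the limitation the synchronous grammars of HPB models address, since a grammar rule can reorder whole phrases nonlocally in one step.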
Introduction | The general ideas, however, are applicable to other translation models, e.g., phrase-based model, as well. |
Related Work | Last, we also note that recent work on non-syntax-based reorderings in (flat) phrase-based models (Cherry, 2013; Feng et al., 2013) can also be potentially adopted to HPB models.
Abstract | In this paper, instead of designing new features based on intuition, linguistic knowledge and domain, we learn some new and effective features using the deep auto-encoder (DAE) paradigm for phrase-based translation model. |
Experiments and Results | We choose the Moses (Koehn et al., 2007) framework to implement our phrase-based machine translation system.
Input Features for DNN Feature Learning | The phrase-based translation model (Koehn et al., 2003; Och and Ney, 2004) has demonstrated superior performance and been widely used in current SMT systems, and we employ our implementation on this translation model. |
Introduction | Instead of designing new features based on intuition, linguistic knowledge and domain, for the first time, Maskey and Zhou (2012) explored the possibility of inducing new features in an unsupervised fashion using a deep belief net (DBN) (Hinton et al., 2006) for hierarchical phrase-based translation.
Introduction | al., 2010), and speech spectrograms (Deng et al., 2010), we propose new feature learning using semi-supervised DAE for phrase-based translation model. |
Semi-Supervised Deep Auto-encoder Features Learning for SMT | Each translation rule in the phrase-based translation model has a set number of features that are combined in the log-linear model (Och and Ney, 2002), and our semi-supervised DAE features can also be combined in this model. |
Abstract | Statistical phrase-based translation learns translation rules from bilingual corpora, and has traditionally only used monolingual evidence to construct features that rescore existing translation candidates. |
Abstract | Our proposed approach significantly improves the performance of competitive phrase-based systems, leading to consistent improvements between 1 and 4 BLEU points on standard evaluation sets. |
Evaluation | The baseline is a state-of-the-art phrase-based system; we perform word alignment using a lexicalized hidden Markov model, and then the phrase table is extracted using the grow-diag-final heuristic (Koehn et al., 2003).
Generation & Propagation | 2.5 Phrase-based SMT Expansion |
Introduction | With large amounts of data, phrase-based translation systems (Koehn et al., 2003; Chiang, 2007) achieve state-of-the-art results in many typologically diverse language pairs (Bojar et al., 2013).
Experiments | For SMT training, we implement an in-house hierarchical phrase-based decoder for our experiments.
Introduction | For example, translation sense disambiguation approaches (Carpuat and Wu, 2005; Carpuat and Wu, 2007) are proposed for phrase-based SMT systems. |
Introduction | Meanwhile, for hierarchical phrase-based or syntax-based SMT systems, there is also much work involving rich contexts to guide rule selection (He et al., 2008; Liu et al., 2008; Marton and Resnik, 2008; Xiong et al., 2009). |
Related Work | Following this work, (Xiao et al., 2012) extended topic-specific lexicon translation models to hierarchical phrase-based translation models, where the topic information of synchronous rules was directly inferred with the help of document-level information. |
Experiments | These inputs are simplified using our simplification system, namely the DRS-SM model and the phrase-based machine translation system (Section 3.2).
Experiments | These four sentences are directly sent to the phrase-based machine translation system to produce simplified sentences. |
Simplification Framework | The DRS associated with the final M-node D fin is then mapped to a simplified sentence s’fm which is further simplified using the phrase-based machine translation system to produce the final simplified sentence ssimple. |
Methods | In this section, we discuss how a lattice from a multi-stack phrase-based decoder such as Moses (Koehn et al., 2007) can be desegmented to enable word-level features. |
Methods | A phrase-based decoder produces its output from left to right, with each operation appending the translation of a source phrase to a growing target hypothesis. |
Methods | The search graph of a phrase-based decoder can be interpreted as a lattice, which can be interpreted as a finite state acceptor over target strings. |
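Reading the search graph as an acceptor over target strings can be sketched concretely. The toy lattice below is invented for illustration: nodes are decoder states, edges carry target words, and every path from the start node to the final node spells one translation hypothesis.

```python
# Minimal sketch: a decoder search graph as a lattice over target strings.
# Each path from the start state to the final state spells one hypothesis.
def paths(lattice, state, final):
    """Enumerate all target strings accepted between `state` and `final`."""
    if state == final:
        yield []
        return
    for word, nxt in lattice.get(state, []):
        for rest in paths(lattice, nxt, final):
            yield [word] + rest

lattice = {
    0: [("he", 1)],
    1: [("likes", 2), ("loves", 2)],
    2: [("music", 3)],
}
for p in paths(lattice, 0, 3):
    print(" ".join(p))
```

Because the lattice is a finite-state acceptor, word-level operations such as desegmentation can be applied to it with standard finite-state composition rather than to each n-best hypothesis separately.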