Conclusion and Future Work | Experimental results show that our approach is promising for SMT systems to learn a better translation model . |
Experiments | Translation models are trained over the parallel data that is automatically word-aligned |
Experiments | This implementation makes the system perform much better and the translation model size is much smaller. |
Introduction | Current translation modeling approaches usually use context dependent information to disambiguate translation candidates. |
Introduction | Therefore, it is important to leverage topic information to learn smarter translation models and achieve better translation performance. |
Introduction | Attempts on topic-based translation modeling include topic-specific lexicon translation models (Zhao and Xing, 2006; Zhao and Xing, 2007), topic similarity models for synchronous rules (Xiao et al., 2012), and document-level translation with topic coherence (Xiong and Zhang, 2013). |
Related Work | Following this work, (Xiao et al., 2012) extended topic-specific lexicon translation models to hierarchical phrase-based translation models , where the topic information of synchronous rules was directly inferred with the help of document-level information. |
Related Work | They incorporated the bilingual topic information into language model adaptation and lexicon translation model adaptation, achieving significant improvements in the large-scale evaluation. |
Related Work | They estimated phrase-topic distributions in translation model adaptation and generated better translation quality. |
Topic Similarity Model with Neural Network | Therefore, it helps to train a smarter translation model with the embedded topic information. |
Topic Similarity Model with Neural Network | Standard features: Translation model , including translation probabilities and lexical weights for both directions (4 features), 5-gram language model (1 feature), word count (1 feature), phrase count (1 feature), NULL penalty (1 feature), number of hierarchical rules used (1 feature). |
Abstract | By contrast, we argue that the relevance between a sentence pair and target domain can be better evaluated by the combination of language model and translation model . |
Abstract | In this paper, we study and experiment with novel methods that apply translation models into domain-relevant data selection. |
Introduction | The corpora are necessary prior knowledge for training an effective translation model . |
Introduction | However, domain-specific machine translation has few parallel corpora for translation model training in the domain of interest. |
Introduction | To overcome the problem, we first propose the method combining translation model with language model in data selection. |
Related Work | Thus, we propose novel methods which are based on translation model and language model for data selection. |
Training Data Selection Methods | We present three data selection methods for ranking and selecting domain-relevant sentence pairs from general-domain corpus, with an eye towards improving domain-specific translation model performance. |
Training Data Selection Methods | These methods are based on language model and translation model , which are trained on small in-domain parallel data. |
Training Data Selection Methods | 3.1 Data Selection with Translation Model |
Abstract | We use translation models and language models to exploit lexical correlations and solution post character respectively. |
Introduction | We model the lexical correlation and solution post character using regularized translation models and unigram language models respectively. |
Our Approach | The usage of translation models in QA retrieval (Xue et al., 2008; Singh, 2012) and segmentation (Deepak et al., 2012) was also motivated by the correlation assumption. |
Our Approach | We use an IBM Model 1 translation model (Brown et al., 1990) in our technique; simplistically, such a model m may be thought of as a 2-d associative array where the value m[w1][w2] is directly related to the probability of w1 occurring in the problem when w2 occurs in the solution. |
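The associative-array view described above can be sketched as a nested dictionary. This is only an illustration: the words and probability values are invented, and `lookup` is a hypothetical helper, not something defined in the quoted paper.

```python
from collections import defaultdict

# Toy table: m[w1][w2] stands for the probability of w1 occurring in the
# problem when w2 occurs in the solution. All entries are made-up values.
m = defaultdict(dict)
m["crash"]["reinstall"] = 0.4
m["crash"]["reboot"] = 0.3
m["slow"]["defragment"] = 0.5


def lookup(table, w1, w2, default=0.0):
    """Return table[w1][w2], or `default` when the pair was never stored."""
    # .get() is used so that probing an unseen word does not create a key.
    return table.get(w1, {}).get(w2, default)
```

A call such as `lookup(m, "crash", "reboot")` reads a stored entry, while an unseen pair falls back to the default; a trained model would fill the table from aligned problem and solution text.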
Our Approach | Consider a unigram language model S_S that models the lexical characteristics of solution posts, and a translation model T_S that models the lexical correlation between problems and solutions. |
Related Work | Usage of translation models for modeling the correlation between textual problems and solutions has been explored earlier, starting from the answer retrieval work in (Xue et al., 2008) where new queries were conceptually expanded using the translation model to improve retrieval. |
Related Work | Translation models were also seen to be useful in segmenting incident reports into the problem and solution parts (Deepak et al., 2012); we will use an adaptation of the generative model presented therein, for our solution extraction formulation. |
Related Work | Entity-level translation models |
A Skeleton-based Approach to MT 2.1 Skeleton Identification | To compute g(d), we use a linear combination of a skeleton translation model g_skel(d) and a full translation model g_full(d): |
A Skeleton-based Approach to MT 2.1 Skeleton Identification | g(d) = g_skel(d) + g_full(d) (3) where the skeleton translation model handles the translation of the sentence skeleton, while the full translation model is the baseline model and handles the original problem of translating the whole sentence. |
A Skeleton-based Approach to MT 2.1 Skeleton Identification | The skeleton translation model focuses on the translation of the sentence skeleton, i.e., the solid (red) rectangles; while the full translation model computes the model score for all those phrase-pairs, i.e., all solid and dashed rectangles. |
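The combination of the two sub-models can be sketched as below. This is a minimal illustration under stated assumptions: the `PhrasePair` class, its fields, and the per-phrase scores are hypothetical, and real systems would compute each score from weighted features rather than store it directly.

```python
from dataclasses import dataclass


@dataclass
class PhrasePair:
    source: str
    target: str
    in_skeleton: bool   # True if the pair belongs to the sentence skeleton
    skel_score: float   # hypothetical skeleton-model log score
    full_score: float   # hypothetical full-model log score


def g_skel(derivation):
    """Skeleton model: scores only the phrase pairs inside the skeleton."""
    return sum(p.skel_score for p in derivation if p.in_skeleton)


def g_full(derivation):
    """Full model: scores every phrase pair in the derivation."""
    return sum(p.full_score for p in derivation)


def g(derivation):
    """Combined score: sum of the skeleton and full model scores."""
    return g_skel(derivation) + g_full(derivation)
```

Under this sketch a skeleton phrase pair contributes to both terms, whereas a non-skeleton pair contributes to the full model only.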
Abstract | The basic idea is that we translate the key elements of the input sentence using a skeleton translation model, and then cover the remaining segments using a full translation model . |
Introduction | Note that the source-language structural information has been intensively investigated in recent studies of syntactic translation models . |
Introduction | We develop a skeleton-based model which divides translation into two sub-models: a skeleton translation model (i.e., translating the key elements) and a full translation model (i.e., translating the remaining source words and generating the complete translation). |
Abstract | In this paper, we propose a sense-based translation model to integrate word senses into statistical machine translation. |
Abstract | The proposed sense-based translation model enables the decoder to select appropriate translations for source words according to the inferred senses for these words using maximum entropy classifiers. |
Abstract | We test the effectiveness of the proposed sense-based translation model on a large-scale Chinese-to-English translation task. |
Introduction | These glosses, used as the sense predictions of their WSD system, are integrated into a word-based SMT system either to substitute for translation candidates of their translation model or to postedit the output of their SMT system. |
Introduction | In order to incorporate word senses into SMT, we propose a sense-based translation model that is built on maximum entropy classifiers. |
Introduction | We collect training instances from the sense-tagged training data to train the proposed sense-based translation model . |
Conclusion and Future Work | In this paper, we have presented a unified reordering framework to incorporate soft linguistic constraints (of syntactic or semantic nature) into the HPB translation model . |
Experiments | Our basic baseline system employs 19 basic features: a language model feature, 7 translation model features, word penalty, unknown word penalty, the glue rule, date, number and 6 pass-through features. |
HPB Translation Model: an Overview | Each such rule is associated with a set of translation model features {φ_i}, such as phrase translation probability p(α | γ) and its inverse p(γ | α), the lexical translation probability p_lex(α | γ) and its inverse p_lex(γ | α), and a rule penalty that affects preference for longer or shorter derivations. |
Introduction | The popular distortion or lexicalized reordering models in phrase-based SMT make good local predictions by focusing on reordering on word level, while the synchronous context free grammars in hierarchical phrase-based (HPB) translation models are capable of handling nonlocal reordering on the translation phrase level. |
Introduction | The general ideas, however, are applicable to other translation models , e.g., phrase-based model, as well. |
Introduction | Section 2 provides an overview of HPB translation model . |
Related Work | Both are close to our work; however, our model generates reordering features that are integrated into the log-linear translation model during decoding. |
Related Work | In the soft constraint or reordering model approach, Liu and Gildea (2010) modeled the reordering/deletion of source-side semantic roles in a tree-to-string translation model . |
Unified Linguistic Reordering Models | For models with syntactic reordering, we add two new features (i.e., one for the leftmost reordering model and the other for the rightmost reordering model) into the log-linear translation model in Eq. |
Unified Linguistic Reordering Models | For the semantic reordering models, we also add two new features into the log-linear translation model . |
Abstract | In this paper, instead of designing new features based on intuition, linguistic knowledge and domain, we learn some new and effective features using the deep auto-encoder (DAE) paradigm for phrase-based translation model . |
Experiments and Results | The baseline translation models are generated by Moses with default parameter settings. |
Input Features for DNN Feature Learning | The phrase-based translation model (Koehn et al., 2003; Och and Ney, 2004) has demonstrated superior performance and been widely used in current SMT systems, and we employ our implementation on this translation model . |
Input Features for DNN Feature Learning | This corpus is used to train the translation model in our experiments, and we will describe it in detail in section 5.1. |
Introduction | al., 2010), and speech spectrograms (Deng et al., 2010), we propose new feature learning using semi-supervised DAE for phrase-based translation model . |
Related Work | (2012) improved translation quality of n-gram translation model by using a bilingual neural LM, where translation probabilities are estimated using a continuous representation of translation units in lieu of standard discrete representations. |
Related Work | Kalchbrenner and Blunsom (2013) introduced recurrent continuous translation models that comprise a class for purely continuous sentence-level translation models . |
Related Work | (2013) presented a joint language and translation model based on a recurrent neural network which predicts target words based on an unbounded history of both source and target words. |
Semi-Supervised Deep Auto-encoder Features Learning for SMT | Each translation rule in the phrase-based translation model has a set number of features that are combined in the log-linear model (Och and Ney, 2002), and our semi-supervised DAE features can also be combined in this model. |
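How rule features are combined in a log-linear model can be sketched as a weighted sum of feature values. The feature names, values, and weights below are purely illustrative assumptions; in practice the weights would be tuned on development data and the features would typically be log probabilities.

```python
def log_linear_score(features, weights):
    """Weighted sum of feature values for one translation rule.

    `features` maps a feature name to its value and `weights` maps the
    same names to their log-linear weights.
    """
    return sum(weights[name] * value for name, value in features.items())


# Hypothetical rule: two standard features plus one learned feature.
rule_features = {"tm": -1.0, "lm": -2.0, "learned": 0.5}
rule_weights = {"tm": 1.0, "lm": 0.5, "learned": 2.0}
```

Adding a newly learned feature then amounts to adding one more entry to both dictionaries, which is why such features slot into the existing model without changing the decoder's scoring rule.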
A semantic span can include one or more eus. | 1) CSS-based translation model : following formula (1), we obtain the cohesion information by modifying the translation rules with their probabilities P(e_s | f_t) based on word align- |
A semantic span can include one or more eus. | 3.1 CSS-based Translation Model |
A semantic span can include one or more eus. | For the existing translation models , the entire training process is conducted at the lexical or syntactic level without grammatically cohesive information. |
Abstract | Our models include a CSS-based translation model , which generates new CSS-based translation rules, and a generative transfer model, which encourages producing transitional expressions during decoding. |
Conclusion | In the future, we will extend our methods to other translation models , such as the syntax-based model, to study how to further improve the performance of SMT systems. |
Experiments | The bilingual training data for the translation model and the CSS-based transfer model is the FBIS corpus, with approximately 7.1 million Chinese words and 9.2 million English words. |
Experiments | For this work, we use an in-house decoder to build the SMT baseline; it combines the hierarchical phrase-based translation model (Chiang, 2005; Chiang, 2007) with the BTG (Wu, 1996) reordering model (Xiong et al., 2006; Zens and Ney, 2006; He et al., 2010). |
Experiments | To further evaluate the effectiveness of the proposed models, we also conducted an experiment on a larger set of bilingual training data from the LDC corpus for the translation model and the transfer model. |
Introduction | One is a new translation model that is utilized to generate new translation rules combined with the information of source functional relationships. |
Abstract | We present an adaptive translation quality estimation (QE) method to predict the human-targeted translation error rate (HTER) for a document-specific machine translation model . |
Adaptive MT Quality Estimation | Therefore it is necessary to build a QE regression model that’s robust to different document-specific translation models . |
Document-specific MT System | Building a general MT system using all the parallel data not only produces a huge translation model (unless with very aggressive pruning), but also gives suboptimal performance on the given input document due to the unwanted dominance of out-of-domain data. |
Document-specific MT System | Here we adopt the same strategy, building a document-specific translation model for each input document. |
Experiments | In a typical MT QE scenario, the QE model is pre-trained and applied to various MT outputs, even though the QE training data and MT outputs are generated from different translation models . |
Experiments | We train the static QE model with this training set, including the source sentences, references and MT outputs (from multiple translation models ). |
Experiments | To train the adaptive QE model for each test document, we build a translation model whose subsampling data includes source sentences from both the test document and the QE training data. |
Introduction | First, existing approaches to MT quality estimation rely on lexical and syntactic features defined over parallel sentence pairs, which include source sentences, MT outputs and references, and translation models (Blatz et al., 2004; Ueffing and Ney, 2007; Specia et al., 2009a; Xiong et al., 2010; Soricut and Echihabi, 2010a; Bach et al., 2011). |
Static MT Quality Estimation | derived from a Maximum Entropy translation model (Ittycheriah and Roukos, 2005). |
Abstract | Second, it combines a simplification model for splitting and deletion with a monolingual translation model for phrase substitution and reordering. |
Experiments | We trained our simplification and translation models on the PWKP corpus. |
Simplification Framework | Our simplification framework consists of a probabilistic model for splitting and dropping which we call DRS simplification model (DRS-SM); a phrase based translation model for substitution and reordering (PBMT); and a language model learned on Simple English Wikipedia (LM) for fluency and grammaticality. |
Simplification Framework | where the probabilities p(s' | D_c), p(s' | s) and p(s) are given by the DRS simplification model, the phrase based machine translation model and the language model respectively. |
Simplification Framework | Our phrase based translation model is trained using the Moses toolkit with its default command line options on the PWKP corpus (except the sentences from the test set) considering the complex sentence as the source and the simpler one as the target. |
Conclusion | Eventually, we would like to replace the functionality of factored translation models (Koehn and Hoang, 2007) with lattice transformation and augmentation. |
Experimental Setup | Four translation model features encode phrase translation probabilities and lexical scores in both directions. |
Introduction | Morphological complexity leads to much higher type to token ratios than English, which can create sparsity problems during translation model estimation. |
Related Work | Most techniques approach the problem by transforming the target language in some manner before training the translation model . |
Related Work | In this setting, the sparsity reduction from segmentation helps word alignment and target language modeling, but it does not result in a more expressive translation model . |
Introduction | the role of the translation model in Statistical Machine Translation (SMT). |
System | The language model is a trigram-based back-off language model with Kneser-Ney smoothing, computed using SRILM (Stolcke, 2002) and trained on the same training data as the translation model . |
System | We do so by normalising the class probability from the classifier (score_T(H)), which is our translation model , and the language model (score_lm(H)), in such a way that the highest classifier score for the alternatives under consideration is always 1.0, and the highest language model score of the sentence is always 1.0. |
System | If desired, the search can be parametrised with variables λ3 and λ4, representing the weights we want to attach to the classifier-based translation model and the language model, respectively. |
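The normalisation and weighting described above can be sketched as follows. The rescaling so that the best score equals 1.0 follows the text; combining the two normalised scores as a weighted sum of logarithms is an assumption made for this illustration, as are the function names and the requirement that all scores be positive.

```python
import math


def normalise(scores):
    """Scale positive scores so that the largest one becomes 1.0."""
    top = max(scores)
    return [s / top for s in scores]


def best_alternative(tm_scores, lm_scores, w_tm=1.0, w_lm=1.0):
    """Return the index of the highest-scoring alternative.

    Assumption: the normalised classifier (translation model) score and
    the normalised language model score are combined as a weighted sum
    of logs, with w_tm and w_lm as the two weights.
    """
    tm = normalise(tm_scores)
    lm = normalise(lm_scores)
    combined = [w_tm * math.log(t) + w_lm * math.log(l) for t, l in zip(tm, lm)]
    return max(range(len(combined)), key=combined.__getitem__)
```

With equal weights the two components pull with equal force; raising `w_lm` lets the language model override the classifier's preferred alternative.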
Expected BLEU Training | We summarize the weights of the recurrent neural network language model as θ = {U, W, V} and add the model as an additional feature to the log-linear translation model using the simplified notation s_θ(w_t) = s(w_t | w_1 ... w_{t-1}, h_{t-1}): |
Expected BLEU Training | The translation model is parameterized by λ and θ which are learned as follows (Gao et al., 2014): |
Experiments | Translation models are estimated on 102M words of parallel data for French-English, and 99M words for German-English; about 6.5M words for each language pair are newswire, the remainder are parliamentary proceedings. |
Introduction | Neural network-based language and translation models have achieved impressive accuracy improvements on statistical machine translation tasks (Allauzen et al., 2011; Le et al., 2012b; Schwenk et al., 2012; Vaswani et al., 2013; Gao et al., 2014). |
Conclusion | In this work, we presented an approach that can expand a translation model extracted from a sentence-aligned, bilingual corpus using a large amount of unstructured, monolingual data in both source and target languages, which leads to improvements of 1.4 and 1.2 BLEU points over strong baselines on evaluation sets, and in some scenarios gains in excess of 4 BLEU points. |
Generation & Propagation | We assume that sufficient parallel resources exist to learn a basic translation model using standard techniques, and also assume the availability of larger monolingual corpora in both the source and target languages. |
Introduction | Unlike previous work (Irvine and Callison-Burch, 2013a; Razmara et al., 2013), we use higher order n-grams instead of restricting to unigrams, since our approach goes beyond OOV mitigation and can enrich the entire translation model by using evidence from monolingual text. |
Related Work | (2013) and Irvine and Callison-Burch (2013a) conduct a more extensive evaluation of their graph-based BLI techniques, where the emphasis and end-to-end BLEU evaluations concentrated on OOVs, i.e., unigrams, and not on enriching the entire translation model . |
Introduction | They have since been extended to translation modeling , parsing, and many other NLP tasks. |
Model Variations | 4.1 Neural Network Lexical Translation Model (NNLTM) |
Model Variations | In order to assign a probability to every source word during decoding, we also train a neural network lexical translation model (NNLTM). |
Abstract | R2NN is a combination of recursive neural network and recurrent neural network, and in turn integrates their respective capabilities: (1) new information can be used to generate the next hidden state, like recurrent neural networks, so that language model and translation model can be integrated naturally; (2) a tree structure can be built, as in recursive neural networks, so as to generate the translation candidates in a bottom up manner. |
Introduction | DNN is also introduced to Statistical Machine Translation (SMT) to learn several components or features of conventional framework, including word alignment, language modelling, translation modelling and distortion modelling. |
Introduction | (2013) propose a joint language and translation model , based on a recurrent neural network. |