Introduction | Section 3 introduces two probabilistic models, based on conditional and joint probability distributions respectively, for integrating translations and transliterations into a translation model.
Our Approach | Both of our models combine a character-based transliteration model with a word-based translation model . |
Our Approach | Language Model for Unknown Words: Our model generates transliterations that can be known or unknown to the language model and the translation model . |
Our Approach | We refer to the words known to the language model and to the translation model as LM-known and TM-known words respectively and to words that are unknown as LM-unknown and TM-unknown respectively. |
Previous Work | Moreover, they are working with a large bitext so they can rely on their translation model and only need to transliterate NEs and OOVs. |
Previous Work | Our translation model is based on data which is both sparse and noisy. |
Introduction | In this work, we attempt to learn statistical translation models from only monolingual data in the source and target language. |
Introduction | This work is a big step towards large-scale and large-vocabulary unsupervised training of statistical translation models . |
Introduction | In this work, we will develop, describe, and evaluate methods for large vocabulary unsupervised learning of machine translation models suitable for real-world tasks. |
Related Work | Their best performing approach uses an EM algorithm to train a generative word-based translation model.
Translation Model | In this section, we describe the statistical training criterion and the translation model that is trained using monolingual data. |
Translation Model | As training criterion for the translation model's parameters θ, Ravi and Knight (2011) suggest
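The formula itself is missing from the extract; as a point of reference, the criterion of Ravi and Knight (2011), reconstructed here from the standard decipherment setup (so treat the exact notation as an assumption), chooses the parameters that maximize the likelihood of the observed foreign text under the composed language and channel models:

θ̂ = argmax_θ ∏_f Σ_e P(e) · P_θ(f|e)

where P(e) is a fixed target-side language model, P_θ(f|e) is the translation model being learned, and the product runs over the monolingual foreign sentences f.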
Translation Model | This becomes increasingly difficult with more complex translation models . |
Abstract | Statistical machine translation is often faced with the problem of combining training data from many diverse sources into a single translation model which then has to translate sentences in a new domain. |
Baselines | Log-linear translation model (TM) mixtures are of the form: |
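The mixture formula is cut off in the extract; a standard form for a log-linear TM mixture, offered here as a hedged reconstruction rather than the paper's exact equation, combines the component models with per-model weights λ_i:

p(ē|f̄) ∝ ∏_i p_i(ē|f̄)^{λ_i}

i.e., a weighted product of the component translation models, or equivalently a weighted sum of their log-probabilities.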
Ensemble Decoding | Given a number of translation models which are already trained and tuned, the ensemble decoder uses hypotheses constructed from all of the models in order to translate a sentence. |
Introduction | Common techniques for model adaptation adapt two main components of contemporary state-of-the-art SMT systems: the language model and the translation model . |
Introduction | translation model adaptation, because various measures such as perplexity of adapted language models can be easily computed on data in the target domain. |
Introduction | It is also easier to obtain monolingual data in the target domain, compared to bilingual data which is required for translation model adaptation. |
Abstract | This paper proposes a new discriminative training method in constructing phrase and lexicon translation models . |
Abstract | Parameters in the phrase and lexicon translation models are estimated by relative frequency or by maximizing joint likelihood, which may not correspond closely to the translation measure, e.g., bilingual evaluation understudy (BLEU) (Papineni et al., 2002).
Abstract | However, the number of parameters in common phrase and lexicon translation models is much larger. |
Abstract | In this paper, we propose two discriminative, feature-based models to exploit predicate-argument structures for statistical machine translation: 1) a predicate translation model and 2) an argument reordering model. |
Abstract | The predicate translation model explores lexical and semantic contexts surrounding a verbal predicate to select desirable translations for the predicate. |
Introduction | This suggests that conventional lexical and phrasal translation models adopted in those SMT systems are not sufficient to correctly translate predicates in source sentences.
Introduction | Thus we propose a discriminative, feature-based predicate translation model that captures not only lexical information (i.e., surrounding words) but also high-level semantic contexts to correctly translate predicates.
Introduction | In Sections 3 and 4, we elaborate on the proposed predicate translation model and argument reordering model respectively, including details about modeling, features and training procedure.
Predicate Translation Model | In this section, we present the features and the training process of the predicate translation model . |
Predicate Translation Model | Following the context-dependent word models in (Berger et al., 1996), we propose a discriminative predicate translation model . |
Predicate Translation Model | Given a source sentence which contains N verbal predicates, our predicate translation model M_t can be denoted as
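The equation for M_t is elided in the extract; given the stated lineage from the maximum entropy models of Berger et al. (1996), a plausible per-predicate form (an assumption for illustration, not the paper's exact formulation) is

P(t|C(v)) = exp(Σ_i λ_i f_i(t, C(v))) / Σ_{t′} exp(Σ_i λ_i f_i(t′, C(v)))

where C(v) is the lexical and semantic context of predicate v, the f_i are feature functions, and the sentence-level model would multiply such factors over the N predicates.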
Related Work | Our predicate translation model is also related to previous discriminative lexicon translation models (Berger et al., 1996; Venkatapathy and Bangalore, 2007; Mauser et al., 2009).
Related Work | This will tremendously reduce the amount of training data required, which usually is a problem in discriminative lexicon translation models (Mauser et al., 2009).
Related Work | Furthermore, the proposed translation model also differs from previous lexicon translation models in that we use both lexical and semantic features.
Abstract | We frame the MT problem as a decipherment task, treating the foreign text as a cipher for English and present novel methods for training translation models from nonparallel text. |
Introduction | From these corpora, we estimate translation model parameters: word-to-word translation tables, fertilities, distortion parameters, phrase tables, syntactic transformations, etc. |
Introduction | In this paper, we address the problem of learning a full translation model from nonparallel data, and we use the
Introduction | How can we learn a translation model from nonparallel data? |
Machine Translation as a Decipherment Task | Probabilistic decipherment: Unlike parallel training, here we have to estimate the translation model P_θ(f|e) parameters using only monolingual data.
Machine Translation as a Decipherment Task | We then estimate parameters of the translation model P_θ(f|e) during training.
Machine Translation as a Decipherment Task | EM Decipherment: We propose a new translation model for MT decipherment which can be efficiently trained using the EM algorithm. |
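To make the EM decipherment idea concrete, here is a minimal, self-contained sketch. Everything below is an illustrative assumption, not the papers' actual setup: the data is a toy word-substitution cipher, and both models are unigram, whereas real systems use n-gram language models over whole sequences to break the symmetry that a unigram setup leaves unresolved.

```python
# Minimal sketch of EM training for a channel model P_theta(f|e) from
# monolingual data only. Vocabulary, cipher corpus, and LM are toy values.
from collections import defaultdict

plain_vocab = ["the", "cat", "dog"]
cipher_corpus = ["X", "Y", "X", "Z", "X", "Y"]   # monolingual "foreign" tokens
lm = {"the": 0.5, "cat": 0.3, "dog": 0.2}        # fixed language model P(e)

# Initialize the channel model P(f|e) uniformly.
t = {e: {f: 1.0 / 3 for f in "XYZ"} for e in plain_vocab}

for _ in range(20):                              # EM iterations
    counts = defaultdict(float)
    for f in cipher_corpus:
        # E-step: posterior over hidden plaintext e for observed token f,
        # proportional to P(e) * P(f|e).
        z = sum(lm[e] * t[e][f] for e in plain_vocab)
        for e in plain_vocab:
            counts[(e, f)] += lm[e] * t[e][f] / z
    # M-step: renormalize expected counts into a new P(f|e).
    for e in plain_vocab:
        total = sum(counts[(e, f)] for f in "XYZ")
        for f in "XYZ":
            t[e][f] = counts[(e, f)] / total

print(max(t["the"], key=t["the"].get))           # most likely cipher symbol for "the"
```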
Abstract | additive neural networks, for SMT to go beyond the log-linear translation model . |
Abstract | Our model outperforms the log-linear translation models with/without embedding features on Chinese-to-English and Japanese-to-English translation tasks.
Introduction | On the one hand, features are required to be linear with respect to the objective of the translation model (Nguyen et al., 2007), but it is not guaranteed that the potential features are linear with the model.
Introduction | In the search procedure, the model score must be computed frequently for the search heuristic function, which challenges decoding efficiency for a neural network based translation model.
Introduction | The biggest contribution of this paper is that it goes beyond the log-linear model and proposes a nonlinear translation model instead of a re-ranking model (Duh and Kirchhoff, 2008; Sokolov et al., 2012).
Abstract | Following a probabilistic decipherment approach, we first introduce a new framework for decipherment training that is flexible enough to incorporate any number/type of features (besides simple bag-of-words) as side-information used for estimating translation models . |
Decipherment Model for Machine Translation | Contrary to standard machine translation training scenarios, here we have to estimate the translation model P_θ(f|e) parameters using only monolingual data.
Decipherment Model for Machine Translation | We then estimate the parameters of the translation model P_θ(f|e) during training.
Decipherment Model for Machine Translation | Translation Model: Machine translation is a much more complex task than solving other decipherment tasks such as word substitution ciphers (Ravi and Knight, 2011b; Dou and Knight, 2012).
Introduction | The parallel corpora are used to estimate translation model parameters involving word-to-word translation tables, fertilities, distortion, phrase translations, syntactic transformations, etc. |
Introduction | Learning translation models from monolingual corpora could help address the challenges faced by modern-day MT systems, especially for low resource language pairs.
Introduction | Recently, this topic has been receiving increasing attention from researchers and new methods have been proposed to train statistical machine translation models using only monolingual data in the source and target language. |
Models 2.1 Baseline Models | Our second baseline is a factored translation model (Koehn and Hoang, 2007) (called Factored), which used the word, “stem” and suffix as factors.
Models 2.1 Baseline Models | To compare against the performance of unsupervised segmentation for translation, our third baseline is a segmented translation model based on a supervised segmentation model (called Sup), using the hand-built Omorfi morphological analyzer (Pirinen and Listenmaa, 2007), which provided slightly higher BLEU scores than the word-based baseline.
Models 2.1 Baseline Models | For segmented translation models , it cannot be taken for granted that greater linguistic accuracy in segmentation yields improved translation (Chang et al., 2008). |
Translation and Morphology | Our main contributions are: 1) the introduction of the notion of segmented translation where we explicitly allow phrase pairs that can end with a dangling morpheme, which can connect with other morphemes as part of the translation process, and 2) the use of a fully segmented translation model in combination with a postprocessing morpheme prediction system, using unsupervised morphology induction. |
Translation and Morphology | Morphology can express both content and function categories, and our experiments show that it is important to use morphology both within the translation model (for morphology with content) and outside it (for morphology contributing to fluency). |
Abstract | We present an architecture that delays the computation of translation model features until decoding, allowing for the application of mixture-modeling techniques at decoding time. |
Abstract | Experimental results on two language pairs demonstrate the effectiveness of both our translation model architecture and automatic clustering, with gains of up to 1 BLEU over unadapted systems and single-domain adaptation. |
Introduction | We introduce a translation model architecture that delays the computation of features to the decoding phase. |
Related Work | (Ortiz-Martinez et al., 2010) delay the computation of translation model features for the purpose of interactive machine translation with online training. |
Related Work | (Sennrich, 2012b) perform instance weighting of translation models , based on the sufficient statistics. |
Related Work | (Razmara et al., 2012) describe an ensemble decoding framework which combines several translation models in the decoding step. |
Translation Model Architecture | This section covers the architecture of the multi-domain translation model framework. |
Translation Model Architecture | Our translation model is embedded in a log-linear model as is common for SMT, and treated as a single translation model in this log-linear combination. |
Translation Model Architecture | The architecture has two goals: move the calculation of translation model features to the decoding phase, and allow for multiple knowledge sources (e.g. bitexts or user-provided data) to contribute to their calculation.
Abstract | State-of-the-art approaches address these issues by implicitly expanding the queried questions with additional words or phrases using monolingual translation models . |
Experiments | Rows 3 and 6 are monolingual translation models that address the word mismatch problem and obtained the state-of-the-art performance in previous work.
Experiments | Row 3 is the word-based translation model (Jeon et al., 2005), and row 4 is the word-based translation language model, which linearly combines the word-based translation model and language model into a unified framework (Xue et al., 2008). |
Experiments | Row 5 is the phrase-based translation model , which translates a sequence of words as whole (Zhou et al., 2011). |
Introduction | Researchers have proposed the use of word-based translation models (Berger et al., 2000; Jeon et al., 2005; Xue et al., 2008; Lee et al., 2008; Bernhard and Gurevych, 2009) to solve the word mismatch problem. |
Introduction | As a principled approach to capture semantic word relations, word-based translation models are built by using the IBM model 1 (Brown et al., 1993) and have been shown to outperform traditional models (e.g., VSM, BM25, LM) for question retrieval.
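As an illustration of how such a word-based translation model is typically used to rank candidates in question retrieval, here is a small sketch in the spirit of this line of work; the mixture weight, the toy translation table, and the exact smoothing are assumptions for illustration, not any one paper's formulation.

```python
# Hedged sketch of translation-based retrieval scoring: P(query | candidate)
# as a product over query words of a translation/language-model mixture.

def retrieval_score(query, candidate, p_trans, beta=0.8):
    """query, candidate: lists of tokens; p_trans[t][w] ~ P(w | t)."""
    score = 1.0
    n = len(candidate)
    for w in query:
        # Translation part: sum over candidate words t of P(w|t) * P(t|candidate).
        p_tr = sum(p_trans.get(t, {}).get(w, 0.0) / n for t in candidate)
        # Mix with a maximum-likelihood candidate LM to catch exact matches.
        p_ml = candidate.count(w) / n
        score *= beta * p_tr + (1 - beta) * p_ml
    return score

p_trans = {"car": {"automobile": 0.4, "car": 0.5},
           "buy": {"purchase": 0.6, "buy": 0.3}}        # made-up probabilities
print(retrieval_score(["automobile", "purchase"], ["buy", "car"], p_trans))
```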
Introduction | Zhou et al. (2011) proposed the phrase-based translation models for question and answer retrieval.
Conclusion and Future Work | Experimental results show that our approach is promising for SMT systems to learn a better translation model . |
Experiments | Translation models are trained over the parallel data that is automatically word-aligned |
Experiments | This implementation makes the system perform much better and the translation model size is much smaller. |
Introduction | Current translation modeling approaches usually use context dependent information to disambiguate translation candidates. |
Introduction | Therefore, it is important to leverage topic information to learn smarter translation models and achieve better translation performance. |
Introduction | Attempts on topic-based translation modeling include topic-specific lexicon translation models (Zhao and Xing, 2006; Zhao and Xing, 2007), topic similarity models for synchronous rules (Xiao et al., 2012), and document-level translation with topic coherence (Xiong and Zhang, 2013). |
Related Work | Following this work, (Xiao et al., 2012) extended topic-specific lexicon translation models to hierarchical phrase-based translation models , where the topic information of synchronous rules was directly inferred with the help of document-level information. |
Related Work | They incorporated the bilingual topic information into language model adaptation and lexicon translation model adaptation, achieving significant improvements in the large-scale evaluation. |
Related Work | They estimated phrase-topic distributions in translation model adaptation and generated better translation quality. |
Topic Similarity Model with Neural Network | Therefore, it helps to train a smarter translation model with the embedded topic information. |
Topic Similarity Model with Neural Network | Standard features: Translation model , including translation probabilities and lexical weights for both directions (4 features), 5-gram language model (1 feature), word count (1 feature), phrase count (1 feature), NULL penalty (1 feature), number of hierarchical rules used (1 feature). |
Abstract | This paper proposes a forest-based tree sequence to string translation model for syntax-based statistical machine translation, which automatically learns tree sequence to string translation rules from word-aligned source-side-parsed bilingual texts. |
Abstract | The proposed model leverages on the strengths of both tree sequence-based and forest-based translation models . |
Decoding | 4) Decode the translation forest using our translation model and a dynamic search algorithm. |
Forest-based tree sequence to string model | 3.3 Forest-based tree-sequence to string translation model |
Forest-based tree sequence to string model | Given a source forest F and target translation T as well as word alignment A, our translation model is formulated as:
Introduction | Forest-based Tree Sequence to String Translation Model
Introduction | Section 2 describes related work while section 3 defines our translation model . |
Related work | Motivated by the fact that non-syntactic phrases make a nontrivial contribution to phrase-based SMT, the tree sequence-based translation model was proposed (Liu et al., 2007; Zhang et al., 2008a), which uses a tree sequence as the basic translation unit, rather than a single subtree as in the STSG.
Related work | Liu et al. (2007) propose the tree sequence concept and design a tree sequence to string translation model.
Related work | Zhang et al. (2008a) propose a tree sequence-based tree-to-tree translation model and Zhang et al.
Abstract | They can be obtained by training statistical translation models on parallel monolingual corpora, such as question-answer pairs, where answers act as the “source” language and questions as the “target” language. |
Abstract | We compare monolingual translation models built from lexical semantic resources with two other kinds of datasets: manually-tagged question reformulations and question-answer pairs. |
Introduction | Berger and Lafferty (1999) have formulated a further solution to the lexical gap problem consisting in integrating monolingual statistical translation models in the retrieval process. |
Introduction | Monolingual translation models encode statistical word associations which are trained on parallel monolingual corpora. |
Introduction | While collection-specific translation models effectively encode statistical word associations for the target document collection, they also introduce a bias in the evaluation and make it difficult to assess the quality of the translation model per se, independently from a specific task and document collection.
Abstract | In this paper, we propose a sense-based translation model to integrate word senses into statistical machine translation. |
Abstract | The proposed sense-based translation model enables the decoder to select appropriate translations for source words according to the inferred senses for these words using maximum entropy classifiers. |
Abstract | We test the effectiveness of the proposed sense-based translation model on a large-scale Chinese-to-English translation task. |
Introduction | These glosses, used as the sense predictions of their WSD system, are integrated into a word-based SMT system either to substitute for translation candidates of their translation model or to postedit the output of their SMT system. |
Introduction | In order to incorporate word senses into SMT, we propose a sense-based translation model that is built on maximum entropy classifiers. |
Introduction | We collect training instances from the sense-tagged training data to train the proposed sense-based translation model . |
A Skeleton-based Approach to MT 2.1 Skeleton Identification | To compute g(d), we use a linear combination of a skeleton translation model g_skel(d) and a full translation model g_full(d):
A Skeleton-based Approach to MT 2.1 Skeleton Identification | g(d) = g_skel(d) + g_full(d) (3) where the skeleton translation model handles the translation of the sentence skeleton, while the full translation model is the baseline model and handles the original problem of translating the whole sentence.
A Skeleton-based Approach to MT 2.1 Skeleton Identification | The skeleton translation model focuses on the translation of the sentence skeleton, i.e., the solid (red) rectangles; while the full translation model computes the model score for all those phrase-pairs, i.e., all solid and dashed rectangles. |
Abstract | The basic idea is that we translate the key elements of the input sentence using a skeleton translation model, and then cover the remain segments using a full translation model . |
Introduction | Note that the source-language structural information has been intensively investigated in recent studies of syntactic translation models . |
Introduction | We develop a skeleton-based model which divides translation into two sub-models: a skeleton translation model (i.e., translating the key elements) and a full translation model (i.e., translating the remaining source words and generating the complete translation).
Abstract | This paper presents a translation model that is based on tree sequence alignment, where a tree sequence refers to a single sequence of sub-trees that covers a phrase. |
Introduction | Tree-to-Tree Translation Model
Introduction | In this paper, we propose a tree-to-tree translation model that is based on tree sequence alignment. |
Related Work | Ding and Palmer (2005) propose a syntax-based translation model based on a probabilistic synchronous dependency insertion grammar. |
Related Work | Quirk et al. (2005) propose a dependency treelet-based translation model.
Related Work | (2007b) present a STSG-based tree-to-tree translation model . |
Tree Sequence Alignment Model | 3.2 Tree Sequence Translation Model |
Tree Sequence Alignment Model | The tree sequence-to-tree sequence translation model is formulated as:
Abstract | We use translation models and language models to exploit lexical correlations and the character of solution posts, respectively.
Our Approach | We model the lexical correlation and the character of solution posts using regularized translation models and unigram language models, respectively.
Our Approach | The usage of translation models in QA retrieval (Xue et al., 2008; Singh, 2012) and segmentation (Deepak et al., 2012) were also motivated by the correlation assumption. |
Our Approach | We use an IBM Model 1 translation model (Brown et al., 1990) in our technique; simplistically, such a model m may be thought of as a 2-d associative array where the value m[w_1][w_2] is directly related to the probability of w_1 occurring in the problem when w_2 occurs in the solution.
Our Approach | Consider a unigram language model S that models the lexical characteristics of solution posts, and a translation model T that models the lexical correlation between problems and solutions.
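A minimal sketch of the two components just described, with the 2-d associative array realized as a dict of dicts; all probabilities and the mixture weight below are made up for illustration, not the paper's actual values.

```python
# m[w1][w2]: translation evidence relating problem word w1 to solution word w2.
m = {
    "crash": {"reboot": 0.30, "restart": 0.25, "login": 0.05},
    "login": {"password": 0.40, "reset": 0.20},
}
# Unigram LM over solution-post words (illustrative, assumed normalized).
lm = {"reboot": 0.2, "restart": 0.1, "password": 0.3, "reset": 0.1, "login": 0.3}

def solution_score(problem_words, solution_words, alpha=0.7):
    """Score a candidate solution by mixing translation evidence (T) with the
    solution-post language model (S), in the spirit of the models above."""
    score = 1.0
    for w2 in solution_words:
        corr = sum(m.get(w1, {}).get(w2, 0.0) for w1 in problem_words)
        score *= alpha * corr + (1 - alpha) * lm.get(w2, 0.0)
    return score

print(solution_score(["crash"], ["reboot", "password"]))
```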
Related Work | Usage of translation models for modeling the correlation between textual problems and solutions has been explored earlier, starting from the answer retrieval work in (Xue et al., 2008), where new queries were conceptually expanded using the translation model to improve retrieval.
Related Work | Translation models were also seen to be useful in segmenting incident reports into the problem and solution parts (Deepak et al., 2012); we will use an adaptation of the generative model presented therein, for our solution extraction formulation. |
Related Work | Entity-level translation models |
Abstract | By contrast, we argue that the relevance between a sentence pair and target domain can be better evaluated by the combination of language model and translation model . |
Abstract | In this paper, we study and experiment with novel methods that apply translation models to domain-relevant data selection.
Introduction | The corpora are necessary prior knowledge for training an effective translation model.
Introduction | However, domain-specific machine translation has few parallel corpora for translation model training in the domain of interest. |
Introduction | To overcome the problem, we first propose a method combining the translation model with the language model in data selection.
Related Work | Thus, we propose novel methods which are based on translation model and language model for data selection. |
Training Data Selection Methods | We present three data selection methods for ranking and selecting domain-relevant sentence pairs from general-domain corpus, with an eye towards improving domain-specific translation model performance. |
Training Data Selection Methods | These methods are based on language model and translation model , which are trained on small in-domain parallel data. |
Training Data Selection Methods | 3.1 Data Selection with Translation Model |
Experiments | For the synthetic method, we used the ES translation model to translate the English part of the CE corpus to Spanish to construct a synthetic corpus. |
Experiments | And we also used the BTEC CE1 corpus to build an EC translation model to translate the English part of the ES corpus into Chinese.
Experiments | Then we combined these two synthetic corpora to build a Chinese-Spanish translation model . |
Introduction | It multiplies corresponding translation probabilities and lexical weights in source-pivot and pivot-target translation models to induce a new source-target phrase table.
Introduction | For example, we can obtain a source-target corpus by translating the pivot sentences in the source-pivot corpus into the target language with pivot-target translation models.
Pivot Methods for Phrase-based SMT | Following the method described in Wu and Wang (2007), we train the source-pivot and pivot-target translation models using the source-pivot and pivot-target corpora, respectively. |
Pivot Methods for Phrase-based SMT | Based on these two models, we induce a source-target translation model , in which two important elements need to be induced: phrase translation probability and lexical weight. |
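The induction of the phrase translation probability can be sketched as marginalizing over shared pivot phrases; the code below is a toy illustration of that triangulation step (made-up phrase tables, ignoring lexical weights and pruning).

```python
# Sketch of phrase-table triangulation through a pivot language.
# p_sp: P(pivot | source); p_pt: P(target | pivot). All values are made up.
p_sp = {"maison": {"house": 0.7, "home": 0.3}}                        # source -> pivot
p_pt = {"house": {"casa": 0.9}, "home": {"casa": 0.6, "hogar": 0.4}}  # pivot -> target

def triangulate(p_sp, p_pt):
    """Induce P(target | source) by summing over shared pivot phrases p:
    p(t|s) = sum_p p(t|p) * p(p|s)."""
    p_st = {}
    for s, pivots in p_sp.items():
        p_st[s] = {}
        for p, prob_ps in pivots.items():
            for t, prob_tp in p_pt.get(p, {}).items():
                p_st[s][t] = p_st[s].get(t, 0.0) + prob_tp * prob_ps
    return p_st

print(triangulate(p_sp, p_pt))   # {'maison': {'casa': 0.81, 'hogar': 0.12}}
```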
Pivot Methods for Phrase-based SMT | Then we build a source-target translation model using this corpus. |
Using RBMT Systems for Pivot Translation | The source-target pairs extracted from this synthetic multilingual corpus can be used to build a source-target translation model . |
Abstract | Current SMT systems usually decode with single translation models and cannot benefit from the strengths of other models in decoding phase. |
Abstract | We instead propose joint decoding, a method that combines multiple translation models in one decoder. |
Conclusion | We have presented a framework for including multiple translation models in one decoder. |
Introduction | In this paper, we propose a framework for combining multiple translation models directly in decoding.
Joint Decoding | Second, translation models differ in decoding algorithms. |
Joint Decoding | Despite the diversity of translation models , they all have to produce partial translations for substrings of input sentences. |
Joint Decoding | Therefore, we represent the search space of a translation model as a structure called translation hypergraph. |
Abstract | In this paper, instead of designing new features based on intuition, linguistic knowledge and domain, we learn some new and effective features using the deep auto-encoder (DAE) paradigm for the phrase-based translation model.
Experiments and Results | The baseline translation models are generated by Moses with default parameter settings. |
Input Features for DNN Feature Learning | The phrase-based translation model (Koehn et al., 2003; Och and Ney, 2004) has demonstrated superior performance and been widely used in current SMT systems, and we employ our implementation on this translation model . |
Input Features for DNN Feature Learning | This corpus is used to train the translation model in our experiments, and we will describe it in detail in section 5.1.
Introduction | al., 2010), and speech spectrograms (Deng et al., 2010), we propose new feature learning using semi-supervised DAE for phrase-based translation model . |
Related Work | (2012) improved the translation quality of an n-gram translation model by using a bilingual neural LM, where translation probabilities are estimated using a continuous representation of translation units in lieu of standard discrete representations.
Related Work | Kalchbrenner and Blunsom (2013) introduced recurrent continuous translation models that comprise a class of purely continuous sentence-level translation models.
Related Work | (2013) presented a joint language and translation model based on a recurrent neural network which predicts target words based on an unbounded history of both source and target words. |
Semi-Supervised Deep Auto-encoder Features Learning for SMT | Each translation rule in the phrase-based translation model has a set number of features that are combined in the log-linear model (Och and Ney, 2002), and our semi-supervised DAE features can also be combined in this model. |
Conclusion and Future Work | In this paper, we have presented a unified reordering framework to incorporate soft linguistic constraints (of syntactic or semantic nature) into the HPB translation model . |
Experiments | Our basic baseline system employs 19 basic features: a language model feature, 7 translation model features, word penalty, unknown word penalty, the glue rule, date, number and 6 pass-through features. |
HPB Translation Model: an Overview | Each such rule is associated with a set of translation model features {φ_i}, such as the phrase translation probability p(α|γ) and its inverse p(γ|α), the lexical translation probability p_lex(α|γ) and its inverse p_lex(γ|α), and a rule penalty that affects preference for longer or shorter derivations.
Introduction | The popular distortion or lexicalized reordering models in phrase-based SMT make good local predictions by focusing on reordering on word level, while the synchronous context free grammars in hierarchical phrase-based (HPB) translation models are capable of handling nonlocal reordering on the translation phrase level. |
Introduction | The general ideas, however, are applicable to other translation models , e.g., phrase-based model, as well. |
Introduction | Section 2 provides an overview of the HPB translation model.
Related Work | Both are close to our work; however, our model generates reordering features that are integrated into the log-linear translation model during decoding. |
Related Work | In the soft constraint or reordering model approach, Liu and Gildea (2010) modeled the reordering/deletion of source-side semantic roles in a tree-to-string translation model . |
Unified Linguistic Reordering Models | For models with syntactic reordering, we add two new features (i.e., one for the leftmost reordering model and the other for the rightmost reordering model) into the log-linear translation model in Eq. |
Unified Linguistic Reordering Models | For the semantic reordering models, we also add two new features into the log-linear translation model . |
A semantic span can include one or more eus. | 1) CSS-based translation model: following formula (1), we obtain the cohesion information by modifying the translation rules with their probabilities P(e_s|f_t) based on word alignment.
A semantic span can include one or more eus. | 3.1 CSS-based Translation Model |
A semantic span can include one or more eus. | For the existing translation models, the entire training process is conducted at the lexical or syntactic level without grammatical cohesion information.
Abstract | Our models include a CSS-based translation model , which generates new CSS-based translation rules, and a generative transfer model, which encourages producing transitional expressions during decoding. |
Conclusion | In the future, we will extend our methods to other translation models , such as the syntax-based model, to study how to further improve the performance of SMT systems. |
Experiments | The bilingual training data for the translation model and the CSS-based transfer model is the FBIS corpus, with approximately 7.1 million Chinese words and 9.2 million English words.
Experiments | For this work, we use an in-house decoder to build the SMT baseline; it combines the hierarchical phrase-based translation model (Chiang, 2005; Chiang, 2007) with the BTG (Wu, 1996) reordering model (Xiong et al., 2006; Zens and Ney, 2006; He et al., 2010). |
Experiments | To further evaluate the effectiveness of the proposed models, we also conducted an experiment on a larger set of bilingual training data from the LDC corpus for the translation model and transfer model.
Introduction | One is a new translation model that is utilized to generate new translation rules combined with the information of source functional relationships. |
Abstract | We present an adaptive translation quality estimation (QE) method to predict the human-targeted translation error rate (HTER) for a document-specific machine translation model . |
Adaptive MT Quality Estimation | Therefore it is necessary to build a QE regression model that is robust to different document-specific translation models.
Document-specific MT System | Building a general MT system using all the parallel data not only produces a huge translation model (unless very aggressive pruning is applied); the performance on the given input document is also suboptimal due to the unwanted dominance of out-of-domain data.
Document-specific MT System | Here we adopt the same strategy, building a document-specific translation model for each input document. |
Experiments | In a typical MT QE scenario, the QE model is pre-trained and applied to various MT outputs, even though the QE training data and MT outputs are generated from different translation models . |
Experiments | We train the static QE model with this training set, including the source sentences, references and MT outputs (from multiple translation models ). |
Experiments | To train the adaptive QE model for each test document, we build a translation model whose subsampling data includes source sentences from both the test document and the QE training data. |
Introduction | First, existing approaches to MT quality estimation rely on lexical and syntactical features defined over parallel sentence pairs, which includes source sentences, MT outputs and references, and translation models (Blatz et al., 2004; Ueffing and Ney, 2007; Specia et al., 2009a; Xiong et al., 2010; Soricut and Echihabi, 2010a; Bach et al., 2011). |
Static MT Quality Estimation | derived from a Maximum Entropy translation model (Ittycheriah and Roukos, 2005). |
Introduction | Early work in statistical machine translation viewed translation as a noisy channel process comprised of a translation model, which functioned to posit adequate translations of source language words, and a target language model, which guided the fluency of generated target language strings (Brown et al., 1990).
Related Work | The translation models in these techniques define phrases as contiguous word sequences (with gaps allowed in the case of hierarchical phrases) which may or may not correspond to any linguistic constituent. |
Related Work | Early work in statistical phrase-based translation considered whether restricting translation models to use only syntactically well-formed constituents might improve translation quality (Koehn et al., 2003) but found such restrictions failed to improve translation quality. |
Related Work | Significant research has examined the extent to which syntax can be usefully incorporated into statistical tree-based translation models: string-to-tree (Yamada and Knight, 2001; Gildea, 2003; Imamura et al., 2004; Galley et al., 2004; Graehl and Knight, 2004; Melamed, 2004; Galley et al., 2006; Huang et al., 2006; Shen et al., 2008), tree-to-string (Liu et al., 2006; Liu et al., 2007; Mi et al., 2008; Mi and Huang, 2008; Huang and Mi, 2010), tree-to-tree (Abeille et al., 1990; Shieber and Schabes, 1990; Poutsma, 1998; Eisner, 2003; Shieber, 2004; Cowan et al., 2006; Nesson et al., 2006; Zhang et al., 2007; DeNeefe et al., 2007; DeNeefe and Knight, 2009; Liu et al., 2009; Chiang, 2010), and treelet (Ding and Palmer, 2005; Quirk et al., 2005) techniques use syntactic information to inform the translation model . |
Abstract | The tree sequence based translation model allows the violation of syntactic boundaries in a rule to capture non-syntactic phrases, where a tree sequence is a contiguous sequence of sub-trees. |
Abstract | This paper goes further to present a translation model based on noncontiguous tree sequence alignment, where a noncontiguous tree sequence is a sequence of sub-trees and gaps. |
Experiments | In the experiments, we train the translation model on the FBIS corpus (7.2M (Chinese) + 9.2M (English) words) and train a 4-gram language model on the Xinhua portion of the English Gigaword corpus (181M words) using the SRILM toolkit (Stolcke, 2002).
Introduction | Bod (2007) also finds that discontinuous phrasal rules yield a significant improvement in a linguistically motivated STSG-based translation model.
Introduction | We illustrate the rule extraction with an example from the tree-to-tree translation model based on tree sequence alignment (Zhang et al., 2008a), without loss of generality to most syntactic tree based models.
Introduction | To address this issue, we propose a syntactic translation model based on noncontiguous tree sequence alignment. |
NonContiguous Tree Sequence Alignment-based Model | In this section, we give a formal definition of SncTSSG and accordingly propose the alignment-based translation model.
NonContiguous Tree Sequence Alignment-based Model | 2.2 SncTSSG-based Translation Model
Eliciting Addressee’s Emotion | Following (Ritter et al., 2011), we apply the statistical machine translation model for generating a response to a given utterance. |
Eliciting Addressee’s Emotion | We use GIZA++ and SRILM for learning the translation model and 5-gram language model, respectively.
Eliciting Addressee’s Emotion | We use the emotion-tagged dialogue corpus to learn eight translation models and language models, each of which is specialized in generating the response that elicits one of the eight emotions (Plutchik, 1980). |
Experiments | Table 6: The number of utterance pairs used for training classifiers in emotion prediction and learning the translation models and language models in response generation. |
Experiments | We use the utterance pairs summarized in Table 6 to learn the translation models and language models for eliciting each emotional category. |
Experiments | However, for learning the general translation models, we currently use 4 million utterance pairs sampled from the 640 million pairs due to computational limitations.
Approach | supervised IR models, the answer ranking is implemented using discriminative learning, and finally, some of the ranking features are produced by question-to-answer translation models , which use class-conditional learning. |
Approach | the similarity between questions and answers (FG1), features that encode question-to-answer transformations using a translation model (FG2), features that measure keyword density and frequency (FG3), and features that measure the correlation between question-answer pairs and other collections (FG4).
Approach | One way to address this problem is to learn question-to-answer transformations using a translation model (Berger et al., 2000; Echihabi and Marcu, 2003; Soricut and Brill, 2006; Riezler et al., 2007). |
Experiments | Our ranking model was tuned strictly on the development set (i.e., feature selection and parameters of the translation models ). |
Experiments | to improve lexical matching and translation models . |
Experiments | This indicates that, even though translation models are the most useful, it is worth exploring approaches that combine several strategies for answer ranking. |
Related Work | In the QA literature, answer ranking for non-factoid questions has typically been performed by learning question-to-answer transformations, either using translation models (Berger et al., 2000; Soricut and Brill, 2006) or by exploiting the redundancy of the Web (Agichtein et al., 2001). |
Related Work | On the other hand, our approach allows the learning of full transformations from question structures to answer structures using translation models applied to different text representations. |
Experiments | In order to evaluate the influence of segmentation results upon the statistical ON translation system, we compare the results of two translation models . |
Experiments | For constructing a statistical ON translation model, we use GIZA++ to align the Chinese NEs and the English NEs in the training set.
Experiments | Q2: the Chinese ON and the results of the statistical translation model . |
Related Work | The first type of methods translates ONs by building a statistical translation model . |
Related Work | The statistical translation model can give an output for any input. |
The Chunking-based Segmentation for Chinese ONs | The performance of the statistical ON translation model is dependent on the precision of the Chinese ON segmentation to some extent. |
Abstract | Similarity scores are used as additional features of the translation model to improve translation performance. |
Experiments | The sense similarity scores are used as feature functions in the translation model . |
Experiments | In particular, all the allowed bilingual corpora except the UN corpus and Hong Kong Hansard corpus have been used for estimating the translation model . |
Experiments | The second one is the small data condition where only the FBIS3 corpus is used to train the translation model . |
Hierarchical phrase-based MT system | The hierarchical phrase-based translation method (Chiang, 2005; Chiang, 2007) is a formal syntax-based translation modeling method; its translation model is a weighted synchronous context free grammar (SCFG). |
Introduction | the translation probabilities in a translation model for units from parallel corpora are mainly based on the co-occurrence counts of the two units.
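Concretely, the co-occurrence-based estimate is usually a relative frequency; a minimal sketch with toy counts (the pairs and numbers are made up for illustration):

```python
# Relative-frequency estimation of translation probabilities from
# co-occurrence counts of aligned unit pairs (f, e).
from collections import Counter

pair_counts = Counter({("Haus", "house"): 8, ("Heim", "house"): 2,
                       ("Katze", "cat"): 5})

def p_f_given_e(f, e):
    """p(f|e) = count(f, e) / count(e), the standard relative-frequency estimate."""
    count_e = sum(c for (ff, ee), c in pair_counts.items() if ee == e)
    return pair_counts[(f, e)] / count_e if count_e else 0.0

print(p_f_given_e("Haus", "house"))   # 8 / (8 + 2) = 0.8
```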
A Class-based Model of Agreement | Translation model notation: e, target sequence of I words; f, source sequence of J words; a, sequence of K phrase alignments for (e, f); π, permutation of the alignments for target word order; h, sequence of M feature functions; λ, sequence of learned weights for the M features; H, a priority queue of hypotheses.
Discussion of Translation Results | This large gap between the unigram recall of the actual translation output (top) and the lexical coverage of the phrase-based model (bottom) indicates that translation performance can be improved dramatically by altering the translation model through features such as ours, without expanding the search space of the decoder. |
Experiments | We trained the translation model on 502 million words of parallel text collected from a variety of sources, including the Web. |
Inference during Translation Decoding | 3.3 Translation Model Features |
Introduction | However, using lexical coverage experiments, we show that there is ample room for translation quality improvements through better selection of forms that already exist in the translation model . |
Related Work | Factored Translation Models: Factored translation models (Koehn and Hoang, 2007) facilitate a more data-oriented approach to agreement modeling.
Related Work | Subotin (2011) recently extended factored translation models to hierarchical phrase-based translation and developed a discriminative model for predicting target-side morphology in English-Czech. |
Abstract | Two decades after their invention, the IBM word-based translation models , widely available in the GIZA++ toolkit, remain the dominant approach to word alignment and an integral part of many statistical translation systems. |
Abstract | In this paper, we propose a simple extension to the IBM models: an ℓ0 prior to encourage sparsity in the word-to-word translation model.
Conclusion | We have extended the IBM models and HMM model by the addition of an ℓ0 prior to the word-to-word translation model, which compacts the word-to-word translation table, reducing overfitting and, in particular, the “garbage collection” effect.
Experiments | Table 4 shows BLEU scores for translation models learned from these alignments.
Introduction | Although state-of-the-art translation models use rules that operate on units bigger than words (like phrases or tree fragments), they nearly always use word alignments to drive extraction of those translation rules.
Introduction | It extends the IBM/HMM models by incorporating an ℓ0 prior, inspired by the principle of minimum description length (Barron et al., 1998), to encourage sparsity in the word-to-word translation model (Section 2.2).
Conclusion and Future Work | Finally, we hope to apply our method to other translation models , especially syntax-based models. |
Decoding | In the topic-specific lexicon translation model , given a source document, it first calculates the topic-specific translation probability by normalizing the entire lexicon translation table, and then adapts the lexical weights of rules correspondingly. |
Experiments | The adapted lexicon translation model is added as a new feature under the discriminative framework. |
Introduction | To exploit topic information for statistical machine translation (SMT), researchers have proposed various topic-specific lexicon translation models (Zhao and Xing, 2006; Zhao and Xing, 2007; Tam et al., 2007) to improve translation quality. |
Introduction | Topic-specific lexicon translation models focus on word-level translations. |
Related Work | combine a specific domain translation model with a general domain translation model depending on various text distances. |
Conclusion | As opposed to other factored translation model approaches that require target language factors, which are not easily obtainable for many languages, our approach only requires English syntax trees, which are acquired with widely available automatic parsers.
Factored Model | The factored statistical machine translation model uses a log-linear approach to combine the several components, including the language model, the reordering model, the translation models and the generation models.
Introduction | Our method is based on factored phrase-based statistical machine translation models . |
Introduction | Traditional statistical machine translation models deal with these problems in two ways:
Introduction | Then, contrary to the methods that added only output features or altered the generation procedure, we used this information in order to augment only the source side of a factored translation model , assuming that we do not have resources allowing factors or specialized generation in the target language (a common problem, when translating from English into under-resourced languages). |
Methods for enriching input | Considering such annotation, a factored translation model is trained to map the word-case pair to the correct inflection of the target noun. |
Conclusion | We present a head-driven hierarchical phrase-based (HD-HPB) translation model , which adopts head information (derived through unlabeled dependency analysis) in the definition of non-terminals to better differentiate among translation rules. |
Head-Driven HPB Translation Model | Like Chiang (2005) and Chiang (2007), our HD-HPB translation model adopts a synchronous context free grammar, a rewriting system which generates source and target side string pairs simultaneously using a context-free grammar. |
Head-Driven HPB Translation Model | For rule extraction, we first identify initial phrase pairs on word-aligned sentence pairs by using the same criterion as most phrase-based translation models (Och and Ney, 2004) and Chiang’s HPB model (Chiang, 2005; Chiang, 2007). |
Head-Driven HPB Translation Model | Merging two neighboring non-terminals into a single nonterminal, NRRs enable the translation model to explore a wider search space. |
Introduction | Chiang’s hierarchical phrase-based (HPB) translation model utilizes synchronous context free grammar (SCFG) for translation derivation (Chiang, 2005; Chiang, 2007) and has been widely adopted in statistical machine translation (SMT). |
Introduction | However, the two approaches are not mutually exclusive, as we could also include a set of syntax-driven features into our translation model . |
Alignment | The Viterbi word alignment of each training sentence pair n = 1, …, N is used for both the initialization of the translation model p(f|e) and the phrase model training.
Conclusion | We have shown that training phrase models can improve translation performance on a state-of-the-art phrase-based translation model . |
Experimental Evaluation | The scaling factors of the translation models have been optimized for BLEU on the DEV data. |
Experimental Evaluation | We will focus on the proposed leaving-one-out technique and show that it helps in finding good phrasal alignments on the training data that lead to improved translation models . |
Introduction | [Figure: two pipelines to a phrase translation table: word translation models trained by the EM algorithm yield a Viterbi word alignment, from which heuristic phrase translation counts are extracted; alternatively, phrase translation models trained by the EM algorithm yield phrase translation probabilities directly.]
Related Work | This is different from word-based translation models , where a typical assumption is that each target word corresponds to only one source word. |
Introduction | The translation quality of the SMT system is highly related to the coverage of translation models . |
Introduction | Naturally, a solution to the coverage problem is to bridge the gaps between the input sentences and the translation models, either from the input side, which aims at rewriting the input sentences into MT-favored expressions, or from the side of the translation models, which tries to enrich the translation models to cover more expressions.
Abstract | We present a translation model which models derivations as a latent variable, in both training and decoding, and is fully discriminative and globally optimised. |
Discriminative Synchronous Transduction | Our log-linear translation model defines a conditional probability distribution over the target translations of a given source sentence. |
Evaluation | We also experimented with using max-translation decoding for standard MERT-trained translation models, finding that it had a small negative impact on BLEU score.
Evaluation | Firstly we show the relative scores of our model against Hiero without using reverse translation or lexical features. This allows us to directly study the differences between the two translation models without the added complication of the other features.
Evaluation | As expected, the language model makes a significant difference to BLEU, however we believe that this effect is orthogonal to the choice of base translation model , thus we would expect a similar gain when integrating a language model into the discriminative system. |
Integration of inflection models with MT systems | In this method, stemming can impact word alignment in addition to the translation models . |
MT performance results | From the results of Method 2 we can see that reducing sparsity at translation modeling is advantageous. |
MT performance results | From these results, we also see that about half of the gain from using stemming in the base MT system came from improving word alignment, and half came from using translation models operating at the less sparse stem level. |
Machine translation systems and data | The treelet translation model is estimated using a parallel corpus. |
Related work | Though our motivation is similar to that of Koehn and Hoang (2007), we chose to build an independent component for inflection prediction in isolation rather than folding morphological information into the main translation model . |
Introduction | the role of the translation model in Statistical Machine Translation (SMT). |
System | The language model is a trigram-based back-off language model with Kneser-Ney smoothing, computed using SRILM (Stolcke, 2002) and trained on the same training data as the translation model . |
System | We do so by normalising the class probability from the classifier (score_T(H)), which is our translation model, and the language model (score_lm(H)), in such a way that the highest classifier score for the alternatives under consideration is always 1.0, and the highest language model score of the sentence is always 1.0.
System | If desired, the search can be parametrised with variables λ3 and λ4, representing the weights we want to attach to the classifier-based translation model and the language model, respectively.
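A small sketch of this normalize-then-weight scheme; the function and variable names are illustrative, not the system's actual interface.

```python
# Combine a classifier-based TM score and an LM score after rescaling each so
# that the best alternative under consideration gets 1.0, then weight them.

def combined_score(hypotheses, score_t, score_lm, lam3=1.0, lam4=1.0):
    """hypotheses: list of alternatives; score_t / score_lm: raw score dicts."""
    max_t = max(score_t[h] for h in hypotheses)
    max_lm = max(score_lm[h] for h in hypotheses)
    return {h: lam3 * score_t[h] / max_t + lam4 * score_lm[h] / max_lm
            for h in hypotheses}

hyps = ["A", "B"]
print(combined_score(hyps, {"A": 0.9, "B": 0.6}, {"A": 0.02, "B": 0.05}))
# A: 1.0 + 0.4 = 1.4 ; B: 0.667 + 1.0 = 1.667
```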
Conclusion | Eventually, we would like to replace the functionality of factored translation models (Koehn and Hoang, 2007) with lattice transformation and augmentation. |
Experimental Setup | Four translation model features encode phrase translation probabilities and lexical scores in both directions. |
Introduction | Morphological complexity leads to much higher type to token ratios than English, which can create sparsity problems during translation model estimation. |
Related Work | Most techniques approach the problem by transforming the target language in some manner before training the translation model . |
Related Work | In this setting, the sparsity reduction from segmentation helps word alignment and target language modeling, but it does not result in a more expressive translation model . |
Abstract | Second, it combines a simplification model for splitting and deletion with a monolingual translation model for phrase substitution and reordering. |
Experiments | We trained our simplification and translation models on the PWKP corpus. |
Simplification Framework | Our simplification framework consists of a probabilistic model for splitting and dropping which we call the DRS simplification model (DRS-SM); a phrase-based translation model for substitution and reordering (PBMT); and a language model learned on Simple English Wikipedia (LM) for fluency and grammaticality.
Simplification Framework | where the probabilities p(s′|DC), p(s′|s) and p(s) are given by the DRS simplification model, the phrase-based machine translation model and the language model respectively.
Simplification Framework | Our phrase-based translation model is trained using the Moses toolkit with its default command line options on the PWKP corpus (except the sentences from the test set), considering the complex sentence as the source and the simpler one as the target.
Abstract | We thus propose to combine the advantages of both, and present a novel constituency-to-dependency translation model , which uses constituency forests on the source side to direct the translation, and dependency trees on the target side (as a language model) to ensure grammaticality. |
Conclusion and Future Work | In this paper, we presented a novel forest-based constituency-to-dependency translation model , which combines the advantages of both tree-to-string and string-to-tree systems, runs fast and guarantees grammaticality of the output. |
Introduction | Linguistically syntax-based statistical machine translation models have made promising progress in recent years. |
Model | Figure 1 shows a word-aligned source constituency forest F_c and target dependency tree D_e; our constituency-to-dependency translation model can be formalized as:
Related Work | (2009), we apply forests to a new constituency-tree-to-dependency-tree translation model rather than a constituency tree-to-tree model.
Introduction | A (formal) translation model is at the core of every machine translation system. |
Introduction | (1990) discuss automatically trainable translation models in their seminal paper. |
Introduction | By contrast, in the field of syntax-based machine translation, the translation models have full access to the syntax of the sentences and can base their decisions on it.
Preservation of regularity | Preservation of regularity is an important property for a number of translation model manipulations. |
Abstract | Hiero translation models have two limitations compared to phrase-based models: 1) Limited hypothesis space; 2) No lexicalized reordering model. |
Introduction | Phrase-based and tree-based translation models are the two main streams in state-of-the-art machine translation.
Introduction | The tree-based translation model , by using a synchronous context-free grammar formalism, can capture longer reordering between source and target language. |
Introduction | Many features are shared between phrase-based and tree-based systems including language model, word count, and translation model features. |
Experiments | The induced joint translation model can be used to recover arg max_e p(e|f), as it is equal to arg max_e p(e, f). We employ the induced probabilistic HR-SCFG G as the backbone of a log-linear, feature-based translation model, with the derivation probability p(D) under the grammar estimate being
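The estimate itself is truncated in the extract; for a probabilistic SCFG the standard assumption, offered here as a hedged reconstruction rather than the paper's exact equation, is that a derivation's probability factors over its rules:

p(D) = ∏_{r∈D} p(r)

with p(r) the estimated probability of each grammar rule used in the derivation D.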
Introduction | Nevertheless, the successful employment of SCFGs for phrase-based SMT brought translation models assuming latent syntactic structure to the spotlight. |
Introduction | Section 2 discusses the weak independence assumptions of SCFGs and introduces a joint translation model which addresses these issues and separates hierarchical translation structure from phrase-pair emission. |
Joint Translation Model | The rest of the (sometimes thousands of) rule-specific features usually added to SCFG translation models do not directly help either, leaving reordering decisions disconnected from the rest of the derivation. |
Related Work | Most of the aforementioned work does concentrate on learning hierarchical, linguistically motivated translation models . |
Abstract | We use these topic distributions to compute topic-dependent lexical weighting probabilities and directly incorporate them into our translation model as features. |
Discussion and Conclusion | We can construct a topic model once on the training data, and use it to infer topics on any test set to adapt the translation model.
Introduction | This problem has led to a substantial amount of recent work in trying to bias, or adapt, the translation model (TM) toward particular domains of interest (Axelrod et al., 2011; Foster et al., 2010; Snover et al., 2008). The intuition behind TM adaptation is to increase the likelihood of selecting relevant phrases for translation.
Introduction | We induce unsupervised domains from large corpora, and we incorporate soft, probabilistic domain membership into a translation model . |
Introduction | We accomplish this by introducing topic dependent lexical probabilities directly as features in the translation model , and interpolating them log-linearly with our other features, thus allowing us to discriminatively optimize their weights on an arbitrary objective function. |
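A hedged sketch of how such topic-dependent lexical probabilities could enter the model as features, scaled by the inferred topic distribution of the current document (the table layout and all names below are illustrative assumptions, not the paper's implementation):

```python
import math

def topic_lexical_features(phrase_pair, doc_topic_dist, lex_tables):
    """One topic-conditioned lexical-weighting feature per topic.

    lex_tables[z] is assumed to hold p(e|f, z) estimated from training
    data softly assigned to topic z; doc_topic_dist is the inferred
    topic distribution of the current document (illustrative names)."""
    src_words, tgt_words = phrase_pair
    feats = {}
    for z, p_z in enumerate(doc_topic_dist):
        lw = 0.0
        for e in tgt_words:
            # lexical weight: average translation prob over source words
            probs = [lex_tables[z].get((e, f), 1e-9) for f in src_words]
            lw += math.log(sum(probs) / len(probs))
        feats[f"lex_topic_{z}"] = p_z * lw   # scaled by topic posterior
    return feats
```

Each feature then receives its own log-linear weight, which is what allows the weights to be optimized discriminatively on an arbitrary objective.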
Abstract | We present a new translation model integrating the shallow local multi bottom-up tree transducer. |
Experiments | Both translation models were trained with approximately 1.5 million bilingual sentences after length-ratio filtering. |
Introduction | In this contribution, we report on our novel statistical machine translation system that uses an ℓMBOT-based translation model.
Introduction | The theoretical foundations of ℓMBOT and their integration into our translation model are presented in Sections 2 and 3.
Translation Model | Given a source language sentence e, our translation model aims to find the best corresponding target language translation ê, i.e.,
Experiment | The translation model was trained using sentences of 40 words or less from the training data. |
Experiment | The common SMT feature set consists of: four translation model features, phrase penalty, word penalty, and a language model feature. |
Experiment | Our distortion model was trained as follows: We used 0.2 million sentence pairs and their word alignments from the data used to build the translation model as the training data for our distortion models. |
Experiments | The larger question-thread data set is employed for feature learning, such as translation model and topic model training.
Proposed Features | Translation Model |
Proposed Features | A translation model is a mathematical model that captures language translation in a statistical way.
Proposed Features | We train a translation model (Brown et al., 1990; Och and Ney, 2003) using the Community QA data, with the question as the target language, and the corresponding best answer as the source language. |
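As an illustration of how such a model can then score a new question against a candidate answer, a sketch assuming a learned word-translation table ttable[(q, a)] approximating p(q_word | a_word), with the question as target and the answer as source (all names are hypothetical):

```python
import math

def tm_score(question, answer, ttable, smooth=1e-6):
    """Translation-based relevance score of an answer (source) for a
    question (target), IBM-Model-1 style; an illustrative sketch."""
    a_words = answer.split()
    score = 0.0
    for q in question.split():
        # p(q | answer) approximated as the average over answer words
        p = sum(ttable.get((q, a), 0.0) for a in a_words)
        p /= max(len(a_words), 1)
        score += math.log(p + smooth)   # smoothing avoids log(0)
    return score
```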
Comparative Study | (Mariño et al., 2006) implement a translation model using n-grams.
Experiments | Our baseline is a phrase-based decoder, which includes the following models: an n-gram target-side language model (LM), a phrase translation model and a word-based lexicon model. |
Experiments | Table 1: Translation model and LM training data statistics
Experiments | Table 1 contains the data statistics used for the translation model and LM.
Introduction | (Mariño et al., 2006) present a translation model that constitutes a language model of a sort of bilanguage composed of bilingual units.
Abstract | Modern phrase-based machine translation systems make extensive use of word-based translation models for inducing alignments from parallel corpora. |
Experiments | In the end-to-end MT pipeline we use a standard set of features: relative-frequency and lexical translation model probabilities in both directions; distance-based distortion model; language model and word count. |
Introduction | Word-based translation models (Brown et al., 1993) remain central to phrase-based model training, where they are used to infer word-level alignments from sentence-aligned parallel data, from
Introduction | This paper develops a phrase-based translation model which aims to address the above shortcomings of the phrase-based translation pipeline. |
Computing Feature Expectations | The weight of h is the incremental score contributed to all translations containing the rule application, including translation model features on r and language model features that depend on both r and the English contexts of the child nodes.
Experimental Results | Hiero is a hierarchical system that expresses its translation model as a synchronous context-free grammar (Chiang, 2007). |
Experimental Results | SBMT is a string-to-tree translation system with rich target-side syntactic information encoded in the translation model . |
Experimental Results | Figure 3: N-grams with high expected count are more likely to appear in the reference translation than n-grams in the translation model's Viterbi translation, e*.
Experiments | The translation model (TM) was smoothed in both directions with KN smoothing (Chen et al., 2011). |
Experiments | In (Foster and Kuhn, 2007), two kinds of linear mixture were described: linear mixture of language models (LMs), and linear mixture of translation models (TMs). |
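For concreteness, a linear TM mixture interpolates the component phrase probabilities with weights that sum to one; a minimal sketch with illustrative names:

```python
def mixture_phrase_prob(e_phrase, f_phrase, tms, lambdas):
    """Linear mixture of translation models:
    p(e|f) = sum_i lambda_i * p_i(e|f), with sum_i lambda_i = 1.
    tms[i] is assumed to map (e, f) phrase pairs to probabilities."""
    assert abs(sum(lambdas) - 1.0) < 1e-6
    return sum(lam * tm.get((e_phrase, f_phrase), 0.0)
               for lam, tm in zip(lambdas, tms))
```

The log-linear mixture mentioned in the head of this paper multiplies weighted model scores instead of adding them, which is why it behaves differently when a component assigns zero probability.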
Introduction | The translation models of a statistical machine translation (SMT) system are trained on parallel data. |
Introduction | The 2012 JHU workshop on Domain Adaptation for MT proposed phrase sense disambiguation (PSD) for translation model adaptation.
Conclusion | In this work, we presented an approach that can expand a translation model extracted from a sentence-aligned, bilingual corpus using a large amount of unstructured, monolingual data in both source and target languages. This leads to improvements of 1.4 and 1.2 BLEU points over strong baselines on evaluation sets, and in some scenarios to gains in excess of 4 BLEU points.
Generation & Propagation | We assume that sufficient parallel resources exist to learn a basic translation model using standard techniques, and also assume the availability of larger monolingual corpora in both the source and target languages. |
Introduction | Unlike previous work (Irvine and Callison-Burch, 2013a; Razmara et al., 2013), we use higher order n-grams instead of restricting to unigrams, since our approach goes beyond OOV mitigation and can enrich the entire translation model by using evidence from monolingual text. |
Related Work | (2013) and Irvine and Callison-Burch (2013a) conduct a more extensive evaluation of their graph-based BLI techniques, where the emphasis and end-to-end BLEU evaluations concentrated on OOVs, i.e., unigrams, and not on enriching the entire translation model . |
Expected BLEU Training | We summarize the weights of the recurrent neural network language model as θ = {U, W, V} and add the model as an additional feature to the log-linear translation model using the simplified notation s_θ(w_t) = s(w_t | w_1 ... w_{t-1}, h_{t-1}):
Expected BLEU Training | The translation model is parameterized by λ and θ, which are learned as follows (Gao et al., 2014):
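A minimal numpy sketch of scoring a sentence with an Elman-style RNN LM parameterized by θ = {U, W, V}, whose log-probability can then be added as one more feature in the log-linear model (the architecture details here are assumptions for illustration, not the authors' exact network):

```python
import numpy as np

def rnn_logprob(words, vocab, U, W, V):
    """Log-probability of a sentence under a simple Elman-style RNN LM.
    Shapes assumed: W is (H, |vocab|), U is (H, H), V is (|vocab|, H)."""
    h = np.zeros(U.shape[0])
    logp = 0.0
    prev = vocab["<s>"]
    for w in words:
        x = np.zeros(len(vocab)); x[prev] = 1.0   # one-hot w_{t-1}
        h = np.tanh(W @ x + U @ h)                # hidden state h_t
        p = np.exp(V @ h); p /= p.sum()           # softmax over the vocabulary
        logp += np.log(p[vocab[w]])
        prev = vocab[w]
    return logp   # used as an additional log-linear feature score
```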
Experiments | Translation models are estimated on 102M words of parallel data for French-English, and 99M words for German-English; about 6.5M words for each language pair are newswire, the remainder are parliamentary proceedings. |
Introduction | Neural network-based language and translation models have achieved impressive accuracy improvements on statistical machine translation tasks (Allauzen et al., 2011; Le et al., 2012b; Schwenk et al., 2012; Vaswani et al., 2013; Gao et al., 2014). |
Introduction | In Section 3, we briefly describe the translation model based on phrasal ITGs and the Pitman-Yor process.
Related Work | The translation model and language model are primary components in SMT. |
Related Work | In the case of the previous work on translation modeling , mixed methods have been investigated for domain adaptation in SMT by adding domain information as additional labels to the original phrase table (Foster and Kuhn, 2007). |
Related Work | Monolingual topic information is taken as a new feature for a domain adaptive translation model and tuned on the development set (Su et al., 2012). |
Abstract | Recently, some research has studied unsupervised SMT, inducing a simple word-based translation model from monolingual corpora.
Introduction | Novel translation models, such as phrase-based models (Koehn et al., 2007), hierarchical phrase-based models (Chiang, 2007) and linguistically syntax-based models (Liu et al., 2006; Huang et al., 2006; Galley, 2006; Zhang et al., 2008; Chiang, 2010; Zhang et al., 2011; Zhai et al., 2011, 2012), have been proposed and have achieved steadily improving translation performance.
Introduction | However, all of these state-of-the-art translation models rely on the parallel corpora to induce translation rules and estimate the corresponding parameters. |
Introduction | Finally, they used the learned translation model directly to translate unseen data (Ravi and Knight, 2011; Nuhn et al., 2012) or incorporated the learned bilingual lexicon as a new in-domain translation resource into the phrase-based model which is trained with out-of-domain data to improve the domain adaptation performance in machine translation (Dou and Knight, 2012). |
Introduction | We first replace inflected forms by their stems or lemmas: building a translation system on a stemmed representation of the target side leads to a simpler translation task, and the morphological information contained in the source and target language parts of the translation model is more balanced. |
Translation pipeline | We use this representation to create the stemmed representation for training the translation model . |
Translation pipeline | In addition to the translation model , the target-side language model, as well as the reference data for parameter tuning use this representation. |
Translation pipeline | 3.2 Building a stemmed translation model |
Experiments | Accordingly, we use Chiang's hierarchical phrase-based translation model (Chiang, 2007) as a baseline, and the syntax-augmented MT model (Zollmann and Venugopal, 2006) as a 'target line', a model that would not be applicable for language pairs without linguistic resources.
Hard rule labeling from word classes | Our approach instead uses distinct grammar rules and labels to discriminate phrase size, with the advantage of enabling all translation models to estimate distinct weights for distinct size classes and avoiding the need for additional models in the log-linear framework. However, the increase in the number of labels, and thus grammar rules, decreases the reliability of estimated models for rare events due to increased data sparseness.
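The idea can be sketched as a relabeling step: each nonterminal is labeled with a coarse size class of the phrase it spans, so that the translation models see distinct events per class (the class boundaries below are illustrative, not the paper's):

```python
def size_label(phrase_len, boundaries=(1, 2, 4)):
    """Map a phrase length to a coarse size-class nonterminal label.
    Illustrative boundaries: <=1, <=2, <=4, longer."""
    for i, b in enumerate(boundaries):
        if phrase_len <= b:
            return f"X{i}"          # e.g. X0 = single word, X1 = short, ...
    return f"X{len(boundaries)}"    # longest class

# A Hiero-style rule whose gap covered a 3-word phrase would then be
# relabeled with X2 instead of the single generic label X.
```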
Introduction | Labels on these nonterminal symbols are often used to enforce syntactic constraints in the generation of bilingual sentences and imply conditional independence assumptions in the translation model . |
Related work | (2009) present a nonparametric PSCFG translation model that directly induces a grammar from parallel sentences without the use of, or constraints from, a word-alignment model, and
Abstract | We describe a translation model adaptation approach for conversational spoken language translation (CSLT), which encourages the use of contextually appropriate translation options from relevant training conversations. |
Discussion and Future Directions | We have presented a novel, incremental topic-based translation model adaptation approach that obeys the causality constraint imposed by spoken conversations. |
Incremental Topic-Based Adaptation | Our approach is based on the premise that biasing the translation model to favor phrase pairs originating in training conversations that are contextually similar to the current conversation will lead to better translation quality. |
Incremental Topic-Based Adaptation | We add this feature to the log-linear translation model with its own weight, which is tuned with MERT. |
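One plausible form of this feature, sketched under the assumption that each phrase pair remembers the training conversations it was extracted from and that conversations are represented as topic vectors (all names are illustrative, not the authors' code):

```python
import numpy as np

def topic_sim_feature(phrase_pair, current_topics, conv_topics, origin):
    """Similarity-based adaptation feature: cosine similarity between the
    evolving topic vector of the current conversation and the most similar
    training conversation the phrase pair originated from."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
    sims = [cos(current_topics, conv_topics[c]) for c in origin[phrase_pair]]
    return max(sims) if sims else 0.0   # gets its own MERT-tuned weight
```

Because the topic vector of the current conversation is only updated with utterances seen so far, such a feature respects the causality constraint the paper emphasizes.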
Abstract | We take a multi-pass approach to machine translation decoding when using synchronous context-free grammars as the translation model and n-gram language models: the first pass uses a bigram language model, and the resulting parse forest is used in the second pass to guide search with a trigram language model. |
Experiments | The word-to-word translation probabilities are from the translation model of IBM Model 4, trained on a 160-million-word English-Chinese parallel corpus using GIZA++.
Introduction | This complexity arises from the interaction of the tree-based translation model with an n-gram language model. |
Improving Statistical Bilingual Word Alignment | IBM Model 1 only employs the word translation model to calculate the probabilities of alignments. |
Improving Statistical Bilingual Word Alignment | In IBM Model 2, both the word translation model and position distribution model are used. |
Improving Statistical Bilingual Word Alignment | IBM Models 3, 4 and 5 add a fertility model on top of the word translation model and the position distribution model.
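For reference, the word translation model t(f|e) shared by all of these models is trained with EM; a textbook IBM Model 1 sketch (NULL alignment omitted for brevity):

```python
from collections import defaultdict

def ibm_model1(bitext, iterations=5):
    """EM training of the word translation model t(f|e) of IBM Model 1.
    bitext is a list of (source_words, target_words) sentence pairs."""
    t = defaultdict(lambda: 1.0)                   # uniform initialization
    for _ in range(iterations):
        count = defaultdict(float); total = defaultdict(float)
        for e_sent, f_sent in bitext:
            for f in f_sent:                       # E-step: soft counts
                z = sum(t[(f, e)] for e in e_sent)
                for e in e_sent:
                    c = t[(f, e)] / z
                    count[(f, e)] += c; total[e] += c
        for (f, e), c in count.items():            # M-step: renormalize
            t[(f, e)] = c / total[e]
    return t
```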
Introduction | The quality of the parallel data and the word alignment have significant impacts on the learned translation models and ultimately the quality of translation output. |
Sentence Alignment Confidence Measure | The source-to-target lexical translation model p(t|s) and target-to-source model p(s|t) can be obtained through IBM Model 1 or HMM training.
Sentence Alignment Confidence Measure | For the efficient computation of the denominator, we use the lexical translation model . |
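A hedged sketch of such a confidence measure, symmetrizing length-normalized Model-1-style log-probabilities from the two lexical tables (function and table names are illustrative, not the paper's exact formulation):

```python
import math

def alignment_confidence(src, tgt, p_t_given_s, p_s_given_t, eps=1e-9):
    """Symmetrized sentence-alignment confidence from the two lexical
    translation models; src and tgt are token lists."""
    def model1_logprob(xs, ys, table):
        lp = 0.0
        for y in ys:
            p = sum(table.get((y, x), 0.0) for x in xs) / max(len(xs), 1)
            lp += math.log(p + eps)
        return lp / max(len(ys), 1)                # length-normalized
    return 0.5 * (model1_logprob(src, tgt, p_t_given_s)
                  + model1_logprob(tgt, src, p_s_given_t))
```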
Experimental Results | Our translation model was trained on about 1M parallel sentence pairs (about 28M words in each language), which are sub-sampled from corpora distributed by LDC for the NIST MT evaluation using a sampling method based on the n-gram matches between training and test sets in the foreign side. |
Experimental Results | We use GIZA++ (Och and Ney, 2000), a suffix-array (Lopez, 2007), SRILM (Stolcke, 2002), and risk-based deterministic annealing (Smith and Eisner, 2006) to obtain word alignments, translation models, language models, and the optimal weights for combining these models, respectively.
Variational vs. Min-Risk Decoding | However, suppose the hypergraph were very large (thanks to a large or smoothed translation model and weak pruning). |
Abstract | R2NN is a combination of a recursive neural network and a recurrent neural network, and in turn integrates their respective capabilities: (1) new information can be used to generate the next hidden state, like recurrent neural networks, so that the language model and translation model can be integrated naturally; (2) a tree structure can be built, like recursive neural networks, so as to generate translation candidates in a bottom-up manner.
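A minimal numpy sketch of the two mechanisms being combined, a recurrent update over new input and a recursive bottom-up composition over a tree (illustrative only, not the paper's exact architecture):

```python
import numpy as np

def recurrent_step(x_t, h_prev, W, U):
    """Recurrent update: new input x_t extends the hidden state."""
    return np.tanh(W @ x_t + U @ h_prev)

def recursive_combine(h_left, h_right, Wc):
    """Recursive (tree) composition: two child representations are
    combined bottom-up into a parent representation."""
    return np.tanh(Wc @ np.concatenate([h_left, h_right]))
```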
Introduction | DNN is also introduced to Statistical Machine Translation (SMT) to learn several components or features of conventional framework, including word alignment, language modelling, translation modelling and distortion modelling. |
Introduction | (2013) propose a joint language and translation model , based on a recurrent neural network. |
Analysis | We want to further study what happens after we integrate the constraint feature (our SDB model and Marton and Resnik's XP+) into the log-linear translation model.
Experiments | All translation models were trained on the FBIS corpus. |
The Syntax-Driven Bracketing Model 3.1 The Model | We integrate a new feature into the log-linear translation model: P_SDB(b|T, ...). This feature is computed by the SDB model described in equation (3) or equation (4), which estimates the probability that a source span is to be translated as a unit within particular syntactic contexts.
Introduction | They have since been extended to translation modeling , parsing, and many other NLP tasks. |
Model Variations | 4.1 Neural Network Lexical Translation Model (NNLTM) |
Model Variations | In order to assign a probability to every source word during decoding, we also train a neural network lexical translation model (NNLTM).
Conclusion | These tree-to-tree rules are applicable for forest-to-tree translation models (Liu et al., 2009a). |
Experiments | 4.1 Translation models |
Experiments | In our translation models , we have made use of three kinds of translation rule sets which are trained separately. |
Conclusion | We have described a Lagrangian relaxation algorithm for exact decoding of syntactic translation models, and shown that it is significantly more efficient than other exact algorithms for decoding tree-to-string models.
Conclusion | Our experiments have focused on tree-to-string models, but the method should also apply to Hiero-style syntactic translation models (Chiang, 2007). |
Experiments | We use an identical model, and identical development and test data, to that used by Huang and Mi. The translation model is trained on 1.5M sentence pairs of Chinese-English data; a trigram language model is used.
Introduction | The decipherment approach for MT has recently gained popularity for training and adapting translation models using only monolingual data. |
Introduction | The general idea is to find those translation model parameters that maximize the probability of the translations of a given source text in a given language model of the target language. |
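In equation form, this is the familiar decipherment training criterion: choose the parameters θ that maximize the likelihood of the observed source text under the target language model, marginalizing over the unseen translations e:

```latex
\hat{\theta} \;=\; \arg\max_{\theta} \prod_{f} \sum_{e} p_{\mathrm{LM}}(e)\, p_{\theta}(f \mid e)
```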
Related Work | (Ravi and Knight, 2011), (Nuhn et al., 2012), and (Dou and Knight, 2012) treat natural language translation as a deciphering problem including phenomena like reordering, insertion, and deletion and are able to train translation models using only monolingual data. |
Name-aware MT | Some of these names carry special meanings that may influence translations of the neighboring words, and thus replacing them with non-terminals can lead to information loss and weaken the translation model . |
Name-aware MT | Both sentence pairs are kept in the combined data to build the translation model . |
Related Work | More importantly, in these approaches the MT model was still mostly treated as a “black-box” because neither the translation model nor the LM was updated or adapted specifically for names. |