Index of papers in Proc. ACL that mention
  • translation model
Durrani, Nadir and Sajjad, Hassan and Fraser, Alexander and Schmid, Helmut
Introduction
Section 3 introduces two probabilistic models for integrating translations and transliterations into a translation model which are based on conditional and joint probability distributions.
Our Approach
Both of our models combine a character-based transliteration model with a word-based translation model .
Our Approach
Language Model for Unknown Words: Our model generates transliterations that can be known or unknown to the language model and the translation model .
Our Approach
We refer to the words known to the language model and to the translation model as LM-known and TM-known words respectively and to words that are unknown as LM-unknown and TM-unknown respectively.
Previous Work
Moreover, they are working with a large bitext so they can rely on their translation model and only need to transliterate NEs and OOVs.
Previous Work
Our translation model is based on data which is both sparse and noisy.
translation model is mentioned in 16 sentences in this paper.
Nuhn, Malte and Mauser, Arne and Ney, Hermann
Introduction
In this work, we attempt to learn statistical translation models from only monolingual data in the source and target language.
Introduction
This work is a big step towards large-scale and large-vocabulary unsupervised training of statistical translation models .
Introduction
In this work, we will develop, describe, and evaluate methods for large vocabulary unsupervised learning of machine translation models suitable for real-world tasks.
Related Work
Their best performing approach uses an EM-Algorithm to train a generative word based translation model .
Translation Model
In this section, we describe the statistical training criterion and the translation model that is trained using monolingual data.
Translation Model
As training criterion for the translation model’s parameters θ, Ravi and Knight (2011) suggest
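As a hedged reconstruction (the paper's exact notation may differ), this decipherment training criterion is typically the marginal likelihood of the monolingual foreign text, marginalizing over the unseen English translations:

    θ̂ = argmax_θ ∏_f Σ_e P(e) · P_θ(f|e)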
Translation Model
This becomes increasingly difficult with more complex translation models .
translation model is mentioned in 15 sentences in this paper.
Razmara, Majid and Foster, George and Sankaran, Baskaran and Sarkar, Anoop
Abstract
Statistical machine translation is often faced with the problem of combining training data from many diverse sources into a single translation model which then has to translate sentences in a new domain.
Baselines
Log-linear translation model (TM) mixtures are of the form:
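As a hedged sketch of the standard form (the notation in the paper may differ), a log-linear TM mixture combines the component models as a weighted product:

    p(ē|f̄) ∝ ∏_i p_i(ē|f̄)^λi

where λi is the mixture weight attached to component model i.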
Ensemble Decoding
Given a number of translation models which are already trained and tuned, the ensemble decoder uses hypotheses constructed from all of the models in order to translate a sentence.
Introduction
Common techniques for model adaptation adapt two main components of contemporary state-of-the-art SMT systems: the language model and the translation model .
Introduction
translation model adaptation, because various measures such as perplexity of adapted language models can be easily computed on data in the target domain.
Introduction
It is also easier to obtain monolingual data in the target domain, compared to bilingual data which is required for translation model adaptation.
translation model is mentioned in 16 sentences in this paper.
He, Xiaodong and Deng, Li
Abstract
This paper proposes a new discriminative training method in constructing phrase and lexicon translation models .
Abstract
parameters in the phrase and lexicon translation models are estimated by relative frequency or maximizing joint likelihood, which may not correspond closely to the translation measure, e.g., bilingual evaluation understudy (BLEU) (Papineni et al., 2002).
Abstract
However, the number of parameters in common phrase and lexicon translation models is much larger.
translation model is mentioned in 25 sentences in this paper.
Xiong, Deyi and Zhang, Min and Li, Haizhou
Abstract
In this paper, we propose two discriminative, feature-based models to exploit predicate-argument structures for statistical machine translation: 1) a predicate translation model and 2) an argument reordering model.
Abstract
The predicate translation model explores lexical and semantic contexts surrounding a verbal predicate to select desirable translations for the predicate.
Introduction
This suggests that conventional lexical and phrasal translation models adopted in those SMT systems are not sufficient to correctly translate predicates in source sentences.
Introduction
Thus we propose a discriminative, feature-based predicate translation model that captures not only lexical information (i.e., surrounding words) but also high-level semantic contexts to correctly translate predicates.
Introduction
In Section 3 and 4, we will elaborate the proposed predicate translation model and argument reordering model respectively, including details about modeling, features and training procedure.
Predicate Translation Model
In this section, we present the features and the training process of the predicate translation model .
Predicate Translation Model
Following the context-dependent word models in (Berger et al., 1996), we propose a discriminative predicate translation model .
Predicate Translation Model
Given a source sentence which contains N verbal predicates , our predicate translation model Mt can be denoted as
Related Work
Our predicate translation model is also related to previous discriminative lexicon translation models (Berger et al., 1996; Venkatapathy and Bangalore, 2007; Mauser et al., 2009).
Related Work
This will tremendously reduce the amount of training data required, which usually is a problem in discriminative lexicon translation models (Mauser et al., 2009).
Related Work
Furthermore, the proposed translation model also differs from previous lexicon translation models in that we use both lexical and semantic features.
translation model is mentioned in 29 sentences in this paper.
Ravi, Sujith and Knight, Kevin
Abstract
We frame the MT problem as a decipherment task, treating the foreign text as a cipher for English and present novel methods for training translation models from nonparallel text.
Introduction
From these corpora, we estimate translation model parameters: word-to-word translation tables, fertilities, distortion parameters, phrase tables, syntactic transformations, etc.
Introduction
In this paper, we address the problem of learning a full translation model from nonparallel data, and we use the
Introduction
How can we learn a translation model from nonparallel data?
Machine Translation as a Decipherment Task
Probabilistic decipherment: Unlike parallel training, here we have to estimate the translation model Pθ(f|e) parameters using only monolingual data.
Machine Translation as a Decipherment Task
We then estimate parameters of the translation model Pθ(f|e) during training.
Machine Translation as a Decipherment Task
EM Decipherment: We propose a new translation model for MT decipherment which can be efficiently trained using the EM algorithm.
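A minimal illustrative Python sketch of EM-based decipherment training, assuming a 1-to-1 word-substitution setting with a fixed bigram English language model; the model proposed in the paper is richer (it must handle reordering, insertion and deletion), so this is only a simplified stand-in for the general technique:

    from collections import defaultdict

    def em_decipherment(foreign_sents, en_vocab, p_start, p_bigram, iterations=10):
        # foreign_sents: list of token lists (monolingual foreign data)
        # en_vocab: list of English words (hidden states)
        # p_start[e], p_bigram[e1][e2]: fixed English bigram language model
        f_vocab = sorted({f for sent in foreign_sents for f in sent})
        # start from a uniform word-to-word translation table P(f|e)
        t = {e: {f: 1.0 / len(f_vocab) for f in f_vocab} for e in en_vocab}
        for _ in range(iterations):
            counts = {e: defaultdict(float) for e in en_vocab}
            for sent in foreign_sents:
                n = len(sent)
                # forward pass: alpha[i][e] = P(f_1..f_i, E_i = e)
                alpha = [{} for _ in range(n)]
                for e in en_vocab:
                    alpha[0][e] = p_start[e] * t[e][sent[0]]
                for i in range(1, n):
                    for e in en_vocab:
                        total = sum(alpha[i - 1][ep] * p_bigram[ep][e] for ep in en_vocab)
                        alpha[i][e] = total * t[e][sent[i]]
                # backward pass: beta[i][e] = P(f_{i+1}..f_n | E_i = e)
                beta = [{} for _ in range(n)]
                for e in en_vocab:
                    beta[n - 1][e] = 1.0
                for i in range(n - 2, -1, -1):
                    for e in en_vocab:
                        beta[i][e] = sum(p_bigram[e][en] * t[en][sent[i + 1]] * beta[i + 1][en]
                                         for en in en_vocab)
                z = sum(alpha[n - 1][e] for e in en_vocab)  # sentence likelihood
                if z == 0.0:
                    continue
                # E-step: expected count of English word e emitting foreign word f_i
                for i in range(n):
                    for e in en_vocab:
                        counts[e][sent[i]] += alpha[i][e] * beta[i][e] / z
            # M-step: renormalize expected counts into a new translation table
            for e in en_vocab:
                total = sum(counts[e].values())
                if total > 0.0:
                    for f in f_vocab:
                        t[e][f] = counts[e][f] / total
        return t

With a reasonable language model, the E-step shifts probability mass toward translation tables under which the foreign text reads like enciphered English.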
translation model is mentioned in 15 sentences in this paper.
liu, lemao and Watanabe, Taro and Sumita, Eiichiro and Zhao, Tiejun
Abstract
additive neural networks, for SMT to go beyond the log-linear translation model .
Abstract
Our model outperforms the log-linear translation models with/without embedding features on Chinese-to-English and Japanese-to-English translation tasks.
Introduction
On the one hand, features are required to be linear with respect to the objective of the translation model (Nguyen et al., 2007), but it is not guaranteed that the potential features be linear with the model.
Introduction
In the search procedure, frequent computation of the model score is needed for the search heuristic function, which will be challenged by the decoding efficiency for the neural network based translation model .
Introduction
The biggest contribution of this paper is that it goes beyond the log-linear model and proposes a nonlinear translation model instead of re-ranking model (Duh and Kirchhoff, 2008; Sokolov et al., 2012).
translation model is mentioned in 15 sentences in this paper.
Ravi, Sujith
Abstract
Following a probabilistic decipherment approach, we first introduce a new framework for decipherment training that is flexible enough to incorporate any number/type of features (besides simple bag-of-words) as side-information used for estimating translation models .
Decipherment Model for Machine Translation
Contrary to standard machine translation training scenarios, here we have to estimate the translation model Pθ(f|e) parameters using only monolingual data.
Decipherment Model for Machine Translation
We then estimate the parameters of the translation model Pθ(f|e) during training.
Decipherment Model for Machine Translation
Translation Model : Machine translation is a much more complex task than solving other decipherment tasks such as word substitution ciphers (Ravi and Knight, 2011b; Dou and Knight, 2012).
Introduction
The parallel corpora are used to estimate translation model parameters involving word-to-word translation tables, fertilities, distortion, phrase translations, syntactic transformations, etc.
Introduction
Learning translation models from monolingual corpora could help address the challenges faced by modern-day MT systems, especially for low resource language pairs.
Introduction
Recently, this topic has been receiving increasing attention from researchers and new methods have been proposed to train statistical machine translation models using only monolingual data in the source and target language.
translation model is mentioned in 30 sentences in this paper.
Clifton, Ann and Sarkar, Anoop
Models 2.1 Baseline Models
Our second baseline is a factored translation model (Koehn and Hoang, 2007) (called Factored), which used as factors the word, “stem” and suffix.
Models 2.1 Baseline Models
performance of unsupervised segmentation for translation, our third baseline is a segmented translation model based on a supervised segmentation model (called Sup), using the hand-built Omorfi morphological analyzer (Pirinen and Listenmaa, 2007), which provided slightly higher BLEU scores than the word-based baseline.
Models 2.1 Baseline Models
For segmented translation models , it cannot be taken for granted that greater linguistic accuracy in segmentation yields improved translation (Chang et al., 2008).
Translation and Morphology
Our main contributions are: 1) the introduction of the notion of segmented translation where we explicitly allow phrase pairs that can end with a dangling morpheme, which can connect with other morphemes as part of the translation process, and 2) the use of a fully segmented translation model in combination with a postprocessing morpheme prediction system, using unsupervised morphology induction.
Translation and Morphology
Morphology can express both content and function categories, and our experiments show that it is important to use morphology both within the translation model (for morphology with content) and outside it (for morphology contributing to fluency).
translation model is mentioned in 21 sentences in this paper.
Sennrich, Rico and Schwenk, Holger and Aransa, Walid
Abstract
We present an architecture that delays the computation of translation model features until decoding, allowing for the application of mixture-modeling techniques at decoding time.
Abstract
Experimental results on two language pairs demonstrate the effectiveness of both our translation model architecture and automatic clustering, with gains of up to 1 BLEU over unadapted systems and single-domain adaptation.
Introduction
We introduce a translation model architecture that delays the computation of features to the decoding phase.
Related Work
(Ortiz-Martinez et al., 2010) delay the computation of translation model features for the purpose of interactive machine translation with online training.
Related Work
(Sennrich, 2012b) perform instance weighting of translation models , based on the sufficient statistics.
Related Work
(Razmara et al., 2012) describe an ensemble decoding framework which combines several translation models in the decoding step.
Translation Model Architecture
This section covers the architecture of the multi-domain translation model framework.
Translation Model Architecture
Our translation model is embedded in a log-linear model as is common for SMT, and treated as a single translation model in this log-linear combination.
Translation Model Architecture
The architecture has two goals: move the calculation of translation model features to the decoding phase, and allow for multiple knowledge sources (e. g. bitexts or user-provided data) to contribute to their calculation.
translation model is mentioned in 25 sentences in this paper.
Zhou, Guangyou and Liu, Fang and Liu, Yang and He, Shizhu and Zhao, Jun
Abstract
State-of-the-art approaches address these issues by implicitly expanding the queried questions with additional words or phrases using monolingual translation models .
Experiments
Row 3 and row 6 are monolingual translation models to address the word mismatch problem and obtain the state-of-the-art performance in previous work.
Experiments
Row 3 is the word-based translation model (Jeon et al., 2005), and row 4 is the word-based translation language model, which linearly combines the word-based translation model and language model into a unified framework (Xue et al., 2008).
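As a hedged illustration of that linear combination (the smoothing and weights below are my own simplifications, not necessarily those of Xue et al., 2008), the score of a query word interpolates a document language model with a word-based translation model:

    def word_score(q, doc_tf, doc_len, trans, beta=0.5):
        # doc_tf: term frequencies in the answer/document; doc_len: its length
        # trans[t][q] = P(q|t), a word-based translation table
        p_lm = doc_tf.get(q, 0) / doc_len                       # language-model part
        p_tm = sum(p_t.get(q, 0.0) * doc_tf.get(t, 0) / doc_len
                   for t, p_t in trans.items())                 # translation part
        return (1.0 - beta) * p_lm + beta * p_tm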
Experiments
Row 5 is the phrase-based translation model , which translates a sequence of words as whole (Zhou et al., 2011).
Introduction
Researchers have proposed the use of word-based translation models (Berger et al., 2000; Jeon et al., 2005; Xue et al., 2008; Lee et al., 2008; Bernhard and Gurevych, 2009) to solve the word mismatch problem.
Introduction
As a principle approach to capture semantic word relations, word-based translation models are built by using the IBM model 1 (Brown et al., 1993) and have been shown to outperform traditional models (e.g., VSM, BM25, LM) for question retrieval.
Introduction
(2011) proposed the phrase-based translation models for question and answer retrieval.
translation model is mentioned in 12 sentences in this paper.
Cui, Lei and Zhang, Dongdong and Liu, Shujie and Chen, Qiming and Li, Mu and Zhou, Ming and Yang, Muyun
Conclusion and Future Work
Experimental results show that our approach is promising for SMT systems to learn a better translation model .
Experiments
Translation models are trained over the parallel data that is automatically word-aligned
Experiments
This implementation makes the system perform much better and the translation model size is much smaller.
Introduction
Current translation modeling approaches usually use context dependent information to disambiguate translation candidates.
Introduction
Therefore, it is important to leverage topic information to learn smarter translation models and achieve better translation performance.
Introduction
Attempts on topic-based translation modeling include topic-specific lexicon translation models (Zhao and Xing, 2006; Zhao and Xing, 2007), topic similarity models for synchronous rules (Xiao et al., 2012), and document-level translation with topic coherence (Xiong and Zhang, 2013).
Related Work
Following this work, (Xiao et al., 2012) extended topic-specific lexicon translation models to hierarchical phrase-based translation models , where the topic information of synchronous rules was directly inferred with the help of document-level information.
Related Work
They incorporated the bilingual topic information into language model adaptation and lexicon translation model adaptation, achieving significant improvements in the large-scale evaluation.
Related Work
They estimated phrase-topic distributions in translation model adaptation and generated better translation quality.
Topic Similarity Model with Neural Network
Therefore, it helps to train a smarter translation model with the embedded topic information.
Topic Similarity Model with Neural Network
Standard features: Translation model , including translation probabilities and lexical weights for both directions (4 features), 5-gram language model (1 feature), word count (1 feature), phrase count (1 feature), NULL penalty (1 feature), number of hierarchical rules used (1 feature).
translation model is mentioned in 11 sentences in this paper.
Zhang, Hui and Zhang, Min and Li, Haizhou and Aw, Aiti and Tan, Chew Lim
Abstract
This paper proposes a forest-based tree sequence to string translation model for syntax-based statistical machine translation, which automatically learns tree sequence to string translation rules from word-aligned source-side-parsed bilingual texts.
Abstract
The proposed model leverages on the strengths of both tree sequence-based and forest-based translation models .
Decoding
4) Decode the translation forest using our translation model and a dynamic search algorithm.
Forest-based tree sequence to string model
3.3 Forest-based tree-sequence to string translation model
Forest-based tree sequence to string model
Given a source forest F and target translation T S as well as word alignment A, our translation model is formulated as:
Introduction
to String Translation Model
Introduction
Section 2 describes related work while section 3 defines our translation model .
Related work
Motivated by the fact that non-syntactic phrases make nontrivial contribution to phrase-based SMT, the tree sequence-based translation model is proposed (Liu et al., 2007; Zhang et al., 2008a) that uses tree sequence as the basic translation unit, rather than using single subtree as in the STSG.
Related work
(2007) propose the tree sequence concept and design a tree sequence to string translation model .
Related work
(2008a) propose a tree sequence-based tree to tree translation model and Zhang et al.
translation model is mentioned in 12 sentences in this paper.
Bernhard, Delphine and Gurevych, Iryna
Abstract
They can be obtained by training statistical translation models on parallel monolingual corpora, such as question-answer pairs, where answers act as the “source” language and questions as the “target” language.
Abstract
We compare monolingual translation models built from lexical semantic resources with two other kinds of datasets: manually-tagged question reformulations and question-answer pairs.
Introduction
Berger and Lafferty (1999) have formulated a further solution to the lexical gap problem consisting in integrating monolingual statistical translation models in the retrieval process.
Introduction
Monolingual translation models encode statistical word associations which are trained on parallel monolingual corpora.
Introduction
While collection-specific translation models effectively encode statistical word associations for the target document collection, it also introduces a bias in the evaluation and makes it difficult to assess the quality of the translation model per se, independently from a specific task and document collection.
translation model is mentioned in 29 sentences in this paper.
Xiong, Deyi and Zhang, Min
Abstract
In this paper, we propose a sense-based translation model to integrate word senses into statistical machine translation.
Abstract
The proposed sense-based translation model enables the decoder to select appropriate translations for source words according to the inferred senses for these words using maximum entropy classifiers.
Abstract
We test the effectiveness of the proposed sense-based translation model on a large-scale Chinese-to-English translation task.
Introduction
These glosses, used as the sense predictions of their WSD system, are integrated into a word-based SMT system either to substitute for translation candidates of their translation model or to postedit the output of their SMT system.
Introduction
In order to incorporate word senses into SMT, we propose a sense-based translation model that is built on maximum entropy classifiers.
Introduction
We collect training instances from the sense-tagged training data to train the proposed sense-based translation model .
translation model is mentioned in 46 sentences in this paper.
Xiao, Tong and Zhu, Jingbo and Zhang, Chunliang
A Skeleton-based Approach to MT 2.1 Skeleton Identification
To compute g(d), we use a linear combination of a skeleton translation model gskel(d) and a full translation model gfull(d):
A Skeleton-based Approach to MT 2.1 Skeleton Identification
g(d) = gskel(d) + gfull(d) (3), where the skeleton translation model handles the translation of the sentence skeleton, while the full translation model is the baseline model and handles the original problem of translating the whole sentence.
A Skeleton-based Approach to MT 2.1 Skeleton Identification
The skeleton translation model focuses on the translation of the sentence skeleton, i.e., the solid (red) rectangles; while the full translation model computes the model score for all those phrase-pairs, i.e., all solid and dashed rectangles.
Abstract
The basic idea is that we translate the key elements of the input sentence using a skeleton translation model, and then cover the remain segments using a full translation model .
Introduction
Note that the source-language structural information has been intensively investigated in recent studies of syntactic translation models .
Introduction
We develop a skeleton-based model which divides translation into two sub-models: a skeleton translation model (i.e., translating the key elements) and a full translation model (i.e., translating the remaining source words and generating the complete translation).
translation model is mentioned in 14 sentences in this paper.
Zhang, Min and Jiang, Hongfei and Aw, Aiti and Li, Haizhou and Tan, Chew Lim and Li, Sheng
Abstract
This paper presents a translation model that is based on tree sequence alignment, where a tree sequence refers to a single sequence of sub-trees that covers a phrase.
Introduction
3d Tree-to-Tree Translation Model
Introduction
In this paper, we propose a tree-to-tree translation model that is based on tree sequence alignment.
Related Work
Ding and Palmer (2005) propose a syntax-based translation model based on a probabilistic synchronous dependency insertion grammar.
Related Work
(2005) propose a dependency treelet-based translation model .
Related Work
(2007b) present a STSG-based tree-to-tree translation model .
Tree Sequence Alignment Model
3.2 Tree Sequence Translation Model
Tree Sequence Alignment Model
sequence-to-tree sequence translation model is formulated as:
translation model is mentioned in 18 sentences in this paper.
P, Deepak and Visweswariah, Karthik
Abstract
We use translation models and language models to exploit lexical correlations and solution post character respectively.
Introduction
We model the lexical correlation and solution post character using regularized translation models and unigram language models respectively.
Our Approach
The usage of translation models in QA retrieval (Xue et al., 2008; Singh, 2012) and segmentation (Deepak et al., 2012) were also motivated by the correlation assumption.
Our Approach
We use an IBM Model 1 translation model (Brown et al., 1990) in our technique; simplistically, such a model m may be thought of as a 2-d associative array where the value m[w1][w2] is directly related to the probability of w1 occurring in the problem when w2 occurs in the solution.
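The 2-d associative array described above maps directly onto a nested dictionary; the sketch below is purely illustrative, with made-up words and values:

    # m[w1][w2]: a value related to P(w1 appears in the problem | w2 appears in the solution)
    m = {
        "error":   {"restart": 0.21, "reinstall": 0.15},
        "timeout": {"restart": 0.08, "network": 0.33},
    }

    def lookup(m, w1, w2):
        # missing entries are treated as zero association
        return m.get(w1, {}).get(w2, 0.0)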
Our Approach
Consider a unigram language model S that models the lexical characteristics of solution posts, and a translation model T that models the lexical correlation between problems and solutions.
Related Work
Usage of translation models for modeling the correlation between textual problems and solutions have been explored earlier starting from the answer retrieval work in (Xue et al., 2008) where new queries were conceptually expanded using the translation model to improve retrieval.
Related Work
Translation models were also seen to be useful in segmenting incident reports into the problem and solution parts (Deepak et al., 2012); we will use an adaptation of the generative model presented therein, for our solution extraction formulation.
Related Work
Entity-level translation models
translation model is mentioned in 24 sentences in this paper.
Liu, Le and Hong, Yu and Liu, Hao and Wang, Xing and Yao, Jianmin
Abstract
By contrast, we argue that the relevance between a sentence pair and target domain can be better evaluated by the combination of language model and translation model .
Abstract
In this paper, we study and experiment with novel methods that apply translation models into domain-relevant data selection.
Introduction
The corpora are necessary priori knowledge for training effective translation model .
Introduction
However, domain-specific machine translation has few parallel corpora for translation model training in the domain of interest.
Introduction
To overcome the problem, we first propose the method combining translation model with language model in data selection.
Related Work
Thus, we propose novel methods which are based on translation model and language model for data selection.
Training Data Selection Methods
We present three data selection methods for ranking and selecting domain-relevant sentence pairs from general-domain corpus, with an eye towards improving domain-specific translation model performance.
Training Data Selection Methods
These methods are based on language model and translation model , which are trained on small in-domain parallel data.
Training Data Selection Methods
3.1 Data Selection with Translation Model
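A rough sketch of this style of data selection, under simplifying assumptions of my own (the paper's exact scoring functions may differ): each general-domain sentence pair is ranked by in-domain language-model scores on both sides plus an in-domain translation-model score, and the top-ranked pairs are kept:

    def pair_score(src, tgt, lm_src, lm_tgt, tm_score):
        # lm_src/lm_tgt: functions returning the log-probability of a token list
        # under small in-domain language models; tm_score(src, tgt): log P(tgt|src)
        # under an in-domain word translation model (e.g., IBM Model 1).
        # Higher scores indicate more domain-relevant pairs.
        return lm_src(src) + lm_tgt(tgt) + tm_score(src, tgt)

    def select_top_pairs(pairs, lm_src, lm_tgt, tm_score, k):
        # pairs: list of (src_tokens, tgt_tokens) from the general-domain corpus
        ranked = sorted(pairs,
                        key=lambda p: pair_score(p[0], p[1], lm_src, lm_tgt, tm_score),
                        reverse=True)
        return ranked[:k]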
translation model is mentioned in 25 sentences in this paper.
Wu, Hua and Wang, Haifeng
Experiments
For the synthetic method, we used the ES translation model to translate the English part of the CE corpus to Spanish to construct a synthetic corpus.
Experiments
And we also used the BTEC CEl corpus to build a EC translation model to translate the English part of ES corpus into Chinese.
Experiments
Then we combined these two synthetic corpora to build a Chinese-Spanish translation model .
Introduction
It multiplies corresponding translation probabilities and lexical weights in source-pivot and pivot-target translation models to induce a new source-target phrase table.
Introduction
For example, we can obtain a source-pivot corpus by translating the pivot sentence in the source-pivot corpus into the target language with pivot-target translation models .
Pivot Methods for Phrase-based SMT
Following the method described in Wu and Wang (2007), we train the source-pivot and pivot-target translation models using the source-pivot and pivot-target corpora, respectively.
Pivot Methods for Phrase-based SMT
Based on these two models, we induce a source-target translation model , in which two important elements need to be induced: phrase translation probability and lexical weight.
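A minimal sketch of this triangulation step, assuming plain phrase-probability tables (lexical weights, which the paper also induces, would be combined analogously):

    from collections import defaultdict

    def triangulate(src_piv, piv_tgt):
        # src_piv[s][p] = P(p|s) from the source-pivot model
        # piv_tgt[p][t] = P(t|p) from the pivot-target model
        # induced source-target probability: P(t|s) = sum over pivot phrases p of P(t|p) * P(p|s)
        src_tgt = defaultdict(lambda: defaultdict(float))
        for s, pivots in src_piv.items():
            for p, p_ps in pivots.items():
                for t, p_tp in piv_tgt.get(p, {}).items():
                    src_tgt[s][t] += p_tp * p_ps
        return src_tgt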
Pivot Methods for Phrase-based SMT
Then we build a source-target translation model using this corpus.
Using RBMT Systems for Pivot Translation
The source-target pairs extracted from this synthetic multilingual corpus can be used to build a source-target translation model .
translation model is mentioned in 13 sentences in this paper.
Liu, Yang and Mi, Haitao and Feng, Yang and Liu, Qun
Abstract
Current SMT systems usually decode with single translation models and cannot benefit from the strengths of other models in decoding phase.
Abstract
We instead propose joint decoding, a method that combines multiple translation models in one decoder.
Conclusion
We have presented a framework for including multiple translation models in one decoder.
Introduction
In this paper, we propose a framework for combining multiple translation models directly in de-
Joint Decoding
Second, translation models differ in decoding algorithms.
Joint Decoding
Despite the diversity of translation models , they all have to produce partial translations for substrings of input sentences.
Joint Decoding
Therefore, we represent the search space of a translation model as a structure called translation hypergraph.
translation model is mentioned in 10 sentences in this paper.
Lu, Shixiang and Chen, Zhenbiao and Xu, Bo
Abstract
In this paper, instead of designing new features based on intuition, linguistic knowledge and domain, we learn some new and effective features using the deep auto-encoder (DAE) paradigm for phrase-based translation model .
Experiments and Results
The baseline translation models are generated by Moses with default parameter settings.
Input Features for DNN Feature Learning
The phrase-based translation model (Koehn et al., 2003; Och and Ney, 2004) has demonstrated superior performance and been widely used in current SMT systems, and we employ our implementation on this translation model .
Input Features for DNN Feature Learning
This corpus is used to train the translation model in our experiments, and we will describe it in detail in section 5.1.
Introduction
al., 2010), and speech spectrograms (Deng et al., 2010), we propose new feature learning using semi-supervised DAE for phrase-based translation model .
Related Work
(2012) improved translation quality of n-gram translation model by using a bilingual neural LM, where translation probabilities are estimated using a continuous representation of translation units in lieu of standard discrete representations.
Related Work
Kalchbrenner and Blunsom (2013) introduced recurrent continuous translation models that comprise a class for purely continuous sentence-level translation models .
Related Work
(2013) presented a joint language and translation model based on a recurrent neural network which predicts target words based on an unbounded history of both source and target words.
Semi-Supervised Deep Auto-encoder Features Learning for SMT
Each translation rule in the phrase-based translation model has a set number of features that are combined in the log-linear model (Och and Ney, 2002), and our semi-supervised DAE features can also be combined in this model.
translation model is mentioned in 10 sentences in this paper.
Li, Junhui and Marton, Yuval and Resnik, Philip and Daumé III, Hal
Conclusion and Future Work
In this paper, we have presented a unified reordering framework to incorporate soft linguistic constraints (of syntactic or semantic nature) into the HPB translation model .
Experiments
Our basic baseline system employs 19 basic features: a language model feature, 7 translation model features, word penalty, unknown word penalty, the glue rule, date, number and 6 pass-through features.
HPB Translation Model: an Overview
Each such rule is associated with a set of translation model features {φi}, such as phrase translation probability p(α|γ) and its inverse p(γ|α), the lexical translation probability plex(α|γ) and its inverse plex(γ|α), and a rule penalty that affects preference for longer or shorter derivations.
Introduction
The popular distortion or lexicalized reordering models in phrase-based SMT make good local predictions by focusing on reordering on word level, while the synchronous context free grammars in hierarchical phrase-based (HPB) translation models are capable of handling nonlocal reordering on the translation phrase level.
Introduction
The general ideas, however, are applicable to other translation models , e.g., phrase-based model, as well.
Introduction
Section 2 provides an overview of HPB translation model .
Related Work
Both are close to our work; however, our model generates reordering features that are integrated into the log-linear translation model during decoding.
Related Work
In the soft constraint or reordering model approach, Liu and Gildea (2010) modeled the reordering/deletion of source-side semantic roles in a tree-to-string translation model .
Unified Linguistic Reordering Models
For models with syntactic reordering, we add two new features (i.e., one for the leftmost reordering model and the other for the rightmost reordering model) into the log-linear translation model in Eq.
Unified Linguistic Reordering Models
For the semantic reordering models, we also add two new features into the log-linear translation model .
translation model is mentioned in 10 sentences in this paper.
Tu, Mei and Zhou, Yu and Zong, Chengqing
A semantic span can include one or more eus.
1) CSS-based translation model: following formula (1), we obtain the cohesion information by modifying the translation rules with their probabilities P(es|ft) based on word alignment
A semantic span can include one or more eus.
3.1 CSS-based Translation Model
A semantic span can include one or more eus.
For the existing translation models , the entire training process is conducted at the lexical or syntactic level without grammatically cohesive information.
Abstract
Our models include a CSS-based translation model , which generates new CSS-based translation rules, and a generative transfer model, which encourages producing transitional expressions during decoding.
Conclusion
In the future, we will extend our methods to other translation models , such as the syntax-based model, to study how to further improve the performance of SMT systems.
Experiments
The bilingual training data for translation model and CSS-based transfer model is FBIS corpus with approximately 7.1 million Chinese words and 9.2 million English words.
Experiments
For this work, we use an in-house decoder to build the SMT baseline; it combines the hierarchical phrase-based translation model (Chiang, 2005; Chiang, 2007) with the BTG (Wu, 1996) reordering model (Xiong et al., 2006; Zens and Ney, 2006; He et al., 2010).
Experiments
To further evaluate the effectiveness of the proposed models, we also conducted an experiment on a larger set of bilingual training data from the LDC corpus for translation model and transfer model.
Introduction
One is a new translation model that is utilized to generate new translation rules combined with the information of source functional relationships.
translation model is mentioned in 10 sentences in this paper.
Huang, Fei and Xu, Jian-Ming and Ittycheriah, Abraham and Roukos, Salim
Abstract
We present an adaptive translation quality estimation (QE) method to predict the human-targeted translation error rate (HTER) for a document-specific machine translation model .
Adaptive MT Quality Estimation
Therefore it is necessary to build a QE regression model that’s robust to different document-specific translation models.
Document-specific MT System
Building a general MT system using all the parallel data not only produces a huge translation model (unless with very aggressive pruning), the performance on the given input document is suboptimal due to the unwanted dominance of out-of-domain data.
Document-specific MT System
Here we adopt the same strategy, building a document-specific translation model for each input document.
Experiments
In a typical MT QE scenario, the QE model is pre-trained and applied to various MT outputs, even though the QE training data and MT outputs are generated from different translation models .
Experiments
We train the static QE model with this training set, including the source sentences, references and MT outputs (from multiple translation models ).
Experiments
To train the adaptive QE model for each test document, we build a translation model whose subsampling data includes source sentences from both the test document and the QE training data.
Introduction
First, existing approaches to MT quality estimation rely on lexical and syntactical features defined over parallel sentence pairs, which includes source sentences, MT outputs and references, and translation models (Blatz et al., 2004; Ueffing and Ney, 2007; Specia et al., 2009a; Xiong et al., 2010; Soricut and Echihabi, 2010a; Bach et al., 2011).
Static MT Quality Estimation
derived from a Maximum Entropy translation model (Ittycheriah and Roukos, 2005).
translation model is mentioned in 9 sentences in this paper.
Schwartz, Lane and Callison-Burch, Chris and Schuler, William and Wu, Stephen
Introduction
Early work in statistical machine translation viewed translation as a noisy channel process comprised of a translation model, which functioned to posit adequate translations of source language words, and a target language model, which guided the fluency of generated target language strings (Brown et al.,
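The noisy-channel decomposition referred to here is the standard one: the translation model P(f|e) scores adequacy, the target language model P(e) scores fluency, and decoding searches for

    ê = argmax_e P(e) · P(f|e)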
Related Work
The translation models in these techniques define phrases as contiguous word sequences (with gaps allowed in the case of hierarchical phrases) which may or may not correspond to any linguistic constituent.
Related Work
Early work in statistical phrase-based translation considered whether restricting translation models to use only syntactically well-formed constituents might improve translation quality (Koehn et al., 2003) but found such restrictions failed to improve translation quality.
Related Work
Significant research has examined the extent to which syntax can be usefully incorporated into statistical tree-based translation models: string-to-tree (Yamada and Knight, 2001; Gildea, 2003; Imamura et al., 2004; Galley et al., 2004; Graehl and Knight, 2004; Melamed, 2004; Galley et al., 2006; Huang et al., 2006; Shen et al., 2008), tree-to-string (Liu et al., 2006; Liu et al., 2007; Mi et al., 2008; Mi and Huang, 2008; Huang and Mi, 2010), tree-to-tree (Abeille et al., 1990; Shieber and Schabes, 1990; Poutsma, 1998; Eisner, 2003; Shieber, 2004; Cowan et al., 2006; Nesson et al., 2006; Zhang et al., 2007; DeNeefe et al., 2007; DeNeefe and Knight, 2009; Liu et al., 2009; Chiang, 2010), and treelet (Ding and Palmer, 2005; Quirk et al., 2005) techniques use syntactic information to inform the translation model .
translation model is mentioned in 9 sentences in this paper.
Sun, Jun and Zhang, Min and Tan, Chew Lim
Abstract
The tree sequence based translation model allows the violation of syntactic boundaries in a rule to capture non-syntactic phrases, where a tree sequence is a contiguous sequence of sub-trees.
Abstract
This paper goes further to present a translation model based on noncontiguous tree sequence alignment, where a noncontiguous tree sequence is a sequence of sub-trees and gaps.
Experiments
In the experiments, we train the translation model on FBIS corpus (7.2M (Chinese) + 9.2M (English) words) and train a 4-gram language model on the Xinhua portion of the English Gigaword corpus (181M words) using the SRILM Toolkits (Stolcke,
Introduction
Bod (2007) also finds that discontinuous phrasal rules make significant improvement in linguistically motivated STSG-based translation model.
Introduction
We illustrate the rule extraction with an example from the tree-to-tree translation model based on tree sequence alignment (Zhang et al., 2008a) without loss of generality to most syntactic tree based models.
Introduction
To address this issue, we propose a syntactic translation model based on noncontiguous tree sequence alignment.
NonContiguous Tree sequence Align-ment-based Model
In this section, we give a formal definition of SncTSSG and accordingly we propose the alignment based translation model .
NonContiguous Tree sequence Align-ment-based Model
2.2 SncTSSG based Translation Model
translation model is mentioned in 8 sentences in this paper.
Hasegawa, Takayuki and Kaji, Nobuhiro and Yoshinaga, Naoki and Toyoda, Masashi
Eliciting Addressee’s Emotion
Following (Ritter et al., 2011), we apply the statistical machine translation model for generating a response to a given utterance.
Eliciting Addressee’s Emotion
We use GIZA++ and SRILM for learning translation model and 5-gram language model, respectively.
Eliciting Addressee’s Emotion
We use the emotion-tagged dialogue corpus to learn eight translation models and language models, each of which is specialized in generating the response that elicits one of the eight emotions (Plutchik, 1980).
Experiments
Table 6: The number of utterance pairs used for training classifiers in emotion prediction and learning the translation models and language models in response generation.
Experiments
We use the utterance pairs summarized in Table 6 to learn the translation models and language models for eliciting each emotional category.
Experiments
However, for learning the general translation models , we currently use 4 millions of utterance pairs sampled from the 640 millions of pairs due to the computational limitation.
translation model is mentioned in 8 sentences in this paper.
Surdeanu, Mihai and Ciaramita, Massimiliano and Zaragoza, Hugo
Approach
supervised IR models, the answer ranking is implemented using discriminative learning, and finally, some of the ranking features are produced by question-to-answer translation models , which use class-conditional learning.
Approach
the similarity between questions and answers (FG1), features that encode question-to-answer transformations using a translation model (FG2), features that measure keyword density and frequency (FG3), and features that measure the correlation between question-answer pairs and other collections (FG4).
Approach
One way to address this problem is to learn question-to-answer transformations using a translation model (Berger et al., 2000; Echihabi and Marcu, 2003; Soricut and Brill, 2006; Riezler et al., 2007).
Experiments
Our ranking model was tuned strictly on the development set (i.e., feature selection and parameters of the translation models ).
Experiments
to improve lexical matching and translation models .
Experiments
This indicates that, even though translation models are the most useful, it is worth exploring approaches that combine several strategies for answer ranking.
Related Work
In the QA literature, answer ranking for non-factoid questions has typically been performed by learning question-to-answer transformations, either using translation models (Berger et al., 2000; Soricut and Brill, 2006) or by exploiting the redundancy of the Web (Agichtein et al., 2001).
Related Work
On the other hand, our approach allows the learning of full transformations from question structures to answer structures using translation models applied to different text representations.
translation model is mentioned in 8 sentences in this paper.
Yang, Fan and Zhao, Jun and Liu, Kang
Experiments
In order to evaluate the influence of segmentation results upon the statistical ON translation system, we compare the results of two translation models .
Experiments
For constructing a statistical ON translation model , we use GIZA++I to align the Chinese NEs and the English NEs in the training set.
Experiments
Q2: the Chinese ON and the results of the statistical translation model .
Related Work
The first type of methods translates ONs by building a statistical translation model .
Related Work
The statistical translation model can give an output for any input.
The Chunking-based Segmentation for Chinese ONs
The performance of the statistical ON translation model is dependent on the precision of the Chinese ON segmentation to some extent.
translation model is mentioned in 8 sentences in this paper.
Chen, Boxing and Foster, George and Kuhn, Roland
Abstract
Similarity scores are used as additional features of the translation model to improve translation performance.
Experiments
The sense similarity scores are used as feature functions in the translation model .
Experiments
In particular, all the allowed bilingual corpora except the UN corpus and Hong Kong Hansard corpus have been used for estimating the translation model .
Experiments
The second one is the small data condition where only the FBIS corpus is used to train the translation model.
Hierarchical phrase-based MT system
The hierarchical phrase-based translation method (Chiang, 2005; Chiang, 2007) is a formal syntax-based translation modeling method; its translation model is a weighted synchronous context free grammar (SCFG).
Introduction
the translation probabilities in a translation model , for units from parallel corpora are mainly based on the co-occurrence counts of the two units.
translation model is mentioned in 7 sentences in this paper.
Green, Spence and DeNero, John
A Class-based Model of Agreement
Translation Model notation: e, target sequence of I words; f, source sequence of J words; a, sequence of K phrase alignments for (e, f); π, permutation of the alignments for target word order; h, sequence of M feature functions; λ, sequence of learned weights for the M features; H, a priority queue of hypotheses
Discussion of Translation Results
This large gap between the unigram recall of the actual translation output (top) and the lexical coverage of the phrase-based model (bottom) indicates that translation performance can be improved dramatically by altering the translation model through features such as ours, without expanding the search space of the decoder.
Experiments
We trained the translation model on 502 million words of parallel text collected from a variety of sources, including the Web.
Inference during Translation Decoding
3.3 Translation Model Features
Introduction
However, using lexical coverage experiments, we show that there is ample room for translation quality improvements through better selection of forms that already exist in the translation model .
Related Work
Factored Translation Models Factored translation models (Koehn and Hoang, 2007) facilitate a more data-oriented approach to agreement modeling.
Related Work
Subotin (2011) recently extended factored translation models to hierarchical phrase-based translation and developed a discriminative model for predicting target-side morphology in English-Czech.
translation model is mentioned in 7 sentences in this paper.
Vaswani, Ashish and Huang, Liang and Chiang, David
Abstract
Two decades after their invention, the IBM word-based translation models , widely available in the GIZA++ toolkit, remain the dominant approach to word alignment and an integral part of many statistical translation systems.
Abstract
In this paper, we propose a simple extension to the IBM models: an ℓ0 prior to encourage sparsity in the word-to-word translation model.
Conclusion
We have extended the IBM models and HMM model by the addition of an ℓ0 prior to the word-to-word translation model, which compacts the word-to-word translation table, reducing overfitting, and, in particular, the “garbage collection” effect.
Experiments
Table 4 shows BLEU scores for translation models learned from these alignments.
Introduction
Although state-of-the-art translation models use rules that operate on units bigger than words (like phrases or tree fragments), they nearly always use word alignments to drive extraction of those translation rules.
Introduction
It extends the IBM/HMM models by incorporating an ℓ0 prior, inspired by the principle of minimum description length (Barron et al., 1998), to encourage sparsity in the word-to-word translation model (Section 2.2).
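Since the ℓ0 "norm" (the count of nonzero translation probabilities) is not differentiable, priors of this kind are usually implemented with a smooth surrogate; one common choice for probabilities θi in [0, 1], shown here as a hedged illustration rather than the paper's exact form, is

    ||θ||_0 ≈ Σ_i (1 − exp(−θi / β))

so the training objective becomes the log-likelihood minus α · Σ_i (1 − exp(−θi / β)), with smaller β giving a sharper approximation to the true ℓ0 count.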
translation model is mentioned in 6 sentences in this paper.
Xiao, Xinyan and Xiong, Deyi and Zhang, Min and Liu, Qun and Lin, Shouxun
Conclusion and Future Work
Finally, we hope to apply our method to other translation models , especially syntax-based models.
Decoding
In the topic-specific lexicon translation model , given a source document, it first calculates the topic-specific translation probability by normalizing the entire lexicon translation table, and then adapts the lexical weights of rules correspondingly.
Experiments
The adapted lexicon translation model is added as a new feature under the discriminative framework.
Introduction
To exploit topic information for statistical machine translation (SMT), researchers have proposed various topic-specific lexicon translation models (Zhao and Xing, 2006; Zhao and Xing, 2007; Tam et al., 2007) to improve translation quality.
Introduction
Topic-specific lexicon translation models focus on word-level translations.
Related Work
combine a specific domain translation model with a general domain translation model depending on various text distances.
translation model is mentioned in 6 sentences in this paper.
Avramidis, Eleftherios and Koehn, Philipp
Conclusion
Opposed to other factored translation model approaches that require target language factors, that are not easily obtainable for many languages, our approach only requires English syntax trees, which are acquired with widely available automatic parsers.
Factored Model
The factored statistical machine translation model uses a log-linear approach, in order to combine the several components, including the language model, the reordering model, the translation models and the generation models.
Introduction
Our method is based on factored phrase-based statistical machine translation models .
Introduction
Traditional statistical machine translation models deal with this problems in two ways:
Introduction
Then, contrary to the methods that added only output features or altered the generation procedure, we used this information in order to augment only the source side of a factored translation model , assuming that we do not have resources allowing factors or specialized generation in the target language (a common problem, when translating from English into under-resourced languages).
Methods for enriching input
Considering such annotation, a factored translation model is trained to map the word-case pair to the correct inflection of the target noun.
translation model is mentioned in 6 sentences in this paper.
Li, Junhui and Tu, Zhaopeng and Zhou, Guodong and van Genabith, Josef
Conclusion
We present a head-driven hierarchical phrase-based (HD-HPB) translation model , which adopts head information (derived through unlabeled dependency analysis) in the definition of non-terminals to better differentiate among translation rules.
Head-Driven HPB Translation Model
Like Chiang (2005) and Chiang (2007), our HD-HPB translation model adopts a synchronous context free grammar, a rewriting system which generates source and target side string pairs simultaneously using a context-free grammar.
Head-Driven HPB Translation Model
For rule extraction, we first identify initial phrase pairs on word-aligned sentence pairs by using the same criterion as most phrase-based translation models (Och and Ney, 2004) and Chiang’s HPB model (Chiang, 2005; Chiang, 2007).
Head-Driven HPB Translation Model
Merging two neighboring non-terminals into a single nonterminal, NRRs enable the translation model to explore a wider search space.
Introduction
Chiang’s hierarchical phrase-based (HPB) translation model utilizes synchronous context free grammar (SCFG) for translation derivation (Chiang, 2005; Chiang, 2007) and has been widely adopted in statistical machine translation (SMT).
Introduction
However, the two approaches are not mutually exclusive, as we could also include a set of syntax-driven features into our translation model .
translation model is mentioned in 6 sentences in this paper.
Wuebker, Joern and Mauser, Arne and Ney, Hermann
Alignment
, N is used for both the initialization of the translation model p(f|é) and the phrase model training.
Conclusion
We have shown that training phrase models can improve translation performance on a state-of-the-art phrase-based translation model .
Experimental Evaluation
The scaling factors of the translation models have been optimized for BLEU on the DEV data.
Experimental Evaluation
We will focus on the proposed leaving-one-out technique and show that it helps in finding good phrasal alignments on the training data that lead to improved translation models .
Introduction
[Figure: two training pipelines ending in a phrase translation table: Viterbi word alignment with word translation models trained by the EM algorithm, followed by heuristic phrase extraction; versus phrase alignment with phrase translation models trained by the EM algorithm, yielding phrase translation counts and probabilities.]
Related Work
This is different from word-based translation models , where a typical assumption is that each target word corresponds to only one source word.
translation model is mentioned in 6 sentences in this paper.
He, Wei and Wu, Hua and Wang, Haifeng and Liu, Ting
Introduction
The translation quality of the SMT system is highly related to the coverage of translation models .
Introduction
Naturally, a solution to the coverage problem is to bridge the gaps between the input sentences and the translation models , either from the input side, which targets on rewriting the input sentences to the MT-favored expressions, or from
Introduction
the side of translation models, which tries to enrich the translation models to cover more expressions.
translation model is mentioned in 5 sentences in this paper.
Blunsom, Phil and Cohn, Trevor and Osborne, Miles
Abstract
We present a translation model which models derivations as a latent variable, in both training and decoding, and is fully discriminative and globally optimised.
Discriminative Synchronous Transduction
Our log-linear translation model defines a conditional probability distribution over the target translations of a given source sentence.
Evaluation
We also experimented with using max-translation decoding for standard MER trained translation models, finding that it had a small negative impact on BLEU score.
Evaluation
Firstly we show the relative scores of our model against Hiero without using reverse translation or lexical features. This allows us to directly study the differences between the two translation models without the added complication of the other features.
Evaluation
As expected, the language model makes a significant difference to BLEU, however we believe that this effect is orthogonal to the choice of base translation model , thus we would expect a similar gain when integrating a language model into the discriminative system.
translation model is mentioned in 5 sentences in this paper.
Toutanova, Kristina and Suzuki, Hisami and Ruopp, Achim
Integration of inflection models with MT systems
In this method, stemming can impact word alignment in addition to the translation models .
MT performance results
From the results of Method 2 we can see that reducing sparsity at translation modeling is advantageous.
MT performance results
From these results, we also see that about half of the gain from using stemming in the base MT system came from improving word alignment, and half came from using translation models operating at the less sparse stem level.
Machine translation systems and data
The treelet translation model is estimated using a parallel corpus.
Related work
Though our motivation is similar to that of Koehn and Hoang (2007), we chose to build an independent component for inflection prediction in isolation rather than folding morphological information into the main translation model .
translation model is mentioned in 5 sentences in this paper.
van Gompel, Maarten and van den Bosch, Antal
Introduction
the role of the translation model in Statistical Machine Translation (SMT).
System
The language model is a trigram-based back-off language model with Kneser-Ney smoothing, computed using SRILM (Stolcke, 2002) and trained on the same training data as the translation model .
System
We do so by normalising the class probability from the classifier (scoreT(H)), which is our translation model , and the language model (scorelm(H)), in such a way that the highest classifier score for the alternatives under consideration is always 1.0, and the highest language model score of the sentence is always 1.0.
System
If desired, the search can be parametrised with variables λ3 and λ4, representing the weights we want to attach to the classifier-based translation model and the language model, respectively.
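A minimal sketch of the normalisation and weighting just described, with variable names of my own and a simple weighted sum as the combination (an assumption, since the snippet does not spell out how the two normalised scores are combined); scores are assumed to be positive, probability-like values:

    def combined_score(tm_scores, lm_scores, lambda3=1.0, lambda4=1.0):
        # tm_scores[h]: classifier-based translation-model score of hypothesis h
        # lm_scores[h]: language-model score of the sentence containing h
        max_tm = max(tm_scores.values())   # best alternative rescaled to 1.0
        max_lm = max(lm_scores.values())
        return {h: lambda3 * (tm_scores[h] / max_tm) + lambda4 * (lm_scores[h] / max_lm)
                for h in tm_scores}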
translation model is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Salameh, Mohammad and Cherry, Colin and Kondrak, Grzegorz
Conclusion
Eventually, we would like to replace the functionality of factored translation models (Koehn and Hoang, 2007) with lattice transformation and augmentation.
Experimental Setup
Four translation model features encode phrase translation probabilities and lexical scores in both directions.
Introduction
Morphological complexity leads to much higher type to token ratios than English, which can create sparsity problems during translation model estimation.
Related Work
Most techniques approach the problem by transforming the target language in some manner before training the translation model .
Related Work
In this setting, the sparsity reduction from segmentation helps word alignment and target language modeling, but it does not result in a more expressive translation model .
translation model is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Narayan, Shashi and Gardent, Claire
Abstract
Second, it combines a simplification model for splitting and deletion with a monolingual translation model for phrase substitution and reordering.
Experiments
We trained our simplification and translation models on the PWKP corpus.
Simplification Framework
Our simplification framework consists of a probabilistic model for splitting and dropping which we call DRS simplification model (DRS-SM); a phrase based translation model for substitution and reordering (PBMT); and a language model learned on Simple English Wikipedia (LM) for fluency and grammaticality.
Simplification Framework
where the probabilities p(s′|DC), p(s′|s) and p(s) are given by the DRS simplification model, the phrase based machine translation model and the language model respectively.
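The equation the excerpt refers to is not reproduced in this index; a hedged reconstruction of its general shape, given the three components named, scores a candidate by the product of the component probabilities:

    score(candidate) \propto p(s' \mid D_C)\; p(s' \mid s)\; p(s)

with the three factors supplied by DRS-SM, PBMT, and the LM respectively.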
Simplification Framework
Our phrase based translation model is trained using the Moses toolkit5 with its default command line options on the PWKP corpus (except the sentences from the test set) considering the complex sentence as the source and the simpler one as the target.
translation model is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Mi, Haitao and Liu, Qun
Abstract
We thus propose to combine the advantages of both, and present a novel constituency-to-dependency translation model , which uses constituency forests on the source side to direct the translation, and dependency trees on the target side (as a language model) to ensure grammaticality.
Conclusion and Future Work
In this paper, we presented a novel forest-based constituency-to-dependency translation model , which combines the advantages of both tree-to-string and string-to-tree systems, runs fast and guarantees grammaticality of the output.
Introduction
Linguistically syntax-based statistical machine translation models have made promising progress in recent years.
Model
Figure 1 shows a word-aligned source constituency forest Fc and target dependency tree De; our constituency-to-dependency translation model can be formalized as:
Related Work
(2009), we apply forests to a new constituency-tree-to-dependency-tree translation model rather than a constituency tree-to-tree model.
translation model is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Maletti, Andreas
Introduction
A (formal) translation model is at the core of every machine translation system.
Introduction
(1990) discuss automatically trainable translation models in their seminal paper.
Introduction
In contrast, in the field of syntax-based machine translation, the translation models have full access to the syntax of the sentences and can base their decisions on it.
Preservation of regularity
Preservation of regularity is an important property for a number of translation model manipulations.
translation model is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Nguyen, ThuyLinh and Vogel, Stephan
Abstract
Hiero translation models have two limitations compared to phrase-based models: 1) Limited hypothesis space; 2) No lexicalized reordering model.
Introduction
Phrase-based and tree-based translation models are the two main streams in state-of-the-art machine translation.
Introduction
The tree-based translation model , by using a synchronous context-free grammar formalism, can capture longer reordering between source and target language.
Introduction
Many features are shared between phrase-based and tree-based systems including language model, word count, and translation model features.
translation model is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Mylonakis, Markos and Sima'an, Khalil
Experiments
The induced joint translation model can be used to recover arg max_e p(e|f), as it is equal to arg max_e p(e, f). We employ the induced probabilistic HR-SCFG G as the backbone of a log-linear, feature-based translation model, with the derivation probability p(D) under the grammar estimate being
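The equivalence invoked at the start of this excerpt is the usual one: for a fixed source sentence f, the marginal p(f) is constant, so

    \arg\max_e p(e \mid f) = \arg\max_e \frac{p(e, f)}{p(f)} = \arg\max_e p(e, f).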
Introduction
Nevertheless, the successful employment of SCFGs for phrase-based SMT brought translation models assuming latent syntactic structure to the spotlight.
Introduction
Section 2 discusses the weak independence assumptions of SCFGs and introduces a joint translation model which addresses these issues and separates hierarchical translation structure from phrase-pair emission.
Joint Translation Model
The rest of the (sometimes thousands of) rule-specific features usually added to SCFG translation models do not directly help either, leaving reordering decisions disconnected from the rest of the derivation.
Related Work
Most of the aforementioned work does concentrate on learning hierarchical, linguistically motivated translation models .
translation model is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Eidelman, Vladimir and Boyd-Graber, Jordan and Resnik, Philip
Abstract
We use these topic distributions to compute topic-dependent lexical weighting probabilities and directly incorporate them into our translation model as features.
Discussion and Conclusion
We can construct a topic model once on the training data, and use it to infer topics on any test set to adapt the translation model.
Introduction
This problem has led to a substantial amount of recent work in trying to bias, or adapt, the translation model (TM) toward particular domains of interest (Axelrod et al., 2011; Foster et al., 2010; Snover et al., 2008).1 The intuition behind TM adaptation is to increase the likelihood of selecting relevant phrases for translation.
Introduction
We induce unsupervised domains from large corpora, and we incorporate soft, probabilistic domain membership into a translation model .
Introduction
We accomplish this by introducing topic dependent lexical probabilities directly as features in the translation model , and interpolating them log-linearly with our other features, thus allowing us to discriminatively optimize their weights on an arbitrary objective function.
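A minimal sketch of a topic-dependent lexical-weighting feature in this spirit (hypothetical names and smoothing; the paper's exact feature definition may differ): per-topic word translation tables are mixed according to the document's inferred topic distribution, and the result is used as one log-linear feature.

    import math

    def topic_lexical_feature(src_words, tgt_words, per_topic_ttable, doc_topics):
        """One topic-dependent lexical-weighting feature for a phrase pair.

        per_topic_ttable[z][(t, s)] -> p(t | s, z), estimated from training data
        softly assigned to topic z; doc_topics[z] -> p(z | document) for the
        document being translated.  (Sketch only.)
        """
        score = 0.0
        for t in tgt_words:
            # mix per-topic translation probabilities by the document's topics,
            # then average over source words (a simple lexical-weighting scheme)
            p_t = 0.0
            for s in src_words:
                p_t += sum(doc_topics[z] * per_topic_ttable[z].get((t, s), 1e-9)
                           for z in range(len(doc_topics)))
            p_t /= max(len(src_words), 1)
            score += math.log(max(p_t, 1e-12))
        return score  # used as one feature in the log-linear model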
translation model is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Braune, Fabienne and Seemann, Nina and Quernheim, Daniel and Maletti, Andreas
Abstract
We present a new translation model integrating the shallow local multi bottom-up tree transducer.
Experiments
Both translation models were trained with approximately 1.5 million bilingual sentences after length-ratio filtering.
Introduction
In this contribution, we report on our novel statistical machine translation system that uses an ℓMBOT-based translation model.
Introduction
The theoretical foundations of ℓMBOT and their integration into our translation model are presented in Sections 2 and 3.
Translation Model
Given a source language sentence e, our translation model aims to find the best corresponding target language translation ê; i.e.,
translation model is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Goto, Isao and Utiyama, Masao and Sumita, Eiichiro and Tamura, Akihiro and Kurohashi, Sadao
Experiment
The translation model was trained using sentences of 40 words or less from the training data.
Experiment
The common SMT feature set consists of: four translation model features, phrase penalty, word penalty, and a language model feature.
Experiment
Our distortion model was trained as follows: We used 0.2 million sentence pairs and their word alignments from the data used to build the translation model as the training data for our distortion models.
translation model is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Li, Fangtao and Gao, Yang and Zhou, Shuchang and Si, Xiance and Dai, Decheng
Experiments
The larger question-thread data is employed for feature learning, such as translation model and topic model training.
Proposed Features
Translation Model
Proposed Features
A translation model is a mathematical model in which language translation is modeled statistically.
Proposed Features
We train a translation model (Brown et al., 1990; Och and Ney, 2003) using the Community QA data, with the question as the target language, and the corresponding best answer as the source language.
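A minimal sketch of how such a model is typically applied at ranking time (hypothetical names; an IBM Model 1-style score with the answer as the "source" and the question as the "target", as in the excerpt — the NULL-word handling and smoothing constants are assumptions):

    import math

    def translation_score(question_tokens, answer_tokens, ttable, null_prob=1e-4):
        """IBM Model 1-style score for p(question | answer), used to rank answers.

        ttable[(q, a)] -> t(q | a), a word translation probability trained with
        the answer side as source and the question side as target.
        """
        log_p = 0.0
        denom = len(answer_tokens) + 1  # +1 for the NULL word
        for q in question_tokens:
            p_q = null_prob  # NULL-word contribution (assumed constant here)
            p_q += sum(ttable.get((q, a), 0.0) for a in answer_tokens)
            log_p += math.log(max(p_q / denom, 1e-12))
        return log_p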
translation model is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Feng, Minwei and Peter, Jan-Thorsten and Ney, Hermann
Comparative Study
(Mariño et al., 2006) implement a translation model using n-grams.
Experiments
Our baseline is a phrase-based decoder, which includes the following models: an n-gram target-side language model (LM), a phrase translation model and a word-based lexicon model.
Experiments
Table 1: translation model and LM training data statistics
Experiments
Table 1 contains the data statistics used for translation model and LM.
Introduction
(Mariño et al., 2006) present a translation model that constitutes a language model of a sort of bilanguage composed of bilingual units.
translation model is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Cohn, Trevor and Haffari, Gholamreza
Abstract
Modern phrase-based machine translation systems make extensive use of word-based translation models for inducing alignments from parallel corpora.
Experiments
In the end-to-end MT pipeline we use a standard set of features: relative-frequency and lexical translation model probabilities in both directions; distance-based distortion model; language model and word count.
Introduction
Word-based translation models (Brown et al., 1993) remain central to phrase-based model training, where they are used to infer word-level alignments from sentence aligned parallel data, from
Introduction
This paper develops a phrase-based translation model which aims to address the above shortcomings of the phrase-based translation pipeline.
translation model is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
DeNero, John and Chiang, David and Knight, Kevin
Computing Feature Expectations
The weight of h is the incremental score contributed to all translations containing the rule application, including translation model features on r and language model features that depend on both r and the English contexts of the child nodes.
Experimental Results
Hiero is a hierarchical system that expresses its translation model as a synchronous context-free grammar (Chiang, 2007).
Experimental Results
SBMT is a string-to-tree translation system with rich target-side syntactic information encoded in the translation model .
Experimental Results
Figure 3: n-grams with high expected count are more likely to appear in the reference translation than n-grams in the translation model’s Viterbi translation, e*.
translation model is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Chen, Boxing and Kuhn, Roland and Foster, George
Experiments
The translation model (TM) was smoothed in both directions with KN smoothing (Chen et al., 2011).
Experiments
In (Foster and Kuhn, 2007), two kinds of linear mixture were described: linear mixture of language models (LMs), and linear mixture of translation models (TMs).
Introduction
The translation models of a statistical machine translation (SMT) system are trained on parallel data.
Introduction
The 2012 JHU workshop on Domain Adaptation for MT 1 proposed phrase sense disambiguation (PSD) for translation model adaptation.
translation model is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Saluja, Avneesh and Hassan, Hany and Toutanova, Kristina and Quirk, Chris
Conclusion
In this work, we presented an approach that can expand a translation model extracted from a sentence-aligned, bilingual corpus using a large amount of unstructured, monolingual data in both source and target languages, which leads to improvements of 1.4 and 1.2 BLEU points over strong baselines on evaluation sets, and in some scenarios gains in excess of 4 BLEU points.
Generation & Propagation
We assume that sufficient parallel resources exist to learn a basic translation model using standard techniques, and also assume the availability of larger monolingual corpora in both the source and target languages.
Introduction
Unlike previous work (Irvine and Callison-Burch, 2013a; Razmara et al., 2013), we use higher order n-grams instead of restricting to unigrams, since our approach goes beyond OOV mitigation and can enrich the entire translation model by using evidence from monolingual text.
Related Work
(2013) and Irvine and Callison-Burch (2013a) conduct a more extensive evaluation of their graph-based BLI techniques, where the emphasis and end-to-end BLEU evaluations concentrated on OOVs, i.e., unigrams, and not on enriching the entire translation model .
translation model is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Auli, Michael and Gao, Jianfeng
Expected BLEU Training
We summarize the weights of the recurrent neural network language model as θ = {U, W, V} and add the model as an additional feature to the log-linear translation model using the simplified notation s_θ(w_t) = s(w_t | w_1 ... w_{t-1}, h_{t-1}):
Expected BLEU Training
The translation model is parameterized by λ and θ, which are learned as follows (Gao et al., 2014):
Experiments
Translation models are estimated on 102M words of parallel data for French-English, and 99M words for German-English; about 6.5M words for each language pair are newswire, the remainder are parliamentary proceedings.
Introduction
Neural network-based language and translation models have achieved impressive accuracy improvements on statistical machine translation tasks (Allauzen et al., 2011; Le et al., 2012b; Schwenk et al., 2012; Vaswani et al., 2013; Gao et al., 2014).
translation model is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Zhu, Conghui and Watanabe, Taro and Sumita, Eiichiro and Zhao, Tiejun
Introduction
In section 3, we briefly describe the translation model with phrasal ITGs and Pitman-Yor process.
Related Work
The translation model and language model are primary components in SMT.
Related Work
In the case of the previous work on translation modeling , mixed methods have been investigated for domain adaptation in SMT by adding domain information as additional labels to the original phrase table (Foster and Kuhn, 2007).
Related Work
Monolingual topic information is taken as a new feature for a domain adaptive translation model and tuned on the development set (Su et al., 2012).
translation model is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Zhang, Jiajun and Zong, Chengqing
Abstract
Recently, some research works have studied unsupervised SMT for inducing a simple word-based translation model from monolingual corpora.
Introduction
Novel translation models, such as phrase-based models (Koehn et al., 2007), hierarchical phrase-based models (Chiang, 2007) and linguistically syntax-based models (Liu et al., 2006; Huang et al., 2006; Galley, 2006; Zhang et al., 2008; Chiang, 2010; Zhang et al., 2011; Zhai et al., 2011, 2012) have been proposed and have achieved increasingly better translation performance.
Introduction
However, all of these state-of-the-art translation models rely on the parallel corpora to induce translation rules and estimate the corresponding parameters.
Introduction
Finally, they used the learned translation model directly to translate unseen data (Ravi and Knight, 2011; Nuhn et al., 2012) or incorporated the learned bilingual lexicon as a new in-domain translation resource into the phrase-based model which is trained with out-of-domain data to improve the domain adaptation performance in machine translation (Dou and Knight, 2012).
translation model is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Weller, Marion and Fraser, Alexander and Schulte im Walde, Sabine
Introduction
We first replace inflected forms by their stems or lemmas: building a translation system on a stemmed representation of the target side leads to a simpler translation task, and the morphological information contained in the source and target language parts of the translation model is more balanced.
Translation pipeline
We use this representation to create the stemmed representation for training the translation model .
Translation pipeline
In addition to the translation model , the target-side language model, as well as the reference data for parameter tuning use this representation.
Translation pipeline
3.2 Building a stemmed translation model
translation model is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Zollmann, Andreas and Vogel, Stephan
Experiments
Accordingly, we use Chiang’s hierarchical phrase-based translation model (Chiang, 2007) as a baseline, and the syntax-augmented MT model (Zollmann and Venugopal, 2006) as a ‘target line’, a model that would not be applicable for language pairs without linguistic resources.
Hard rule labeling from word classes
Our approach instead uses distinct grammar rules and labels to discriminate phrase size, with the advantage of enabling all translation models to estimate distinct weights for distinct size classes and avoiding the need of additional models in the log-linear framework; however, the increase in the number of labels and thus grammar rules decreases the reliability of estimated models for rare events due to increased data sparseness.
Introduction
Labels on these nonterminal symbols are often used to enforce syntactic constraints in the generation of bilingual sentences and imply conditional independence assumptions in the translation model .
Related work
(2009) present a nonparametric PSCFG translation model that directly induces a grammar from parallel sentences without the use of or constraints from a word-alignment model, and
translation model is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Hewavitharana, Sanjika and Mehay, Dennis and Ananthakrishnan, Sankaranarayanan and Natarajan, Prem
Abstract
We describe a translation model adaptation approach for conversational spoken language translation (CSLT), which encourages the use of contextually appropriate translation options from relevant training conversations.
Discussion and Future Directions
We have presented a novel, incremental topic-based translation model adaptation approach that obeys the causality constraint imposed by spoken conversations.
Incremental Topic-Based Adaptation
Our approach is based on the premise that biasing the translation model to favor phrase pairs originating in training conversations that are contextually similar to the current conversation will lead to better translation quality.
Incremental Topic-Based Adaptation
We add this feature to the log-linear translation model with its own weight, which is tuned with MERT.
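A hedged sketch of a contextual-similarity feature of this kind (hypothetical names; the paper's actual similarity measure and topic representation may differ): each phrase pair is scored by the similarity between the current conversation's topic vector and the topic vectors of the training conversations in which it was observed, and the score enters the log-linear model with a MERT-tuned weight.

    import math

    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        return dot / norm if norm > 0 else 0.0

    def similarity_feature(current_topic_vec, source_conversation_topic_vecs):
        """Feature for one phrase pair: similarity of the current conversation to
        the most similar training conversation containing that phrase pair."""
        if not source_conversation_topic_vecs:
            return 0.0
        return max(cosine(current_topic_vec, v) for v in source_conversation_topic_vecs)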
translation model is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Zhang, Hao and Gildea, Daniel
Abstract
We take a multi-pass approach to machine translation decoding when using synchronous context-free grammars as the translation model and n-gram language models: the first pass uses a bigram language model, and the resulting parse forest is used in the second pass to guide search with a trigram language model.
Experiments
The word-to-word translation probabilities are from the translation model of IBM Model 4 trained on a 160-million-word English-Chinese parallel corpus using GIZA++.
Introduction
This complexity arises from the interaction of the tree-based translation model with an n-gram language model.
translation model is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Liu, Zhanyi and Wang, Haifeng and Wu, Hua and Li, Sheng
Improving Statistical Bilingual Word Alignment
IBM Model 1 only employs the word translation model to calculate the probabilities of alignments.
Improving Statistical Bilingual Word Alignment
In IBM Model 2, both the word translation model and position distribution model are used.
Improving Statistical Bilingual Word Alignment
IBM Model 3, 4 and 5 consider the fertility model in addition to the word translation model and position distribution model.
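For reference, the first two of these models have the standard forms (Brown et al., 1993), where e is a sentence of length l, f a sentence of length m, t the word translation table, and a the position (alignment) distribution:

    p(f \mid e) = \frac{\epsilon}{(l+1)^m} \prod_{j=1}^{m} \sum_{i=0}^{l} t(f_j \mid e_i)                 (Model 1)

    p(f \mid e) = \epsilon \prod_{j=1}^{m} \sum_{i=0}^{l} t(f_j \mid e_i)\, a(i \mid j, m, l)             (Model 2)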
translation model is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Huang, Fei
Introduction
The quality of the parallel data and the word alignment have significant impacts on the learned translation models and ultimately the quality of translation output.
Sentence Alignment Confidence Measure
The source-to-target lexical translation model p(t|s) and target-to-source model p(s|t) can be obtained through IBM Model-1 or HMM training.
Sentence Alignment Confidence Measure
For the efficient computation of the denominator, we use the lexical translation model .
translation model is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Li, Zhifei and Eisner, Jason and Khudanpur, Sanjeev
Experimental Results
Our translation model was trained on about 1M parallel sentence pairs (about 28M words in each language), which are sub-sampled from corpora distributed by LDC for the NIST MT evaluation using a sampling method based on the n-gram matches between training and test sets in the foreign side.
Experimental Results
We use GIZA++ (Och and Ney, 2000), a suffix-array (Lopez, 2007), SRILM (Stolcke, 2002), and risk-based deterministic annealing (Smith and Eisner, 2006)17 to obtain word alignments, translation models, language models, and the optimal weights for combining these models, respectively.
Variational vs. Min-Risk Decoding
However, suppose the hypergraph were very large (thanks to a large or smoothed translation model and weak pruning).
translation model is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Liu, Shujie and Yang, Nan and Li, Mu and Zhou, Ming
Abstract
R2NN is a combination of a recursive neural network and a recurrent neural network, and in turn integrates their respective capabilities: (1) new information can be used to generate the next hidden state, as in recurrent neural networks, so that the language model and translation model can be integrated naturally; (2) a tree structure can be built, as in recursive neural networks, so as to generate the translation candidates in a bottom-up manner.
Introduction
DNN is also introduced to Statistical Machine Translation (SMT) to learn several components or features of conventional framework, including word alignment, language modelling, translation modelling and distortion modelling.
Introduction
(2013) propose a joint language and translation model , based on a recurrent neural network.
translation model is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Xiong, Deyi and Zhang, Min and Aw, Aiti and Li, Haizhou
Analysis
We want to further study what happens after we integrate the constraint feature (our SDB model and Marton and Resnik’s XP+) into the log-linear translation model.
Experiments
All translation models were trained on the FBIS corpus.
The Syntax-Driven Bracketing Model 3.1 The Model
new feature into the log-linear translation model: P_SDB(b|T, ...). This feature is computed by the SDB model described in equation (3) or equation (4), which estimates the probability that a source span is to be translated as a unit within particular syntactic contexts.
translation model is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Devlin, Jacob and Zbib, Rabih and Huang, Zhongqiang and Lamar, Thomas and Schwartz, Richard and Makhoul, John
Introduction
They have since been extended to translation modeling , parsing, and many other NLP tasks.
Model Variations
4.1 Neural Network Lexical Translation Model (NNLTM)
Model Variations
In order to assign a probability to every source word during decoding, we also train a neural network lexical translation model (NNLTM).
translation model is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Wu, Xianchao and Matsuzaki, Takuya and Tsujii, Jun'ichi
Conclusion
These tree-to-tree rules are applicable for forest-to-tree translation models (Liu et al., 2009a).
Experiments
4.1 Translation models
Experiments
In our translation models , we have made use of three kinds of translation rule sets which are trained separately.
translation model is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Rush, Alexander M. and Collins, Michael
Conclusion
We have described a Lagrangian relaxation algorithm for exact decoding of syntactic translation models, and shown that it is significantly more efficient than other exact algorithms for decoding tree-to-string models.
Conclusion
Our experiments have focused on tree-to-string models, but the method should also apply to Hiero-style syntactic translation models (Chiang, 2007).
Experiments
We use an identical model, and identical development and test data, to that used by Huang and Mi.9 The translation model is trained on 1.5M sentence pairs of Chinese-English data; a trigram language model is used.
translation model is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Nuhn, Malte and Ney, Hermann
Introduction
The decipherment approach for MT has recently gained popularity for training and adapting translation models using only monolingual data.
Introduction
The general idea is to find those translation model parameters that maximize the probability of the translations of a given source text in a given language model of the target language.
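Written out, this training criterion takes the familiar decipherment form (a generic sketch, with p_LM the fixed target-side language model and p_θ the translation model being learned from the observed source text f):

    \hat{\theta} = \arg\max_{\theta} \prod_{f} \sum_{e} p_{\mathrm{LM}}(e)\, p_{\theta}(f \mid e)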
Related Work
(Ravi and Knight, 2011), (Nuhn et al., 2012), and (Dou and Knight, 2012) treat natural language translation as a deciphering problem including phenomena like reordering, insertion, and deletion and are able to train translation models using only monolingual data.
translation model is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Li, Haibo and Zheng, Jing and Ji, Heng and Li, Qi and Wang, Wen
Name-aware MT
Some of these names carry special meanings that may influence translations of the neighboring words, and thus replacing them with non-terminals can lead to information loss and weaken the translation model .
Name-aware MT
Both sentence pairs are kept in the combined data to build the translation model .
Related Work
More importantly, in these approaches the MT model was still mostly treated as a “black-box” because neither the translation model nor the LM was updated or adapted specifically for names.
translation model is mentioned in 3 sentences in this paper.
Topics mentioned in this paper: