Abstract | State-of-the-art approaches address these issues by implicitly expanding the queried questions with additional words or phrases using monolingual translation models. |
Experiments | Rows 3 to 6 are monolingual translation models, which address the word mismatch problem and achieved state-of-the-art performance in previous work. |
Experiments | Row 3 is the word-based translation model (Jeon et al., 2005), and row 4 is the word-based translation language model, which linearly combines the word-based translation model and language model into a unified framework (Xue et al., 2008). |
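For context, the linear combination used by this family of retrieval models has roughly the following shape (a hedged sketch consistent with the description above; the exact notation and smoothing in Xue et al. (2008) may differ):

$$ P(q \mid D) = \prod_{w \in q} \Big[ (1-\lambda)\, P_{mx}(w \mid D) + \lambda\, P(w \mid C) \Big], \qquad P_{mx}(w \mid D) = (1-\beta)\, P_{ml}(w \mid D) + \beta \sum_{t \in D} P(w \mid t)\, P_{ml}(t \mid D), $$

where $P(w \mid t)$ is a word-to-word translation probability, $P_{ml}$ denotes a maximum-likelihood estimate, and $C$ is the background collection.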
Experiments | Row 5 is the phrase-based translation model, which translates a sequence of words as a whole (Zhou et al., 2011). |
Introduction | Researchers have proposed the use of word-based translation models (Berger et al., 2000; Jeon et al., 2005; Xue et al., 2008; Lee et al., 2008; Bernhard and Gurevych, 2009) to solve the word mismatch problem. |
Introduction | As a principled approach to capturing semantic word relations, word-based translation models are built using IBM Model 1 (Brown et al., 1993) and have been shown to outperform traditional models (e.g., VSM, BM25, LM) for question retrieval. |
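To make the mechanics concrete, here is a minimal sketch of IBM Model 1 EM training as described by Brown et al. (1993); the uniform initialization, the omission of the NULL word, and all names are illustrative simplifications, not taken from any of the cited papers:

```python
# Minimal sketch of IBM Model 1 EM training (Brown et al., 1993), the kind of
# word-based translation model used for question retrieval.
from collections import defaultdict

def train_ibm1(pairs, iterations=10):
    """pairs: list of (source_tokens, target_tokens); returns t[(f, e)] = P(f | e)."""
    src_vocab = {w for src, _ in pairs for w in src}
    t = defaultdict(lambda: 1.0 / len(src_vocab))  # uniform initialization
    for _ in range(iterations):
        count = defaultdict(float)  # expected joint counts c(f, e)
        total = defaultdict(float)  # expected marginal counts c(e)
        for src, tgt in pairs:
            for f in src:
                z = sum(t[(f, e)] for e in tgt)  # normalize over alignments
                for e in tgt:
                    delta = t[(f, e)] / z
                    count[(f, e)] += delta
                    total[e] += delta
        for (f, e), c in count.items():  # M-step: re-estimate P(f | e)
            t[(f, e)] = c / total[e]
    return t

# Toy example: two "parallel" token lists.
pairs = [(["fix", "error"], ["repair", "fault"]),
         (["fix", "bug"], ["repair", "defect"])]
t = train_ibm1(pairs)
```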
Introduction | Zhou et al. (2011) proposed phrase-based translation models for question and answer retrieval. |
Abstract | additive neural networks, for SMT to go beyond the log-linear translation model. |
Abstract | Our model outperforms the log-linear translation models with/without embedding features on Chinese-to-English and Japanese-to-English translation tasks. |
Introduction | On the one hand, features are required to be linear with respect to the objective of the translation model (Nguyen et al., 2007), but there is no guarantee that potentially useful features are linear with respect to the model. |
Introduction | In the search procedure, the model score must be computed frequently for the search heuristic function, which poses a decoding-efficiency challenge for a neural network based translation model. |
Introduction | The biggest contribution of this paper is that it goes beyond the log-linear model and proposes a nonlinear translation model, rather than a re-ranking model (Duh and Kirchhoff, 2008; Sokolov et al., 2012). |
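One plausible shape for such a nonlinear model, sketched under the assumption that the network term is simply added to the usual linear score (the paper's exact parameterization may differ), is

$$ s(e, f) = \sum_i \lambda_i\, h_i(e, f) + \sigma\!\big(W\, x(e, f) + b\big), $$

where $x(e, f)$ collects (e.g., word-embedding) features of a hypothesis and $\sigma$ is a nonlinearity; the second, nonlinear term is what takes the score beyond the log-linear family.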
Abstract | We present an architecture that delays the computation of translation model features until decoding, allowing for the application of mixture-modeling techniques at decoding time. |
Abstract | Experimental results on two language pairs demonstrate the effectiveness of both our translation model architecture and automatic clustering, with gains of up to 1 BLEU over unadapted systems and single-domain adaptation. |
Introduction | We introduce a translation model architecture that delays the computation of features to the decoding phase. |
Related Work | Ortiz-Martínez et al. (2010) delay the computation of translation model features for the purpose of interactive machine translation with online training. |
Related Work | Sennrich (2012b) performs instance weighting of translation models based on the sufficient statistics. |
Related Work | Razmara et al. (2012) describe an ensemble decoding framework that combines several translation models in the decoding step. |
Translation Model Architecture | This section covers the architecture of the multi-domain translation model framework. |
Translation Model Architecture | Our translation model is embedded in a log-linear model as is common for SMT, and treated as a single translation model in this log-linear combination. |
Translation Model Architecture | The architecture has two goals: move the calculation of translation model features to the decoding phase, and allow for multiple knowledge sources (e.g., bitexts or user-provided data) to contribute to their calculation. |
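A hedged sketch of the idea, assuming the simplest case of weighted relative-frequency estimates over per-source sufficient statistics (class and method names, and the weighting scheme, are illustrative):

```python
# Instead of a precomputed phrase table, keep raw counts per knowledge source
# (e.g., one bitext per domain) and compute a mixture feature only when the
# decoder requests a phrase pair.
from collections import defaultdict

class DelayedTM:
    def __init__(self, num_sources):
        # Per-source sufficient statistics: joint and marginal phrase counts.
        self.joint = [defaultdict(float) for _ in range(num_sources)]
        self.marginal = [defaultdict(float) for _ in range(num_sources)]

    def add(self, k, src, tgt, count=1.0):
        """Record an extracted phrase pair (src, tgt) from knowledge source k."""
        self.joint[k][(src, tgt)] += count
        self.marginal[k][src] += count

    def phi(self, src, tgt, weights):
        """P(tgt | src) as a weighted mixture over sources, at decode time."""
        num = sum(w * self.joint[k][(src, tgt)] for k, w in enumerate(weights))
        den = sum(w * self.marginal[k][src] for k, w in enumerate(weights))
        return num / den if den > 0.0 else 0.0

# Usage: mixture weights can be set per input text without retraining.
tm = DelayedTM(num_sources=2)
tm.add(0, "guten tag", "good day")
tm.add(1, "guten tag", "hello")
print(tm.phi("guten tag", "hello", weights=[0.2, 0.8]))
```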
Abstract | Following a probabilistic decipherment approach, we first introduce a new framework for decipherment training that is flexible enough to incorporate any number/type of features (besides simple bag-of-words) as side-information used for estimating translation models. |
Decipherment Model for Machine Translation | Contrary to standard machine translation training scenarios, here we have to estimate the translation model $P_{\theta}(f \mid e)$ parameters using only monolingual data. |
Decipherment Model for Machine Translation | We then estimate the parameters of the translation model $P_{\theta}(f \mid e)$ during training. |
Decipherment Model for Machine Translation | Translation Model: Machine translation is a much more complex task than solving other decipherment tasks such as word substitution ciphers (Ravi and Knight, 2011b; Dou and Knight, 2012). |
Introduction | The parallel corpora are used to estimate translation model parameters involving word-to-word translation tables, fertilities, distortion, phrase translations, syntactic transformations, etc. |
Introduction | Learning translation models from monolingual corpora could help address the challenges faced by modern-day MT systems, especially for low resource language pairs. |
Introduction | Recently, this topic has been receiving increasing attention from researchers, and new methods have been proposed to train statistical machine translation models using only monolingual data in the source and target languages. |
Eliciting Addressee’s Emotion | Following Ritter et al. (2011), we apply a statistical machine translation model to generate a response to a given utterance. |
Eliciting Addressee’s Emotion | We use GIZA++ and SRILM for learning the translation model and the 5-gram language model, respectively. |
Eliciting Addressee’s Emotion | We use the emotion-tagged dialogue corpus to learn eight translation models and language models, each of which is specialized in generating the response that elicits one of the eight emotions (Plutchik, 1980). |
Experiments | Table 6: The number of utterance pairs used for training classifiers in emotion prediction and learning the translation models and language models in response generation. |
Experiments | We use the utterance pairs summarized in Table 6 to learn the translation models and language models for eliciting each emotional category. |
Experiments | However, for learning the general translation models, we currently use 4 million utterance pairs sampled from the 640 million pairs due to computational limitations. |
Abstract | We present a new translation model integrating the shallow local multi bottom-up tree transducer. |
Experiments | Both translation models were trained with approximately 1.5 million bilingual sentences after length-ratio filtering. |
Introduction | In this contribution, we report on our novel statistical machine translation system that uses an ℓMBOT-based translation model. |
Introduction | The theoretical foundations of ℓMBOT and their integration into our translation model are presented in Sections 2 and 3. |
Translation Model | Given a source language sentence $e$, our translation model aims to find the best corresponding target language translation $\hat{g}$; i.e., |
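The equation itself did not survive extraction; the decision rule it most plausibly stated is the standard one,

$$ \hat{g} \;=\; \operatorname*{arg\,max}_{g}\; p(g \mid e). $$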
Comparative Study | Mariño et al. (2006) implement a translation model using n-grams. |
Experiments | Our baseline is a phrase-based decoder, which includes the following models: an n-gram target-side language model (LM), a phrase translation model and a word-based lexicon model. |
Experiments | Table 1: Translation model and LM training data statistics |
Experiments | Table 1 contains the statistics of the data used for the translation model and the LM. |
Introduction | Mariño et al. (2006) present a translation model that constitutes a language model of a sort of bilanguage composed of bilingual units. |
Experiment | The translation model was trained using sentences of 40 words or less from the training data. |
Experiment | The common SMT feature set consists of four translation model features, a phrase penalty, a word penalty, and a language model feature. |
Experiment | Our distortion model was trained as follows: We used 0.2 million sentence pairs and their word alignments from the data used to build the translation model as the training data for our distortion models. |
Experiments | The larger question-thread data is employed for feature learning, such as translation model and topic model training. |
Proposed Features | Translation Model |
Proposed Features | A translation model is a mathematical model in which translation between languages is modeled statistically. |
Proposed Features | We train a translation model (Brown et al., 1990; Och and Ney, 2003) using the Community QA data, with the question as the target language, and the corresponding best answer as the source language. |
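A hedged sketch of how such a model is typically used as a retrieval feature: score a candidate answer for a question via the IBM Model 1 probability of the question given the answer. The table layout, the smoothing constant, and the function names below are illustrative, not from the paper.

```python
# t[(qw, aw)] is a learned word translation table P(q_word | a_word), trained
# with answers as the source language and questions as the target.
import math

def translation_score(question_tokens, answer_tokens, t, eps=1e-6):
    score = 0.0
    for qw in question_tokens:
        # Mixture over answer words with a uniform alignment prior, as in Model 1.
        p = sum(t.get((qw, aw), 0.0) for aw in answer_tokens) / max(len(answer_tokens), 1)
        score += math.log(p + eps)  # log domain avoids underflow
    return score

# Usage: rank candidate answers a for question q by translation_score(q, a, t).
```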
Abstract | Hiero translation models have two limitations compared to phrase-based models: 1) Limited hypothesis space; 2) No lexicalized reordering model. |
Introduction | Phrase-based and tree-based translation models are the two main streams in state-of-the-art machine translation. |
Introduction | The tree-based translation model, by using a synchronous context-free grammar formalism, can capture longer-range reordering between the source and target languages. |
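For illustration, a single synchronous rule of the textbook Hiero variety (not a rule quoted from this paper) can swap arbitrarily long sub-spans between source and target order:

$$ X \;\rightarrow\; \big\langle\, X_1 \ \text{de} \ X_2,\;\; X_2 \ \text{of} \ X_1 \,\big\rangle. $$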
Introduction | Many features are shared between phrase-based and tree-based systems including language model, word count, and translation model features. |
Introduction | In section 3, we briefly describe the translation model with phrasal ITGs and Pitman-Yor process. |
Related Work | The translation model and language model are primary components in SMT. |
Related Work | In previous work on translation modeling, mixture methods have been investigated for domain adaptation in SMT by adding domain information as additional labels to the original phrase table (Foster and Kuhn, 2007). |
Related Work | Monolingual topic information is taken as a new feature for a domain adaptive translation model and tuned on the development set (Su et al., 2012). |
Abstract | Recently, some research has studied unsupervised SMT, inducing a simple word-based translation model from monolingual corpora. |
Introduction | Novel translation models, such as phrase-based models (Koehn et al., 2007), hierarchical phrase-based models (Chiang, 2007) and linguistically syntax-based models (Liu et al., 2006; Huang et al., 2006; Galley, 2006; Zhang et al., 2008; Chiang, 2010; Zhang et al., 2011; Zhai et al., 2011, 2012) have been proposed and have steadily improved translation performance. |
Introduction | However, all of these state-of-the-art translation models rely on parallel corpora to induce translation rules and estimate the corresponding parameters. |
Introduction | Finally, they used the learned translation model directly to translate unseen data (Ravi and Knight, 2011; Nuhn et al., 2012) or incorporated the learned bilingual lexicon as a new in-domain translation resource into the phrase-based model which is trained with out-of-domain data to improve the domain adaptation performance in machine translation (Dou and Knight, 2012). |
Introduction | We first replace inflected forms by their stems or lemmas: building a translation system on a stemmed representation of the target side leads to a simpler translation task, and the morphological information contained in the source and target language parts of the translation model is more balanced. |
Translation pipeline | We use this representation to create the stemmed representation for training the translation model. |
Translation pipeline | In addition to the translation model, the target-side language model and the reference data for parameter tuning also use this representation. |
Translation pipeline | 3.2 Building a stemmed translation model |
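As a concrete illustration of this step, here is a hedged sketch of producing the stemmed target side before training. NLTK's SnowballStemmer stands in for whatever stemmer or lemmatizer the paper actually uses; the language choice and file handling are assumptions.

```python
from nltk.stem.snowball import SnowballStemmer

stemmer = SnowballStemmer("german")  # assumed target language

def stem_corpus(in_path, out_path):
    # Rewrite each tokenized line with every token replaced by its stem.
    with open(in_path, encoding="utf-8") as fin, \
         open(out_path, "w", encoding="utf-8") as fout:
        for line in fin:
            fout.write(" ".join(stemmer.stem(tok) for tok in line.split()) + "\n")

# The same stemmed form is applied to the LM training data and the tuning
# references, so all pipeline components stay consistent.
```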
Abstract | We describe a translation model adaptation approach for conversational spoken language translation (CSLT), which encourages the use of contextually appropriate translation options from relevant training conversations. |
Discussion and Future Directions | We have presented a novel, incremental topic-based translation model adaptation approach that obeys the causality constraint imposed by spoken conversations. |
Incremental Topic-Based Adaptation | Our approach is based on the premise that biasing the translation model to favor phrase pairs originating in training conversations that are contextually similar to the current conversation will lead to better translation quality. |
Incremental Topic-Based Adaptation | We add this feature to the log-linear translation model with its own weight, which is tuned with MERT. |
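A hedged sketch of such a feature: phrase pairs originating in training conversations whose topic distribution resembles the current conversation's receive a higher value. Cosine similarity and all names below are illustrative choices, not necessarily the paper's.

```python
import math

def cosine(p, q):
    # Cosine similarity between two topic distributions of equal length.
    dot = sum(pi * qi for pi, qi in zip(p, q))
    norm_p = math.sqrt(sum(pi * pi for pi in p))
    norm_q = math.sqrt(sum(qi * qi for qi in q))
    return dot / (norm_p * norm_q) if norm_p > 0 and norm_q > 0 else 0.0

def topic_feature(phrase_pair_topic_vec, current_conversation_topic_vec):
    # One additional log-linear feature per phrase pair; its weight is
    # tuned with MERT alongside the standard features.
    return cosine(phrase_pair_topic_vec, current_conversation_topic_vec)
```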
Abstract | Modern phrase-based machine translation systems make extensive use of word-based translation models for inducing alignments from parallel corpora. |
Experiments | In the end-to-end MT pipeline we use a standard set of features: relative-frequency and lexical translation model probabilities in both directions; distance-based distortion model; language model and word count. |
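These feature functions enter the standard log-linear model (Och and Ney, 2002), under which the decoder searches for

$$ \hat{e} \;=\; \operatorname*{arg\,max}_{e}\; \sum_i \lambda_i\, h_i(e, f). $$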
Introduction | Word-based translation models (Brown et al., 1993) remain central to phrase-based model training, where they are used to infer word-level alignments from sentence-aligned parallel data, from which phrase pairs are then extracted. |
Introduction | This paper develops a phrase-based translation model which aims to address the above shortcomings of the phrase-based translation pipeline. |
Experiments | The translation model (TM) was smoothed in both directions with KN smoothing (Chen et al., 2011). |
Experiments | In Foster and Kuhn (2007), two kinds of linear mixture were described: a linear mixture of language models (LMs) and a linear mixture of translation models (TMs). |
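Schematically, a linear TM mixture interpolates the component models' conditional phrase probabilities (a sketch consistent with Foster and Kuhn (2007); their weight-estimation details are not reproduced here):

$$ p(\bar{e} \mid \bar{f}) \;=\; \sum_i \lambda_i\, p_i(\bar{e} \mid \bar{f}), \qquad \sum_i \lambda_i = 1. $$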
Introduction | The translation models of a statistical machine translation (SMT) system are trained on parallel data. |
Introduction | The 2012 JHU workshop on Domain Adaptation for MT proposed phrase sense disambiguation (PSD) for translation model adaptation. |
Introduction | The decipherment approach for MT has recently gained popularity for training and adapting translation models using only monolingual data. |
Introduction | The general idea is to find the translation model parameters that maximize the probability of the translations of a given source text under a given language model of the target language. |
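Schematically, and glossing over differences between the cited channel models, the objective is

$$ \hat{\theta} \;=\; \operatorname*{arg\,max}_{\theta}\; \prod_{f} \sum_{e} P_{\mathrm{LM}}(e)\; P_{\theta}(f \mid e), $$

where $P_{\mathrm{LM}}$ is the fixed target-side language model and $P_{\theta}(f \mid e)$ is the translation model being estimated from monolingual data.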
Related Work | Ravi and Knight (2011), Nuhn et al. (2012), and Dou and Knight (2012) treat natural language translation as a decipherment problem, including phenomena like reordering, insertion, and deletion, and are able to train translation models using only monolingual data. |
Name-aware MT | Some of these names carry special meanings that may influence translations of the neighboring words, and thus replacing them with non-terminals can lead to information loss and weaken the translation model. |
Name-aware MT | Both sentence pairs are kept in the combined data to build the translation model . |
Related Work | More importantly, in these approaches the MT model was still mostly treated as a “black-box” because neither the translation model nor the LM was updated or adapted specifically for names. |