Abstract | Our proposed method employs statistical machine translation to improve question retrieval and enriches the question representation with the translated words from other languages via matrix factorization. |
Introduction | The idea of improving question retrieval with statistical machine translation is based on the following two observations:
Introduction | However, there are two problems with this enrichment: (1) enriching the original questions with the translated words from other languages increases the dimensionality and makes the question representation even more sparse; (2) statistical machine translation may introduce noise, which can harm the performance of question retrieval. |
Introduction | To solve these two problems, we propose to leverage statistical machine translation to improve question retrieval via matrix factorization. |
Our Approach | This paper aims to leverage statistical machine translation to enrich the question representation. |
Our Approach | Statistical machine translation (e.g., Google Translate) can utilize contextual information during the question translation, so it can solve the word ambiguity and word mismatch problems to some extent. |
Our Approach | However, there are two problems with this enrichment: (1) enriching the original questions with the translated words from other languages makes the question representation even more sparse; (2) statistical machine translation may introduce noise. To solve these two problems, we propose to leverage statistical machine translation to improve question retrieval via matrix factorization.
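The matrix-factorization remedy described in the snippets above can be illustrated with a toy gradient-descent factorization of a (term x question) matrix into low-rank factors, which addresses sparsity by giving every question a dense representation. This is a hypothetical sketch with made-up function names and hyperparameters, not the authors' actual formulation:

```python
import random

def factorize(V, k=2, steps=400, lr=0.05, lam=0.01):
    """Toy matrix factorization V ~= W H by stochastic gradient descent.

    V is a (terms x questions) matrix as nested lists. W gives dense
    term factors, H gives dense question representations of rank k.
    """
    random.seed(0)
    n, m = len(V), len(V[0])
    W = [[random.random() for _ in range(k)] for _ in range(n)]
    H = [[random.random() for _ in range(m)] for _ in range(k)]
    for _ in range(steps):
        for i in range(n):
            for j in range(m):
                pred = sum(W[i][r] * H[r][j] for r in range(k))
                err = V[i][j] - pred
                for r in range(k):
                    w, h = W[i][r], H[r][j]
                    # gradient step with L2 regularisation
                    W[i][r] += lr * (err * h - lam * w)
                    H[r][j] += lr * (err * w - lam * h)
    return W, H

def reconstruction_error(V, W, H):
    k = len(H)
    return sum((V[i][j] - sum(W[i][r] * H[r][j] for r in range(k))) ** 2
               for i in range(len(V)) for j in range(len(V[0])))
```

Noise from the translations is mitigated in the same step: a low-rank fit cannot reproduce idiosyncratic noisy entries exactly, so they are smoothed away.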
Abstract | ECs are ubiquitous in languages like Chinese, but they are tacitly ignored in most machine translation (MT) work because of their elusive nature. |
Abstract | In this paper we present a comprehensive treatment of ECs by first recovering them with a structured MaxEnt model with a rich set of syntactic and lexical features, and then incorporating the predicted ECs into a Chinese-to-English machine translation task through multiple approaches, including the extraction of EC-specific sparse features. |
Chinese Empty Category Prediction | In our opinion, recovering ECs from machine parse trees is more meaningful, since that is what one would encounter when developing a downstream application such as machine translation.
Conclusions and Future Work | We also applied the predicted ECs to a large-scale Chinese-to-English machine translation task and achieved significant improvement over two strong MT baselines.
Integrating Empty Categories in Machine Translation | In this section, we explore multiple approaches of utilizing recovered ECs in machine translation.
Introduction | One of the key challenges in statistical machine translation (SMT) is to effectively model inherent differences between the source and the target language. |
Introduction | Empty Categories for Machine Translation
Introduction | Measure the effect of ECs on automatic word alignment for machine translation after integrating recovered ECs into the MT data;
Related Work | So far, only a handful of previous studies have applied ECs explicitly to machine translation.
Related Work | First, in addition to the preprocessing of training data and inserting recovered empty categories, we implement sparse features to further boost the performance, and tune the feature weights directly towards maximizing the machine translation metric. |
Abstract | Preordering of a source language sentence to match target word order has proved to be useful for improving machine translation systems. |
Abstract | Previous work has shown that a reordering model can be learned from high quality manual word alignments to improve machine translation performance. |
Abstract | In this paper, we focus on further improving the performance of the reordering model (and thereby machine translation ) by using a larger corpus of sentence aligned data for which manual word alignments are not available but automatic machine generated alignments are available. |
Experimental setup | Additionally, we evaluate the effect of reordering on our final systems for machine translation measured using BLEU. |
Experimental setup | The parallel corpus is used for building our phrased based machine translation system and to add training data for our reordering model. |
Experimental setup | For our machine translation experiments, we used a standard phrase-based system (Al-Onaizan and Papineni, 2006) with a lexicalized distortion model with a window size of +/-4 words.
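The distance-based core of such a distortion model, with the +/-4-word window mentioned above, can be sketched as follows. This is a simplification of a full lexicalized distortion model (which conditions on the phrases themselves); the function name and cost convention are our own:

```python
def distortion_cost(prev_phrase_end, next_phrase_start, window=4):
    """Distance-based distortion cost with a reordering window limit.

    Monotone continuation (a jump of 0) is free, larger jumps are
    penalised linearly, and jumps beyond the window are disallowed
    (signalled here by returning None).
    """
    jump = abs(next_phrase_start - prev_phrase_end - 1)
    if jump > window:
        return None
    return -jump
```

A decoder sums these costs over the chosen phrase sequence, so translations requiring long-distance jumps are discouraged unless other model scores justify them.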
Introduction | Dealing with word order differences between source and target languages presents a significant challenge for machine translation systems. |
Introduction | in machine translation output that is not fluent and is often very hard to understand. |
Introduction | This results in a 1.8 BLEU point gain in machine translation performance on an Urdu-English machine translation task over a preordering model trained using only manual word alignments. |
Results and Discussions | We see a significant gain of 1.8 BLEU points in machine translation by going beyond manual word alignments using the best reordering model reported in Table 3. |
Abstract | In this paper, we propose a new Bayesian inference method to train statistical machine translation systems using only nonparallel corpora. |
Bayesian MT Decipherment via Hash Sampling | We use a similar technique as theirs but a different approximate distribution for the proposal, one that is better-suited for machine translation models and without some of the additional overhead required for computing certain terms in the original formulation. |
Decipherment Model for Machine Translation | We now describe the decipherment problem formulation for machine translation.
Decipherment Model for Machine Translation | Contrary to standard machine translation training scenarios, here we have to estimate the translation model Pθ(f|e) parameters using only monolingual data.
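The flavour of estimating a channel model from monolingual data alone can be conveyed with a toy EM procedure for a word-substitution channel, P(c) = sum_p P(p) P(c|p). This is a drastic simplification of the paper's Bayesian hash-sampling approach (unigram plaintext language model, soft posteriors, plain EM), intended only as an illustration:

```python
import random
from collections import defaultdict

def em_substitution(cipher_tokens, plain_unigram, iters=30):
    """Toy EM for a word-substitution channel trained on ciphertext only.

    plain_unigram is a plaintext language model; the channel t[p][c]
    is estimated without any parallel (plaintext, ciphertext) pairs.
    """
    random.seed(0)
    plains = list(plain_unigram)
    ciphers = sorted(set(cipher_tokens))
    # randomly perturbed initial channel, normalised per plaintext word
    t = {}
    for p in plains:
        row = {c: 1.0 + 0.1 * random.random() for c in ciphers}
        z = sum(row.values())
        t[p] = {c: v / z for c, v in row.items()}
    for _ in range(iters):
        counts = {p: defaultdict(float) for p in plains}
        for c in cipher_tokens:
            z = sum(plain_unigram[p] * t[p][c] for p in plains)
            for p in plains:  # E-step: posterior over plaintext words
                counts[p][c] += plain_unigram[p] * t[p][c] / z
        for p in plains:      # M-step: renormalise expected counts
            total = sum(counts[p].values())
            t[p] = {c: counts[p][c] / total for c in ciphers}
    return t
```

Real decipherment-based MT replaces the unigram model with stronger language models and the exact E-step with sampling, since the full posterior is intractable at translation scale.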
Decipherment Model for Machine Translation | Translation Model: Machine translation is a much more complex task than solving other decipherment tasks such as word substitution ciphers (Ravi and Knight, 2011b; Dou and Knight, 2012). |
Discussion and Future Work | for unsupervised machine translation which can help further improve the performance in addition to accelerating the sampling process. |
Experiments and Results | To evaluate translation quality, we use BLEU score (Papineni et al., 2002), a standard evaluation measure used in machine translation . |
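BLEU, as used throughout these snippets, scores a candidate by clipped n-gram precision against a reference, times a brevity penalty. A minimal sentence-level sketch (bigram order, single reference, simple smoothing; real evaluations use 4-grams, multiple references, and corpus-level statistics) looks like this:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def simple_bleu(candidate, reference, max_n=2):
    """Sentence-level BLEU sketch: modified n-gram precision + brevity penalty."""
    precisions = []
    for n in range(1, max_n + 1):
        cand = Counter(ngrams(candidate, n))
        ref = Counter(ngrams(reference, n))
        # clip candidate counts by reference counts ("modified" precision)
        overlap = sum(min(c, ref[g]) for g, c in cand.items())
        precisions.append(max(overlap, 1e-9) / max(sum(cand.values()), 1))
    # brevity penalty: punish candidates shorter than the reference
    if len(candidate) > len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / max(len(candidate), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

The clipping step is what prevents a candidate from gaming precision by repeating a common reference word.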
Introduction | Statistical machine translation (SMT) systems these days are built using large amounts of bilingual parallel corpora. |
Introduction | Recently, this topic has been receiving increasing attention from researchers and new methods have been proposed to train statistical machine translation models using only monolingual data in the source and target language. |
Abstract | A similar idea is applied to Cross Lingual Sentiment Analysis (CLSA), and it is shown that reducing data sparsity (after translation or bilingual mapping) yields higher accuracy than Machine Translation based CLSA and sense based CLSA.
Clustering for Cross Lingual Sentiment Analysis | Existing approaches for CLSA depend on an intermediary machine translation system to bridge the language gap (Hiroshi et al., 2004; Banea et al., 2008). |
Clustering for Cross Lingual Sentiment Analysis | Machine translation is very resource intensive. |
Clustering for Cross Lingual Sentiment Analysis | Given that sentiment analysis is a less resource intensive task compared to machine translation, the use of an MT system is hard to justify for performing
Discussions | This could degrade the accuracy of the machine translation itself, limiting the performance of an MT based CLSA system. |
Introduction | When used as an additional feature with word-based language models, it has been shown to improve system performance in tasks such as machine translation (Uszkoreit and Brants, 2008; Stymne, 2012), speech recognition (Martin et al., 1995; Samuelsson and Reichl, 1999), dependency parsing (Koo et al., 2008; Haffari et al., 2011; Zhang and Nivre, 2011; Tratz and Hovy, 2011) and NER (Miller et al., 2004; Faruqui and Pado, 2010; Turian et al., 2010; Täckström et al., 2012).
Introduction | Popular approaches for Cross-Lingual Sentiment Analysis (CLSA) (Wan, 2009; Duh et al., 2011) depend on Machine Translation (MT) for converting the labeled data from one language to the other (Hiroshi et al., 2004; Banea et al., 2008; Wan, 2009). |
Related Work | Most often these methods depend on an intermediary machine translation system (Wan, 2009; Brooke et al., 2009) or a bilingual dictionary (Ghorbel and Jacot, 2011; Lu et al., 2011) to bridge the language gap. |
Abstract | Here we approach it as a straightforward machine translation task, and demonstrate that standard machine translation components can be adapted into a semantic parser. |
Abstract | These results support the use of machine translation methods as an informative baseline in semantic parsing evaluations, and suggest that research in semantic parsing could benefit from advances in machine translation . |
Conclusions | We have presented a semantic parser which uses techniques from machine translation to learn mappings from natural language to variable-free meaning representations. |
Discussion | For this reason, we argue for the use of a machine translation baseline as a point of comparison for new methods. |
Introduction | At least superficially, SP is simply a machine translation (MT) task: we transform an NL utterance in one language into a statement of another (unnatural) meaning representation language (MRL). |
Related Work | tsVB also uses a piece of standard MT machinery, specifically tree transducers, which have been profitably employed for syntax-based machine translation (Maletti, 2010). |
Related Work | The present work is also the first we are aware of which uses phrase-based rather than tree-based machine translation techniques to learn a semantic parser. |
Abstract | Most statistical machine translation (SMT) systems are modeled using a log-linear framework. |
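The log-linear framework mentioned here scores each candidate translation as a weighted sum of feature values and picks the argmax. A minimal sketch (feature dicts and an n-best list; names are illustrative):

```python
def loglinear_score(features, weights):
    """Log-linear model score: weighted sum of feature values (log domain)."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())

def best_candidate(candidates, weights):
    """Pick the highest-scoring translation from an n-best list.

    candidates is a list of (translation, feature-dict) pairs.
    """
    return max(candidates, key=lambda cand: loglinear_score(cand[1], weights))[0]
```

Tuning an SMT system then amounts to choosing the weight vector, e.g. by MERT or the online methods discussed in later snippets.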
Introduction | However, neural network based machine translation is far from easy.
Introduction | Actually, existing works empirically show that some nonlocal features, especially the language model, contribute greatly to machine translation.
Introduction | According to the above analysis, we propose a variant of a neural network model for machine translation, and we call it Additive Neural Networks or AdNN for short.
Abstract | Most modern machine translation systems use phrase pairs as translation units, allowing for accurate modelling of phrase-internal translation and reordering. |
Experiments | We used the Moses machine translation decoder (Koehn et al., 2007), using the default features and decoding settings. |
Experiments | Table 3: Machine translation performance in BLEU% on the IWSLT 2005 Chinese-English test set.
Introduction | Recent years have witnessed burgeoning development of statistical machine translation research, notably phrase-based (Koehn et al., 2003) and syntax-based approaches (Chiang, 2005; Galley et al., 2006; Liu et al., 2006). |
Model | We consider a process in which the target string is generated using a left-to-right order, similar to the decoding strategy used by phrase-based machine translation systems (Koehn et al., 2003). |
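The left-to-right generation process described above can be caricatured by the simplest possible phrase-based decoder: monotone (no reordering), greedy longest-match segmentation, single translation per phrase. Real decoders search over segmentations, reorderings, and model scores; this sketch only shows the left-to-right, phrase-at-a-time structure:

```python
def monotone_decode(source, phrase_table, max_len=3):
    """Greedy left-to-right monotone phrase-based decoding sketch.

    source is a list of tokens; phrase_table maps source-token tuples
    to a single target string. Unknown words are passed through.
    """
    out, i = [], 0
    while i < len(source):
        # prefer the longest phrase starting at position i
        for n in range(min(max_len, len(source) - i), 0, -1):
            key = tuple(source[i:i + n])
            if key in phrase_table:
                out.append(phrase_table[key])
                i += n
                break
        else:
            out.append(source[i])  # OOV: copy the source token
            i += 1
    return out
```

The target string is built strictly left to right, mirroring the generation order the model above assumes.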
Related Work | Word-based models have a long history in machine translation, starting with the venerable IBM translation models (Brown et al., 1993) and the hidden Markov model (Vogel et al., 1996).
Related Work | More recently, a number of authors have proposed Markov models for machine translation . |
Abstract | Currently, almost all of the statistical machine translation (SMT) models are trained with parallel corpora from specific domains.
Introduction | During the last decade, statistical machine translation has made great progress. |
Introduction | Recently, more and more researchers concentrated on taking full advantage of the monolingual corpora in both source and target languages, and proposed methods for bilingual lexicon induction from nonparallel data (Rapp, 1995, 1999; Koehn and Knight, 2002; Haghighi et al., 2008; Daumé III and Jagarlamudi, 2011) and proposed unsupervised statistical machine translation (bilingual lexicon is a byproduct) with only monolingual corpora (Ravi and Knight, 2011; Nuhn et al., 2012; Dou and Knight, 2012).
Introduction | The unsupervised statistical machine translation method (Ravi and Knight, 2011; Nuhn et al., 2012; Dou and Knight, 2012) viewed the translation task as a decipherment problem and designed a generative model with the objective function to maximize the likelihood of the source language monolingual data. |
Introduction | The systematic word order difference between two languages poses a challenge for current statistical machine translation (SMT) systems. |
Introduction | … for Statistical Machine Translation
Introduction | The remainder of this paper is organized as follows: Section 2 introduces the basis of this research: the principle of statistical machine translation.
Translation System Overview | In statistical machine translation, we are given a source language sentence f_1^J = f_1 ... f_J.
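With this notation, the principle these snippets refer to is the standard SMT search problem, shown here in its noisy-channel and log-linear forms as commonly given in the literature:

```latex
\hat{e}_1^I = \operatorname*{argmax}_{e_1^I} \Pr(e_1^I \mid f_1^J)
            = \operatorname*{argmax}_{e_1^I} \Pr(f_1^J \mid e_1^I)\,\Pr(e_1^I)
\qquad\text{or, log-linearly,}\qquad
\hat{e}_1^I = \operatorname*{argmax}_{e_1^I} \sum_{m=1}^{M} \lambda_m h_m(e_1^I, f_1^J)
```

The noisy-channel view factors the problem into a translation model and a language model; the log-linear view generalises this to arbitrary weighted feature functions h_m.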
Translation System Overview | In this paper, the phrase-based machine translation system |
Conclusion and Future Work | We demonstrated that our ℓMBOT-based machine translation system beats a standard tree-to-tree system (Moses tree-to-tree) on the WMT 2009 English→German translation task.
Conclusion and Future Work | To achieve this we implemented the formal model as described in Section 2 inside the Moses machine translation toolkit. |
Introduction | Besides phrase-based machine translation systems (Koehn et al., 2003), syntax-based systems have become widely used because of their ability to handle nonlocal reordering. |
Introduction | In this contribution, we report on our novel statistical machine translation system that uses an ℓMBOT-based translation model.
Theoretical Model | In this section, we present the theoretical generative model used in our approach to syntax-based machine translation . |
Abstract | Parallel text is the fuel that drives modern machine translation systems. |
Abstract | Furthermore, 22% of the true positives are potentially machine translations (judging by the quality), whereas in 13% of the cases one of the sentences contains additional content not expressed in the other.
Abstract | Machine Translation Experiments
Abstract | Modern phrase-based machine translation systems make extensive use of word-based translation models for inducing alignments from parallel corpora. |
Experiments | The time complexity of our inference algorithm is O(n^6), which can be prohibitive for large scale machine translation tasks.
Introduction | The phrase-based approach (Koehn et al., 2003) to machine translation (MT) has transformed MT from a narrow research topic into a truly useful technology to end users. |
Related Work | In the context of machine translation , ITG has been explored for statistical word alignment in both unsupervised (Zhang and Gildea, 2005; Cherry and Lin, 2007; Zhang et al., 2008; Pauls et al., 2010) and supervised (Haghighi et al., 2009; Cherry and Lin, 2006) settings, and for decoding (Petrov et al., 2008). |
Related Work | As mentioned above, ours is not the first work attempting to generalise adaptor grammars for machine translation; Neubig et al. (2011) also developed a similar approach based around ITG using a Pitman-Yor process prior.
Abstract | We then use this framework to compare several analytical models on data from the Workshop on Machine Translation (WMT). |
Experiments | I.e., machine translation specialists.
From Rankings to Relative Ability | One could argue that it specifies a space of machine translation specialists, but likely these individuals are thought to be a representative sample of a broader community.
The WMT Translation Competition | Every year, the Workshop on Machine Translation (WMT) conducts a competition between machine translation systems. |
The WMT Translation Competition | For each track, the organizers also assemble a panel of judges, typically machine translation specialists. The role of a judge is to repeatedly rank five different translations of the same source text.
Eliciting Addressee’s Emotion | Following (Ritter et al., 2011), we apply the statistical machine translation model for generating a response to a given utterance. |
Eliciting Addressee’s Emotion | Similar to ordinary machine translation systems, the model is learned from pairs of an utterance and a response by using off-the-shelf tools for machine translation.
Eliciting Addressee’s Emotion | Unlike machine translation, we do not use reordering models, because the positions of phrases are not considered to correlate strongly with the appropriateness of responses (Ritter et al., 2011).
Related Work | The linear interpolation of translation and/or language models is a widely-used technique for adapting machine translation systems to new domains (Sennrich, 2012). |
Conclusion | We are currently attempting to construct a parser for UCCA and to apply it to several semantic tasks, notably English-French machine translation . |
Introduction | One example is machine translation to target languages that do not express this structural distinction (e.g., both (a) and (b) would be translated to the same German sentence “John duschte”). |
Related Work | A different strand of work addresses the construction of an interlingual representation, often with a motivation of applying it to machine translation . |
UCCA’s Benefits to Semantic Tasks | Aside from machine translation , a great variety of semantic tasks can benefit from a scheme that is relatively insensitive to syntactic variation. |
Introduction | It is an important processing step for a wide range of Natural Language Processing (NLP) tasks such as text-to-speech synthesis, speech recognition, information extraction, parsing, and machine translation (Sproat et al., 2001). |
Related Work | Research on SMS and Twitter normalization has been roughly categorized as drawing inspiration from three other areas of NLP (Kobus et al., 2008): machine translation, spell checking, and automatic speech recognition.
Related Work | The statistical machine translation (SMT) metaphor was the first proposed to handle the text normalization problem (Aw et al., 2006). |
Related Work | Kobus et al. (2008) undertook a hybrid approach that pulls inspiration from both the machine translation and speech recognition metaphors.
Abstract | We present a fast and scalable online method for tuning statistical machine translation models with large feature sets. |
Abstract | Equally important is our analysis, which suggests techniques for mitigating overfitting and domain mismatch, and applies to other recent discriminative methods for machine translation . |
Adaptive Online Algorithms | Machine translation is an unusual machine learning setting because multiple correct translations exist and decoding is comparatively expensive. |
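The adaptive online tuning these snippets discuss typically combines a per-coordinate adaptive learning rate (AdaGrad-style) with a pairwise ranking loss over translation candidates. A hedged sketch, with hypothetical names, of both pieces:

```python
import math

class AdaGrad:
    """Per-coordinate adaptive learning rates, suited to sparse features."""
    def __init__(self, lr=0.1, eps=1e-8):
        self.lr, self.eps = lr, eps
        self.w, self.g2 = {}, {}

    def score(self, feats):
        return sum(self.w.get(k, 0.0) * v for k, v in feats.items())

    def update(self, grad):
        for k, g in grad.items():
            self.g2[k] = self.g2.get(k, 0.0) + g * g  # accumulated squared gradient
            self.w[k] = self.w.get(k, 0.0) - self.lr * g / (math.sqrt(self.g2[k]) + self.eps)

def pairwise_step(opt, better, worse, margin=1.0):
    """Hinge update pushing the better translation's score above the worse one's."""
    if opt.score(worse) + margin > opt.score(better):
        keys = set(better) | set(worse)
        grad = {k: worse.get(k, 0.0) - better.get(k, 0.0) for k in keys}
        opt.update(grad)
```

Because each feature keeps its own accumulated gradient history, rare sparse features receive proportionally larger steps than frequent dense ones, which is the appeal for large MT feature sets.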
Introduction | Adaptation of discriminative learning methods for these types of features to statistical machine translation (MT) systems, which have historically used idiosyncratic learning techniques for a few dense features, has been an active research area for the past half-decade. |
Conclusion | This work was funded by the DFG Research Project Distributional Approaches to Semantic Relatedness (Marion Weller), the DFG Heisenberg Fellowship SCHU-2580/1-1 (Sabine Schulte im Walde), as well as by the Deutsche Forschungsgemeinschaft grant Models of Morphosyntax for Statistical Machine Translation (Alexander Fraser).
Experiments and evaluation | English/German data released for the 2009 ACL Workshop on Machine Translation shared task.
Previous work | as a hierarchical machine translation system using a string-to-tree setup. |
Previous work | (2012) evaluated user reactions to different error types in machine translation and came to the result that morphological |
Abstract | Since statistical machine translation (SMT) and translation memory (TM) complement each other in matched and unmatched regions, integrated models are proposed in this paper to incorporate TM information into phrase-based SMT. |
Conclusion and Future Work | Last, some related approaches (Smith and Clark, 2009; Phillips, 2011) combine SMT and example-based machine translation (EBMT) (Nagao, 1984). |
Introduction | Statistical machine translation (SMT), especially the phrase-based model (Koehn et al., 2003), has developed very fast in the last decade. |
Problem Formulation | Compared with the standard phrase-based machine translation model, the translation problem is reformulated as follows (only based on the best TM, however, it is similar for multiple TM sentences): |
Experiment | The bilingual corpus includes 5.1M sentence pairs from the NIST 2008 constrained track of Chinese-to-English machine translation task. |
Experiment | Development data are generated based on the English references of NIST 2008 constrained track of Chinese-to-English machine translation task. |
Introduction | Compared to these works, our paraphrasing engine alters queries in a similar way to statistical machine translation, with systematic tuning and decoding components.
Paraphrasing for Web Search | Similar to statistical machine translation (SMT), given an input query Q, our paraphrasing engine generates paraphrase candidates based on a linear model.
Abstract | Being able to automatically identify, from a corpus of monolingual text, which word tokens are being used in a previously unseen sense has applications to machine translation and other tasks sensitive to lexical semantics. |
Abstract | Instead of difficult and expensive annotation, we build a gold-standard by leveraging cheaply available parallel corpora, targeting our approach to the problem of domain adaptation for machine translation . |
Introduction | When machine translation (MT) systems are applied in a new domain, many errors are a result of: (1) previously unseen (OOV) source language words, or (2) source language words that appear with a new sense and which require new translations.
Related Work | Work on active learning for machine translation has focused on collecting translations for longer unknown segments (e. g., Bloodgood and Callison-Burch (2010)). |
Conclusions and Future Work | In machine translation , improved language models have resulted in significant improvements in translation performance (Brants et al., 2007). |
Introduction | In some problem domains, such as machine translation , the translation is between two distinct languages and the language model can only be trained on data in the output language. |
Related Work | Adaptation techniques have been shown to improve language modeling performance based on perplexity (Rosenfeld, 1996) and in application areas such as speech transcription (Bacchiani and Roark, 2003) and machine translation (Zhao et al., 2004), though no previous research has examined the lan- |
Related Work | Many recent statistical simplification techniques build upon models from machine translation and utilize a simple language model during simplification/decoding both in English (Zhu et al., 2010; Woodsend and Lapata, 2011; Coster and Kauchak, 2011a; Wubben et al., 2012) and in other languages (Specia, 2010).
Abstract | Out-of-vocabulary (oov) words or phrases still remain a challenge in statistical machine translation especially when a limited amount of parallel text is available for training or when there is a domain shift from training data to test data. |
Collocational Lexicon Induction | This approach has also been used in machine translation to find in-vocabulary paraphrases for oov words on the source side and find a way to translate them. |
Experiments & Results 4.1 Experimental Setup | BLEU (Papineni et al., 2002) is still the de facto evaluation metric for machine translation and we use that to measure the quality of our proposed approaches for MT. |
Introduction | Out-of-vocabulary (oov) words or phrases still remain a challenge in statistical machine translation . |
Abstract | Our experiments on two machine translation quality estimation datasets show uniform significant accuracy gains from multitask learning, and consistently outperform strong baselines. |
Conclusion | Our experiments showed how our approach outperformed competitive baselines on two machine translation quality regression problems, including the highly challenging problem of predicting post-editing time. |
Conclusion | Models of individual annotators could be used to train machine translation systems to optimise an annotator-specific quality measure, or in active learning for corpus annotation, where the model can suggest the most appropriate instances for each annotator or the best annotator for a given instance. |
Introduction | This is the case, for example, of annotations on the quality of sentences generated using machine translation (MT) systems, which are often used to build quality estimation models (Blatz et al., 2004; Specia et al., 2009) — our application of interest. |
Cross Language Text Categorization | Most CLTC methods rely heavily on machine translation (MT). |
Cross Language Text Categorization | (2011) also use machine translation , but enhance the processing through domain adaptation by feature weighing, assuming that the training data in one language and the test data in the other come from different domains, or can exhibit different linguistic phenomena due to linguistic and cultural differences. |
Cross Language Text Categorization | As we have seen in the literature review, machine translation and bilingual dictionaries can be used to cast these dimensions from the source language Ls to the target language Lt.
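The bilingual-dictionary casting mentioned above can be sketched as projecting a bag-of-words feature vector through the dictionary, with each source word distributing its weight over its translations. A hypothetical illustration, not the paper's exact casting step:

```python
def project_features(doc_vec, bi_dict):
    """Project a bag-of-words vector from source language Ls to target Lt.

    doc_vec maps source words to weights; bi_dict maps each source word
    to a list of target translations. Ambiguous words split their weight
    uniformly; words missing from the dictionary are dropped.
    """
    out = {}
    for word, weight in doc_vec.items():
        translations = bi_dict.get(word, [])
        for tgt in translations:
            out[tgt] = out.get(tgt, 0.0) + weight / len(translations)
    return out
```

A classifier trained on target-language documents can then be applied directly to the projected vectors, bridging the language gap without a full MT system.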
Conclusion | We show that a considerable amount of parallel sentence pairs can be crawled from microblogs and these can be used to improve Machine Translation by updating our translation tables with translations of newer terms. |
Experiments | 5.2 Machine Translation Experiments |
Experiments | We report on machine translation experiments using our harvested data in two domains: edited news and microblogs. |
Introduction | Machine translation suffers acutely from the domain-mismatch problem caused by microblog text. |
Experiments | well handled by all machine translation systems.
Introduction | We thus investigated the possibility of using machine translation to create a parallel corpus, as has been done for spoken |
Introduction | The idea is that using machine translation would enable us to have a large training corpus, either by using the English one and translating the test corpus, or by translating the training corpus. |
Introduction | One of the questions posed was whether the quality of present machine translation systems would enable learning the classification properly.
Related Work | (Ortiz-Martinez et al., 2010) delay the computation of translation model features for the purpose of interactive machine translation with online training. |
Translation Model Architecture | One application where this could be desirable is interactive machine translation, where one could work with a mix of compact, static tables and tables designed to be incrementally trainable.
Translation Model Architecture | Two data sets are out-of-domain, made available by the 2012 Workshop on Statistical Machine Translation (Callison-Burch et al., 2012).
Abstract | However, these solutions are impractical in complex structured prediction problems such as statistical machine translation . |
Conclusions and Future Work | Finally, although motivated by statistical machine translation , RM is a gradient-based method that can easily be applied to other problems. |
Introduction | The desire to incorporate high-dimensional sparse feature representations into statistical machine translation (SMT) models has driven recent research away from Minimum Error Rate Training (MERT) (Och, 2003), and toward other discriminative methods that can optimize more features. |
Introduction | Word segmentation and part-of-speech (POS) tagging are two critical and necessary initial procedures with respect to the majority of high-level Chinese language processing tasks such as syntax parsing, information extraction and machine translation . |
Related Work | (2008) described a Bayesian semi-supervised CWS model by considering the segmentation as the hidden variable in machine translation . |
Related Work | Unlike this model, the proposed approach is targeted at a general model, instead of one oriented to the machine translation task.
Experiment | We used the patent data for the Japanese to English and Chinese to English translation subtasks from the NTCIR-9 Patent Machine Translation Task (Goto et al., 2011). |
Introduction | Estimating appropriate word order in a target language is one of the most difficult problems for statistical machine translation (SMT). |
Introduction | Experiments confirmed the effectiveness of our method for Japanese-English and Chinese-English translation, using NTCIR-9 Patent Machine Translation Task data sets (Goto et al., 2011).
Abstract | We propose a Name-aware Machine Translation (MT) approach which can tightly integrate name processing into MT model, by jointly annotating parallel corpora, extracting name-aware translation grammar and rules, adding name phrase table and name translation driven decoding. |
Introduction | A key bottleneck of high-quality cross-lingual information access lies in the performance of Machine Translation (MT). |
Related Work | In contrast, our name pair mining approach described in this paper does not require any machine translation or transliteration features. |