Abstract | Our proposed method employs statistical machine translation to improve question retrieval and enriches the question representation with the translated words from other languages via matrix factorization. |
Experiments | Rows 8 and 9 show our proposed method, which leverages statistical machine translation to improve question retrieval via matrix factorization. |
Introduction | The idea of improving question retrieval with statistical machine translation is based on the following two observa- |
Introduction | However, there are two problems with this enrichment: (1) enriching the original questions with the translated words from other languages increases the dimensionality and makes the question representation even more sparse; (2) statistical machine translation may introduce noise, which can harm the performance of question retrieval. |
Introduction | To solve these two problems, we propose to leverage statistical machine translation to improve question retrieval via matrix factorization. |
Our Approach | This paper aims to leverage statistical machine translation to enrich the question representation. |
Our Approach | Statistical machine translation (e.g., Google Translate) can utilize contextual information during the question translation, so it can solve the word ambiguity and word mismatch problems to some extent. |
Our Approach | However, there are two problems with this enrichment: (1) enriching the original questions with the translated words from other languages makes the question representation even more sparse; (2) statistical machine translation may introduce noise. To solve these two problems, we propose to leverage statistical machine translation to improve question retrieval via matrix factorization. |
Introduction | The systematic word order difference between two languages poses a challenge for current statistical machine translation (SMT) systems. |
Introduction | The remainder of this paper is organized as follows: Section 2 introduces the foundation of this research: the principle of statistical machine translation. |
Translation System Overview | In statistical machine translation, we are given a source language sentence $f_1^J = f_1 \ldots f_J$. |
Abstract | Currently, almost all statistical machine translation (SMT) models are trained on parallel corpora from specific domains. |
Introduction | During the last decade, statistical machine translation has made great progress. |
Introduction | Recently, more and more researchers have concentrated on taking full advantage of the monolingual corpora in both source and target languages, proposing methods for bilingual lexicon induction from nonparallel data (Rapp, 1995, 1999; Koehn and Knight, 2002; Haghighi et al., 2008; Daumé III and Jagarlamudi, 2011) and unsupervised statistical machine translation (in which a bilingual lexicon is a byproduct) with only monolingual corpora (Ravi and Knight, 2011; Nuhn et al., 2012; Dou and Knight, 2012). |
Introduction | The unsupervised statistical machine translation method (Ravi and Knight, 2011; Nuhn et al., 2012; Dou and Knight, 2012) viewed the translation task as a decipherment problem and designed a generative model whose objective function maximizes the likelihood of the source language monolingual data. |
Abstract | However, these solutions are impractical in complex structured prediction problems such as statistical machine translation. |
Conclusions and Future Work | Finally, although motivated by statistical machine translation, RM is a gradient-based method that can easily be applied to other problems. |
Introduction | The desire to incorporate high-dimensional sparse feature representations into statistical machine translation (SMT) models has driven recent research away from Minimum Error Rate Training (MERT) (Och, 2003), and toward other discriminative methods that can optimize more features. |
Abstract | In this paper, we propose a new Bayesian inference method to train statistical machine translation systems using only nonparallel corpora. |
Introduction | Statistical machine translation (SMT) systems these days are built using large amounts of bilingual parallel corpora. |
Introduction | Recently, this topic has received increasing attention from researchers, and new methods have been proposed to train statistical machine translation models using only monolingual data in the source and target languages. |