Abstract | This paper describes a novel technique for incorporating syntactic knowledge into phrase-based machine translation through incremental syntactic parsing. |
Introduction | Early work in statistical machine translation viewed translation as a noisy-channel process comprising a translation model, which posited adequate translations of source-language words, and a target language model, which guided the fluency of generated target-language strings (Brown et al.,
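The noisy-channel view described above corresponds to the standard Bayes decomposition, in which the decoder searches for the target sentence e maximizing:

```latex
\hat{e} \;=\; \operatorname*{arg\,max}_{e} \, P(e \mid f)
       \;=\; \operatorname*{arg\,max}_{e} \, \underbrace{P(f \mid e)}_{\text{translation model}} \cdot \underbrace{P(e)}_{\text{language model}}
```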
Introduction | Drawing on earlier successes in speech recognition, research in statistical machine translation has effectively used n-gram word sequence models as language models. |
Related Work | Recent work has shown that parsing-based machine translation using syntax-augmented (Zollmann and Venugopal, 2006) hierarchical translation grammars with rich nonterminal sets can demonstrate substantial gains over hierarchical grammars for certain language pairs (Baker et al., 2009).
Related Work | Speech recognition and statistical machine translation focus on the use of n-grams, which provide a simple finite-state approximation of the target language.
Related Work | Incremental parsers are an appropriate algorithmic fit for incorporating syntax into phrase-based statistical machine translation, since both process sentences in an incremental left-to-right fashion.
Abstract | A lack of standard datasets and evaluation metrics has prevented the field of paraphrasing from making the kind of rapid progress enjoyed by the machine translation community over the last 15 years. |
Discussions and Future Work | In addition to paraphrasing, our data collection framework could also be used to produce useful data for machine translation and computer vision.
Introduction | Machine paraphrasing has many applications for natural language processing tasks, including machine translation (MT), MT evaluation, summary evaluation, question answering, and natural language generation. |
Introduction | Despite the similarities between paraphrasing and translation, several major differences have prevented researchers from simply following standards that have been established for machine translation.
Introduction | Professional translators produce large volumes of bilingual data according to a more or less consistent specification, indirectly fueling work on machine translation algorithms. |
Abstract | Statistical machine translation systems combine the predictions of two directional models, typically using heuristic combination procedures like grow-diag-final. |
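The grow-diag-final heuristic mentioned above starts from the intersection of the two directional alignments, grows it toward the union through neighboring links, and finally adds remaining directional links that cover still-unaligned words. A simplified Python sketch (the function name and set-of-pairs representation are our own; the published procedure of Koehn et al. (2003) differs in some details, e.g. the order in which grow candidates are considered):

```python
def grow_diag_final(e2f, f2e):
    """Symmetrize two directional alignments, given as sets of (e, f) index pairs."""
    neighbors = [(-1, 0), (0, -1), (1, 0), (0, 1),
                 (-1, -1), (-1, 1), (1, -1), (1, 1)]
    union = e2f | f2e
    alignment = e2f & f2e  # start from the intersection
    aligned_e = {e for e, _ in alignment}
    aligned_f = {f for _, f in alignment}

    # GROW-DIAG: repeatedly add union points adjacent to existing points
    # that cover a still-unaligned word on either side.
    added = True
    while added:
        added = False
        for e, f in sorted(alignment):
            for de, df in neighbors:
                p = (e + de, f + df)
                if p in union and p not in alignment and \
                        (p[0] not in aligned_e or p[1] not in aligned_f):
                    alignment.add(p)
                    aligned_e.add(p[0]); aligned_f.add(p[1])
                    added = True

    # FINAL: add remaining directional points that cover unaligned words.
    for directional in (e2f, f2e):
        for e, f in sorted(directional):
            if e not in aligned_e or f not in aligned_f:
                alignment.add((e, f))
                aligned_e.add(e); aligned_f.add(f)
    return alignment
```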
Conclusion | We also look forward to discovering the best way to take advantage of these new alignments in downstream applications like machine translation, supervised word alignment, bilingual parsing (Burkett et al., 2010), part-of-speech tag induction (Naseem et al., 2009), or cross-lingual model projection (Smith and Eisner, 2009; Das and Petrov, 2011).
Experimental Results | Extraction-based evaluations of alignment better coincide with the role of word aligners in machine translation systems (Ayan and Dorr, 2006). |
Experimental Results | Finally, we evaluated our bidirectional model in a large-scale end-to-end phrase-based machine translation system from Chinese to English, based on the alignment template approach (Och and Ney, 2004). |
Introduction | Machine translation systems typically combine the predictions of two directional models, one which aligns f to e and the other e to f (Och et al., 1999). |
A Joint Model with Unlabeled Parallel Text | sentences in one language with their corresponding automatic translations), we use an automatic machine translation system (e.g. |
Abstract | Experiments on multiple data sets show that the proposed approach (1) outperforms the monolingual baselines, significantly improving the accuracy for both languages by 3.44%-8.12%; (2) outperforms two standard approaches for leveraging unlabeled data; and (3) produces (albeit smaller) performance gains when employing pseudo-parallel data from machine translation engines.
Conclusion | Moreover, the proposed approach continues to produce (albeit smaller) performance gains when employing pseudo-parallel data from machine translation engines. |
Related Work | (2008; 2010) instead translate the English resources automatically, using machine translation engines for subjectivity classification.
Results and Analysis | As discussed in Section 3.4, we generate pseudo-parallel data by translating the monolingual sentences in each setting using Google’s machine translation system. |
Abstract | This allows for a completely probabilistic model that is able to create a phrase table that achieves competitive accuracy on phrase-based machine translation tasks directly from unaligned sentence pairs. |
Conclusion | Machine translation systems using phrase tables learned directly by the proposed model were able to achieve accuracy competitive with the traditional pipeline of word alignment and heuristic phrase extraction, the first such result for an unsupervised model. |
Experimental Evaluation | The data for French, German, and Spanish are from the 2010 Workshop on Statistical Machine Translation (Callison-Burch et al., 2010). |
Introduction | The training of translation models for phrase-based statistical machine translation (SMT) systems (Koehn et al., 2003) takes unaligned bilingual training data as input, and outputs a scored table of phrase pairs. |
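A phrase table of the kind described is conventionally built by extracting every phrase pair consistent with a word alignment, i.e. pairs whose alignment links never cross the phrase boundary. A minimal sketch of that consistency check (our own simplification; it omits the standard extension over unaligned boundary words and any scoring of the extracted pairs):

```python
def extract_phrases(src, tgt, alignment, max_len=4):
    """Extract phrase pairs consistent with a word alignment (set of (s, t) pairs)."""
    pairs = set()
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            # target positions linked to the source span [i1, i2]
            tps = [t for s, t in alignment if i1 <= s <= i2]
            if not tps:
                continue
            j1, j2 = min(tps), max(tps)
            if j2 - j1 >= max_len:
                continue
            # consistency: no link from inside the target span to outside the source span
            if any(j1 <= t <= j2 and not (i1 <= s <= i2) for s, t in alignment):
                continue
            pairs.add((" ".join(src[i1:i2 + 1]), " ".join(tgt[j1:j2 + 1])))
    return pairs
```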
Introduction | Using this model, we perform machine translation experiments over four language pairs. |
Abstract | The large scale distributed composite language model gives drastic perplexity reduction over n-grams and achieves significantly better translation quality measured by the BLEU score and “readability” when applied to the task of re-ranking the N-best list from a state-of-the-art parsing-based machine translation system.
Experimental results | We have applied our composite 5-gram/2-SLM+2-gram/4-SLM+5-gram/PLSA language model, trained on a 1.3-billion-word corpus, to the task of re-ranking the N-best list in statistical machine translation.
Experimental results | Chiang (2007) studied the performance of machine translation on Hiero: the BLEU score is 33.31% when an n-gram model is used to re-rank the N-best list, but rises significantly to 37.09% when the n-gram model is embedded directly into Hiero’s one-pass decoder, because there is not much diversity in the N-best list.
Introduction | The Markov chain (n-gram) source models, which predict each word on the basis of the previous n-1 words, have been the workhorses of state-of-the-art speech recognizers and machine translators that help to resolve acoustic or foreign-language ambiguities by placing higher probability on more likely original underlying word strings.
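As a minimal illustration of such a model (a maximum-likelihood sketch of our own, not any cited system), an n-gram model conditions each word on its n-1 predecessors, estimated from relative frequencies:

```python
from collections import defaultdict

def train_ngram_lm(sentences, n=2):
    """Count n-grams and their (n-1)-gram contexts for MLE estimation."""
    counts, context_counts = defaultdict(int), defaultdict(int)
    for sent in sentences:
        tokens = ["<s>"] * (n - 1) + sent.split() + ["</s>"]
        for i in range(len(tokens) - n + 1):
            ngram = tuple(tokens[i:i + n])
            counts[ngram] += 1
            context_counts[ngram[:-1]] += 1
    return counts, context_counts

def prob(counts, context_counts, ngram):
    """MLE probability P(w_n | w_1 .. w_{n-1})."""
    c = counts.get(tuple(ngram), 0)
    ctx = context_counts.get(tuple(ngram[:-1]), 0)
    return c / ctx if ctx else 0.0
```

A real system would add smoothing (e.g. Kneser-Ney) so that unseen n-grams do not receive zero probability.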
Introduction | As the machine translation (MT) working groups stated on page 3 of their final report (Lavie et al., 2006), “These approaches have resulted in small improvements in MT quality, but have not fundamentally solved the problem.”
Conclusion and discussion | In this work we proposed methods of labeling phrase pairs to create automatically learned PSCFG rules for machine translation.
Introduction | The Probabilistic Synchronous Context Free Grammar (PSCFG) formalism suggests an intuitive approach to model the long-distance and lexically sensitive reordering phenomena that often occur across language pairs considered for statistical machine translation.
Introduction | SCFG Rules for Machine Translation |
Introduction | Towards the ultimate goal of building end-to-end machine translation systems without any human annotations, we also experiment with automatically inferred word classes using distributional clustering (Kneser and Ney, 1993). |
Related work | (2006) present a reordering model for machine translation, and make use of clustered phrase pairs to cope with data sparseness in the model.
Abstract | This paper extends the training and tuning regime for phrase-based statistical machine translation to obtain fluent translations into morphologically complex languages (we build an English to Finnish translation system). |
Conclusion and Future Work | In order to help with replication of the results in this paper, we have run the various morphological analysis steps and created the necessary training, tuning and test data files needed in order to train, tune and test any phrase-based machine translation system with our data. |
Conclusion and Future Work | We would particularly like to thank the developers of the open-source Moses machine translation toolkit and the Omorfi morphological analyzer for Finnish which we used for our experiments. |
Translation and Morphology | Languages with rich morphological systems present significant hurdles for statistical machine translation (SMT), most notably data sparsity, source-target asymmetry, and problems with automatic evaluation. |
Introduction | A (formal) translation model is at the core of every machine translation system. |
Introduction | By contrast, in the field of syntax-based machine translation, the translation models have full access to the syntax of the sentences and can base their decisions on it.
Introduction | In this contribution, we restrict MBOT to a form that is particularly relevant in machine translation.
The model | (2008) argue that STSG have sufficient expressive power for syntax-based machine translation , but Zhang et al. |
Abstract | In this work, we tackle the task of machine translation (MT) without parallel training data. |
Introduction | Bilingual corpora are a staple of statistical machine translation (SMT) research. |
Machine Translation as a Decipherment Task | From a decipherment perspective, machine translation is a much more complex task than word substitution decipherment and poses several technical challenges: (1) scalability due to large corpora sizes and huge translation tables, (2) nondeterminism in translation mappings (a word can have multiple translations), (3) reordering of words |
Word Substitution Decipherment | Before we tackle machine translation without parallel data, we first solve a simpler problem—word substitution decipherment. |
Concluding remarks | Grammar factorization for synchronous models is an important component of current machine translation systems (Zhang et al., 2006), and algorithms for factorization have been studied by Gildea et al. |
Concluding remarks | These algorithms do not result in what we refer as head-driven strategies, although, as machine translation systems improve, lexicalized rules may become important in this setting as well. |
Introduction | Similar questions have arisen in the context of machine translation, as the SCFGs used to model translation are also instances of LCFRSs, as already mentioned.
Abstract | As machine translation systems improve in lexical choice and fluency, the shortcomings of widespread n-gram based, fluency-oriented MT evaluation metrics such as BLEU, which fail to properly evaluate adequacy, become more apparent. |
Abstract | We argue that BLEU (Papineni et al., 2002) and other automatic n-gram based MT evaluation metrics do not adequately capture the similarity in meaning between the machine translation and the reference translation—which, ultimately, is essential for MT output to be useful.
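For reference, BLEU's core quantities are clipped n-gram precisions combined with a brevity penalty. The following is a simplified sentence-level sketch with a single reference (our own; Papineni et al. (2002) define a corpus-level score with multiple references), which makes concrete why the metric rewards surface n-gram overlap rather than adequacy:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU: clipped n-gram precisions plus brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    log_prec = 0.0
    for n in range(1, max_n + 1):
        c_counts, r_counts = ngrams(cand, n), ngrams(ref, n)
        # clip each candidate n-gram count by its count in the reference
        clipped = sum(min(c, r_counts[g]) for g, c in c_counts.items())
        total = max(sum(c_counts.values()), 1)
        log_prec += math.log(max(clipped, 1e-9) / total) / max_n
    # brevity penalty: punish candidates shorter than the reference
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(log_prec)
```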
Abstract | the most essential semantic information being captured by machine translation systems? |
Discussion | Prefix probabilities and right prefix probabilities for PSCFGs can be exploited to compute probability distributions for the next word or part-of-speech in left-to-right incremental translation of speech, or alternatively as a predictive tool in applications of interactive machine translation , of the kind described by Foster et al. |
Introduction | Within the area of statistical machine translation , there has been a growing interest in so-called syntax-based translation models, that is, models that define mappings between languages through hierarchical sentence structures. |
Prefix probabilities | One should add that, in real world machine translation applications, it has been observed that recognition (and computation of inside probabilities) for SCFGs can typically be carried out in low-degree polynomial time, and the worst cases mentioned above are not observed with real data. |
Elementary Trees to String Grammar | significantly enriches the reordering power for syntax-based machine translation.
Introduction | Most syntax-based machine translation models with synchronous context free grammar (SCFG) have been relying on the off-the-shelf monolingual parse structures to learn the translation equivalences for string-to-tree, tree-to-string or tree-to-tree grammars.
Introduction | However, state-of-the-art monolingual parsers are not necessarily well suited for machine translation in terms of both labels and chunks/brackets.