Abstract | Automatic word alignment is a key step in training statistical machine translation systems. |
Abstract | Despite much recent work on word alignment methods, increases in alignment accuracy often yield little or no improvement in machine translation quality. |
Conclusions | To our knowledge, this is the first extensive evaluation in which improvements in alignment accuracy lead to improvements in machine translation performance.
Introduction | The typical pipeline for a machine translation (MT) system starts with a parallel sentence-aligned corpus and proceeds to align the words in every sentence pair. |
Introduction | Our contribution is a large-scale evaluation of this methodology for word alignments: an investigation of how the produced alignments differ, and of how they can be used to consistently improve machine translation performance (as measured by BLEU score) across many languages, on training corpora of up to a hundred thousand sentences.
Introduction | Section 5 explores how the new alignments lead to consistent and significant improvements in a state-of-the-art phrase-based machine translation system when posterior decoding is used rather than Viterbi decoding.
Phrase-based machine translation | In particular, we fix a state-of-the-art machine translation system and measure its performance when we vary the supplied word alignments.
Phrase-based machine translation | The baseline system uses GIZA Model 4 alignments and the open-source Moses phrase-based machine translation toolkit, and performed close to the best in last year's competition.
Phrase-based machine translation | In addition to the Hansards corpus and the Europarl English-Spanish corpus, we used four other corpora for the machine translation experiments. |
Word alignment results | A natural question is how to tune the threshold in order to improve machine translation quality. |
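At its simplest, posterior decoding with a threshold reduces to keeping every alignment link whose posterior probability clears the threshold, so the threshold directly trades precision against recall. A minimal sketch, assuming link posteriors are already available as a dictionary (all names and numbers here are illustrative, not the system's actual values):

```python
def select_links(posteriors, threshold):
    """Keep every source-target link whose posterior clears the threshold."""
    return {link for link, p in posteriors.items() if p >= threshold}

# Toy posteriors for a 2x2 sentence pair (illustrative numbers only).
post = {(0, 0): 0.9, (0, 1): 0.2, (1, 0): 0.4, (1, 1): 0.7}

# A higher threshold keeps fewer links (higher precision, lower recall),
# which is why the threshold is a natural tuning knob.
assert select_links(post, 0.5) == {(0, 0), (1, 1)}
assert select_links(post, 0.3) == {(0, 0), (1, 0), (1, 1)}
```

Unlike Viterbi decoding, such thresholding can produce many-to-many alignments or leave words unaligned, which changes the phrase pairs that are later extracted.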
Word alignment results | In the next section we evaluate and compare the effects of the different alignments in a phrase based machine translation system. |
Abstract | We show that combining them with word-based n-gram models in the log-linear model of a state-of-the-art statistical machine translation system leads to improvements in translation quality as indicated by the BLEU score.
Experiments | We use the distributed training and application infrastructure described in (Brants et al., 2007) with modifications to allow the training of predictive class-based models and their application in the decoder of the machine translation system. |
Experiments | Instead we report BLEU scores (Papineni et al., 2002) of the machine translation system using different combinations of word- and class-based models for translation tasks from English to Arabic and Arabic to English. |
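BLEU (Papineni et al., 2002) combines modified n-gram precision with a brevity penalty. A minimal single-reference sketch, restricted to bigrams for brevity (real BLEU uses up to 4-grams and aggregates counts over the whole corpus):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=2):
    """Geometric mean of modified n-gram precisions times a brevity penalty
    (single reference, up to max_n-grams; a toy sketch, not full BLEU)."""
    precisions = []
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        overlap = sum(min(c, ref[g]) for g, c in cand.items())  # clipped counts
        total = sum(cand.values())
        if total == 0 or overlap == 0:
            return 0.0
        precisions.append(overlap / total)
    bp = 1.0 if len(candidate) > len(reference) else \
        math.exp(1 - len(reference) / len(candidate))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

ref = "the cat sat on the mat".split()
assert bleu(ref, ref) == 1.0           # exact match scores 1
assert bleu("the cat".split(), ref) < 1.0  # short output is penalized
```

The brevity penalty is what prevents a system from gaming precision by emitting only its most confident words.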
Experiments | A fourth data set, en_web, was used together with the other three data sets to train the large word-based model used in the second machine translation experiment.
Introduction | However, in the area of statistical machine translation, especially in the context of large training corpora, fewer experiments with class-based n-gram models have been performed, with mixed success (Raab, 2006).
Introduction | We then show that using partially class-based language models trained using the resulting classifications together with word-based language models in a state-of-the-art statistical machine translation system yields improvements despite the very large size of the word-based models used. |
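In a standard two-sided class-based model, the word bigram probability is factored through classes: p(w_i | w_{i-1}) ≈ p(c(w_i) | c(w_{i-1})) · p(w_i | c(w_i)), which is what lets the model share statistics across rare words. A toy sketch of that factorization (all tables and class names here are invented for illustration):

```python
def class_bigram_prob(w_prev, w, word2class, class_bigram, emit):
    """p(w | w_prev) ~= p(class(w) | class(w_prev)) * p(w | class(w)).
    All lookup tables are illustrative toy data."""
    c_prev, c = word2class[w_prev], word2class[w]
    return class_bigram[(c_prev, c)] * emit[(c, w)]

word2class = {"paris": "CITY", "london": "CITY", "in": "FUNC"}
class_bigram = {("FUNC", "CITY"): 0.3}
emit = {("CITY", "paris"): 0.5, ("CITY", "london"): 0.5}

# Both city names get the same probability after "in", even if one of
# them was rare in training: the class transition does the sharing.
p = class_bigram_prob("in", "paris", word2class, class_bigram, emit)
assert abs(p - 0.15) < 1e-12
```

A partially class-based model then interpolates (or log-linearly combines) this estimate with the ordinary word-based n-gram probability.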
Abstract | We demonstrate the space-savings of the scheme via machine translation experiments within a distributed language modeling framework. |
Experimental Setup | 4.3 Machine Translation |
Experiments | 5.3 Machine Translation |
Introduction | Language models (LMs) are a core component in statistical machine translation, speech recognition, optical character recognition and many other areas.
Introduction | Efficiency is paramount in applications such as machine translation which make huge numbers of LM requests per sentence. |
Introduction | This paper focuses on machine translation.
Scaling Language Models | In statistical machine translation (SMT), LMs are used to score candidate translations in the target language. |
Abstract | We present a method to transliterate names in the framework of end-to-end statistical machine translation.
Discussion | We have shown that a state-of-the-art statistical machine translation system can benefit from a dedicated transliteration module to improve the translation.
Discussion | Improved named-entity translation accuracy as measured by the NEWA metric in general, and a reduction in dropped names in particular, are clearly valuable both to the human reader of machine-translated documents and to systems using machine translation for further information processing.
End-to-End results | Finally, here are end-to-end machine translation results for three sentences, with and without the transliteration module, along with a human reference translation. |
Introduction | State-of-the-art statistical machine translation (SMT) is bad at translating names that are not very common, particularly across languages with different character sets and sound systems. |
Introduction | This evaluation involves a mixture of entity identification and translation concerns; for example, the scoring system asks for coreference determination, which may or may not be of interest for improving machine translation output.
Abstract | Conventional statistical machine translation (SMT) systems do not perform well on measure word generation due to data sparseness and the potential long distance dependency between measure words and their corresponding head words. |
Abstract | Our model works as a postprocessing procedure over the output of statistical machine translation systems, and can work with any SMT system.
Experiments | We also compared our method with a well-known rule-based machine translation system, SYSTRAN.
Introduction | According to our survey on the measure word distribution in the Chinese Penn Treebank and the test datasets distributed by the Linguistic Data Consortium (LDC) for Chinese-to-English machine translation evaluation, the average occurrence is 0.505 and 0.319 measure words per sentence, respectively.
Introduction | Therefore, in the English-to-Chinese machine translation task we need to make additional efforts to generate the missing measure words in Chinese.
Introduction | In most statistical machine translation (SMT) models (Och et al., 2004; Koehn et al., 2003; Chiang, 2005), some measure words can be generated without modification or additional processing.
Abstract | Large-scale discriminative machine translation promises to further the state-of-the-art, but has failed to deliver convincing gains over current heuristic frequency count systems. |
Challenges for Discriminative SMT | These results could, and should, be applied to other models, discriminative and generative, phrase- and syntax-based, to further advance the state of the art in machine translation.
Discussion and Further Work | Finally, while in this paper we have focussed on the science of discriminative machine translation, we believe that with suitable engineering this model will advance the state-of-the-art.
Evaluation | The development and test data were taken from the 2006 NAACL and 2007 ACL workshops on machine translation, also filtered for sentence length. Tuning of the regularisation parameter and MERT training of the benchmark models were performed on dev2006, while the test set was the concatenation of devtest2006, test2006 and test2007, amounting to 315 development and 1164 test sentences.
Introduction | Statistical machine translation (SMT) has seen a resurgence in popularity in recent years, with progress being driven by a move to phrase-based and syntax-inspired approaches. |
Abstract | Due to the richness of Chinese abbreviations, many of them may not appear in available parallel corpora, in which case current machine translation systems simply treat them as unknown words and leave them untranslated. |
Conclusions | We integrate our method into a state-of-the-art phrase-based baseline translation system, i.e., Moses (Koehn et al., 2007), and show that the integrated system consistently improves the performance of the baseline system on various NIST machine translation test sets. |
Related Work | Though automatically extracting the relations between full-form Chinese phrases and their abbreviations is an interesting and important task for many natural language processing applications (e.g., machine translation, question answering, information retrieval, and so on), not much work is available in the literature.
Related Work | None of the above work has addressed the Chinese abbreviation issue in the context of a machine translation task, which is the primary goal in this paper. |
Related Work | To the best of our knowledge, our work is the first to systematically model Chinese abbreviation expansion to improve machine translation . |
Factored Model | The factored statistical machine translation model uses a log-linear approach to combine its several components, including the language model, the reordering model, the translation models and the generation models.
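The log-linear combination itself is just a weighted sum of log feature scores, and the hypothesis with the highest sum wins. A minimal sketch of that scoring rule (the feature names and weights are illustrative, not the system's actual configuration):

```python
import math

def loglinear_score(features, weights):
    """Weighted sum of log feature values: sum_i lambda_i * log h_i(e, f)."""
    return sum(weights[name] * math.log(h) for name, h in features.items())

weights = {"lm": 0.5, "tm": 0.3, "reordering": 0.2}
hyp_a = {"lm": 0.010, "tm": 0.020, "reordering": 0.5}
hyp_b = {"lm": 0.001, "tm": 0.050, "reordering": 0.5}

# hyp_a wins: its language-model advantage outweighs hyp_b's
# translation-model edge under these weights.
assert loglinear_score(hyp_a, weights) > loglinear_score(hyp_b, weights)
```

The weights (lambdas) are what MERT or similar tuning procedures optimize against a translation metric such as BLEU.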
Introduction | Traditional statistical machine translation methods are based on mapping on the lexical level, which takes place in a local window of a few words. |
Introduction | Our method is based on factored phrase-based statistical machine translation models. |
Introduction | Traditional statistical machine translation models deal with this problem in two ways:
Abstract | Synchronous Tree-Adjoining Grammar (STAG) is a promising formalism for syntax-aware machine translation and simultaneous computation of natural-language syntax and semantics.
Introduction | Recently, the desire to incorporate syntax-awareness into machine translation systems has generated interest in |
Introduction | Without efficient algorithms for processing it, its potential for use in machine translation and TAG semantics systems is limited. |
Synchronous Tree-Adjoining Grammar | In order for STAG to be used in machine translation and other natural-language processing tasks it must be possible to process it efficiently. |
Abstract | In this paper, we propose a novel string-to-dependency algorithm for statistical machine translation.
Conclusions and Future Work | In this paper, we propose a novel string-to-dependency algorithm for statistical machine translation.
Introduction | In recent years, hierarchical methods have been successfully applied to Statistical Machine Translation (Graehl and Knight, 2004; Chiang, 2005; Ding and Palmer, 2005; Quirk et al., 2005). |
Introduction | 1.1 Hierarchical Machine Translation |
Abstract | We improve the quality of statistical machine translation (SMT) by applying models that predict word forms from their stems using extensive morphological and syntactic information from both the source and target languages. |
Introduction | One of the outstanding problems for further improving machine translation (MT) systems is the difficulty of dividing the MT problem into sub-problems and tackling each subproblem in isolation to improve the overall quality of MT. |
Introduction | This paper describes a successful attempt to integrate a subcomponent for generating word inflections into a statistical machine translation (SMT) system.
Machine translation systems and data | We integrated the inflection prediction model with two types of machine translation systems: systems that make use of syntax and surface phrase-based systems. |
Introduction | Statistical machine translation (SMT) is complicated by the fact that words can move during translation. |
Introduction | 1We use the term “syntactic cohesion” throughout this paper to mean what has previously been referred to as “phrasal cohesion”, because the nonlinguistic sense of “phrase” has become so common in the machine translation literature.
Introduction | Phrase-based decoding (Koehn et al., 2003) is a dominant formalism in statistical machine translation.
Conclusion | We believe this general framework could also be applied to other problems involving forests or lattices, such as sequence labeling and machine translation . |
Forest Reranking | For nonlocal features, we adapt cube pruning from forest rescoring (Chiang, 2007; Huang and Chiang, 2007), since the situation here is analogous to machine translation decoding with integrated language models: we can view the scores of unit nonlocal features as the language model cost, computed on the fly when combining sub-constituents.
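The core of cube pruning in this analogy: two sorted lists of sub-constituent hypotheses are combined lazily via a heap, and the nonlocal cost (standing in for the LM or nonlocal-feature score) is computed only when a grid cell is popped. A simplified sketch; as in real forest rescoring, the enumeration is approximate when the nonlocal cost is not monotonic in the grid:

```python
import heapq

def cube_top_k(left, right, nonlocal_cost, k):
    """Lazily enumerate the k cheapest combinations of two cost-sorted
    hypothesis lists (lower is better). nonlocal_cost(i, j) is only
    evaluated when cell (i, j) is pushed, mimicking on-the-fly LM scoring."""
    seen = {(0, 0)}

    def cell(i, j):
        return (left[i] + right[j] + nonlocal_cost(i, j), i, j)

    frontier = [cell(0, 0)]
    out = []
    while frontier and len(out) < k:
        cost, i, j = heapq.heappop(frontier)
        out.append(cost)
        # Expand the two grid neighbors of the popped cell.
        for ni, nj in ((i + 1, j), (i, j + 1)):
            if ni < len(left) and nj < len(right) and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(frontier, cell(ni, nj))
    return out

left, right = [1.0, 2.0, 5.0], [0.5, 1.5]
best = cube_top_k(left, right, lambda i, j: 0.1 * (i + j), 3)
assert [round(c, 6) for c in best] == [1.5, 2.6, 2.6]
```

With a constant nonlocal cost this enumeration is exact; nonlocal features break the monotonicity, which is exactly the approximation cube pruning accepts for speed.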
Introduction | Discriminative reranking has become a popular technique for many NLP problems, in particular, parsing (Collins, 2000) and machine translation (Shen et al., 2005). |
Experiments | These experiments also indicate that a very sparse prior is needed for machine translation tasks. |
Introduction | Most state-of-the-art statistical machine translation systems are based on large phrase tables extracted from parallel text using word-level alignments.
Introduction | While this approach has been very successful, poor word-level alignments are nonetheless a common source of error in machine translation systems. |
Abstract | We take a multi-pass approach to machine translation decoding when using synchronous context-free grammars as the translation model and n-gram language models: the first pass uses a bigram language model, and the resulting parse forest is used in the second pass to guide search with a trigram language model. |
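A much-simplified sketch of the two-pass idea, reduced here to n-best rescoring rather than genuine forest-guided search: a cheap bigram model prunes the candidate set, and the more expensive trigram model scores only the survivors. All toy probability tables below are invented for illustration:

```python
import math

def lm_score(tokens, probs, order):
    """Log-probability under a toy n-gram model given as a dict from
    n-gram tuples to probabilities; unseen n-grams get a small floor."""
    return sum(math.log(probs.get(tuple(tokens[i - order + 1:i + 1]), 1e-6))
               for i in range(order - 1, len(tokens)))

def two_pass(candidates, bigram, trigram, beam):
    """Pass 1: rank all candidates with the cheap bigram model, keep a beam.
    Pass 2: rescore only the survivors with the trigram model."""
    survivors = sorted(candidates, key=lambda c: -lm_score(c, bigram, 2))[:beam]
    return max(survivors, key=lambda c: lm_score(c, trigram, 3))

bigram = {("the", "cat"): 0.5, ("cat", "sat"): 0.4,
          ("the", "sat"): 0.2, ("sat", "cat"): 0.1, ("cat", "the"): 0.05}
trigram = {("the", "cat", "sat"): 0.3, ("the", "sat", "cat"): 0.05}
candidates = [["the", "cat", "sat"], ["cat", "the", "sat"], ["the", "sat", "cat"]]

assert two_pass(candidates, bigram, trigram, beam=2) == ["the", "cat", "sat"]
```

The real method keeps a parse forest rather than an n-best list from the first pass, so the second pass can still explore exponentially many derivations while being guided by the bigram scores.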
Introduction | Statistical machine translation systems based on synchronous grammars have recently shown great promise, but one stumbling block to their widespread adoption is that the decoding, or search, problem during translation is more computationally demanding than in phrase-based systems. |
Introduction | We examine the question of whether, given the reordering inherent in the machine translation problem, lower order n-grams will provide as valuable a search heuristic as they do for speech recognition. |