Integration with SMT | We use the following method to integrate our transliterator into the overall SMT system: |
Integration with SMT | In a tuning step, the Minimum Error Rate Training component of our SMT system iteratively adjusts the set of rule weights, including the weight associated with the transliteration feature, such that the English translations are optimized with respect to a set of known reference translations according to the BLEU translation metric. |
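The tuning loop can be sketched with a toy stand-in for MERT: instead of Och's exact line search, the sketch below samples random weight vectors over a hand-made n-best list and keeps the vector whose 1-best selections best match the references (a crude proxy for BLEU). All hypotheses, feature values, and the "transliteration" feature are invented for illustration.

```python
import random

def select_best(nbest, weights):
    # For each sentence, pick the hypothesis with the highest model score
    # (weighted sum of its feature values).
    return [max(hyps, key=lambda h: sum(w * f for w, f in zip(weights, h["feats"])))
            for hyps in nbest]

def tune(nbest, metric, n_feats, trials=200, seed=0):
    # Crude weight search standing in for MERT's line search: sample random
    # weight vectors and keep the one maximizing the metric on the 1-best.
    rng = random.Random(seed)
    best_w, best_score = None, float("-inf")
    for _ in range(trials):
        w = [rng.uniform(-1.0, 1.0) for _ in range(n_feats)]
        s = metric(select_best(nbest, w))
        if s > best_score:
            best_w, best_score = w, s
    return best_w, best_score

# Two sentences, two hypotheses each; feats[1] plays the role of the
# transliteration feature, firing on the correctly rendered name.
nbest = [
    [{"text": "obama visits paris", "feats": [0.2, 1.0]},
     {"text": "obama visits UNK", "feats": [0.9, 0.0]}],
    [{"text": "merkel meets putin", "feats": [0.1, 1.0]},
     {"text": "merkel meets UNK", "feats": [0.8, 0.0]}],
]
refs = ["obama visits paris", "merkel meets putin"]
metric = lambda hyps: sum(h["text"] == r for h, r in zip(hyps, refs))

weights, score = tune(nbest, metric, n_feats=2)
```

With these toy inputs the search settles on weights favoring the transliteration feature, so both 1-best outputs match the references.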
Integration with SMT | At runtime, the transliterations then compete with the translations generated by the general SMT system. |
Introduction | The SMT system drops most names in this example. |
Introduction | The simplest way to integrate name handling into SMT is: (1) run a named-entity identification system on the source sentence, (2) transliterate identified entities with a special-purpose transliteration component, and (3) run the SMT system on the source sentence, as usual, but when looking up phrasal translations for the words identified in step 1, instead use the transliterations from step 2. |
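The three steps above can be sketched as a pipeline; the NER tagger, transliterator, and decoder below are hypothetical stubs, not the components any particular system uses.

```python
def identify_names(source_tokens):
    # Step 1: stub NER -- anything capitalized counts as a name here.
    return {i for i, tok in enumerate(source_tokens) if tok[0].isupper()}

def transliterate(token):
    # Step 2: stub transliterator; a real one maps source script to the
    # target script (this placeholder merely lowercases).
    return token.lower()

def decode(source_tokens, phrase_table, forced):
    # Step 3: word-by-word "decoding" in which forced transliterations
    # override phrase-table lookups; unknown words pass through untranslated.
    return [forced.get(i, phrase_table.get(tok, tok))
            for i, tok in enumerate(source_tokens)]

def translate(source_tokens, phrase_table):
    names = identify_names(source_tokens)
    forced = {i: transliterate(source_tokens[i]) for i in names}
    return decode(source_tokens, phrase_table, forced)
```

For example, `translate(["Obama", "besucht", "Paris"], {"besucht": "visits"})` forces the two identified names through the transliterator while the phrase table handles the rest.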
Introduction | The base SMT system may translate a commonly-occurring name just fine, due to the bitext it was trained on, while the transliteration component can easily supply a worse answer. |
Abstract | In this paper, we propose a statistical model to generate appropriate measure words of nouns for an English-to-Chinese SMT system. |
Abstract | Our model works as a postprocessing procedure over the output of statistical machine translation systems, and can work with any SMT system. |
Introduction | English-Chinese SMT Systems |
Introduction | However, as we will show below, existing SMT systems do not handle measure word generation well in general, due to data sparseness and long-distance dependencies between measure words and their corresponding head words. |
Introduction | Due to the limited size of bilingual corpora, many measure words, as well as the collocations between a measure word and its head word, cannot be well covered by the phrase translation table in an SMT system. |
Our Method | For those having English translations, such as “米” (meter), “吨” (ton), we just use the translation produced by the SMT system itself. |
Our Method | The model is applied to SMT system outputs as a postprocessing procedure. |
Our Method | Based on contextual information contained in both the input source sentence and the SMT system’s output translation, a measure word candidate set M is constructed. |
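Ranking a candidate set M against a head noun might be sketched as below, assuming toy collocation counts; the candidate inventory and all counts are invented, not the statistics used in the paper.

```python
COLLOCATION_COUNTS = {        # (measure word, head noun) -> count
    ("本", "书"): 50,          # 本 is the usual measure word for books
    ("个", "书"): 5,           # 个 is the generic measure word
    ("张", "书"): 1,
}
M = ["本", "个", "张"]         # candidate set for the current position

def best_measure_word(head_noun):
    # Rank each candidate in M by how often it collocates with the head word.
    scored = [(COLLOCATION_COUNTS.get((m, head_noun), 0), m) for m in M]
    return max(scored)[1]
```

Here `best_measure_word("书")` ("book") selects 本, the candidate with the strongest collocation count.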
Introduction | While the research in statistical machine translation (SMT) has made significant progress, most SMT systems (Koehn et al., 2003; Chiang, 2007; Galley et al., 2006) rely on parallel corpora to extract translation entries. |
Introduction | The richness and complexity of Chinese abbreviations pose challenges to SMT systems. |
Introduction | In particular, many Chinese abbreviations may not appear in available parallel corpora, in which case current SMT systems treat them as unknown words and leave them untranslated. |
Unsupervised Translation Induction for Chinese Abbreviations | Moreover, our approach utilizes both Chinese and English monolingual data to help MT, while most SMT systems utilize only the English monolingual data to build a language model. |
Abstract | Our inflection generation models are trained independently of the SMT system. |
Abstract | We investigate different ways of combining the inflection prediction component with the SMT system by training the base MT system on fully inflected forms or on word stems. |
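The "translate stems, then inflect" variant can be sketched as follows: the base MT system emits word stems, and a separately trained inflection model selects one surface form per stem. The toy Russian paradigms and the score function standing in for the trained model are invented placeholders.

```python
PARADIGMS = {                       # stem -> candidate inflected forms (toy)
    "книг": ["книга", "книги", "книгу"],
    "стол": ["стол", "стола", "столу"],
}

def inflect(stems, score):
    # score(stem, form, position) plays the role of the trained inflection
    # model, which in practice conditions on rich contextual features.
    out = []
    for i, stem in enumerate(stems):
        forms = PARADIGMS.get(stem, [stem])   # OOV stems pass through
        out.append(max(forms, key=lambda f: score(stem, f, i)))
    return out
```

A trivial scorer that prefers forms ending in "а" already shows the mechanics: each stem is expanded to its paradigm and the highest-scoring form is emitted.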
Abstract | We applied our inflection generation models in translating English into two morphologically complex languages, Russian and Arabic, and show that our model improves the quality of SMT over both phrasal and syntax-based SMT systems according to BLEU and human judgments. |
Conclusion and future work | We have shown that an independent model of morphology generation can be successfully integrated with an SMT system, making improvements in both phrasal and syntax-based MT. |
Introduction | We also demonstrate that our independently trained models are portable, showing that they can improve both syntactic and phrasal SMT systems . |
Related work | In recent work, Koehn and Hoang (2007) proposed a general framework for including morphological features in a phrase-based SMT system: the representation of words is factored into a vector of morphological features, and the phrase-based MT system can work on any of the factored representations. This framework is implemented in the Moses system. |
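A factored representation in this spirit can be sketched as a vector of factors per token, with translation lookup backing off from the surface form to the lemma factor; the factor names and the tiny lemma table below are illustrative and do not reflect Moses' actual data structures.

```python
from collections import namedtuple

# Each token carries a vector of factors rather than a bare surface string.
Word = namedtuple("Word", ["surface", "lemma", "pos", "morph"])

LEMMA_TABLE = {"gehen": "go"}       # toy lemma-level phrase table

def translate_lemma(word):
    # Look up the lemma factor; pass the surface form through on a miss.
    return LEMMA_TABLE.get(word.lemma, word.surface)
```

For example, the German past form "ging" is not in the toy table, but its lemma factor "gehen" is, so `translate_lemma(Word("ging", "gehen", "VERB", "Past"))` still finds a translation.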