Applying Morphology Generation Models to Machine Translation
Kristina Toutanova, Hisami Suzuki, and Achim Ruopp

Article Structure

Abstract

We improve the quality of statistical machine translation (SMT) by applying models that predict word forms from their stems using extensive morphological and syntactic information from both the source and target languages.

Introduction

One of the outstanding problems for further improving machine translation (MT) systems is the difficulty of dividing the MT problem into subproblems and tackling each subproblem in isolation to improve the overall quality of MT.

Related work

There has been active research on incorporating morphological knowledge in SMT.

Inflection prediction models

This section describes the task and our model for inflection prediction, following (Minkov et al., 2007).

Machine translation systems and data

We integrated the inflection prediction model with two types of machine translation systems: systems that make use of syntax and surface phrase-based systems.

Integration of inflection models with MT systems

We describe three main methods of integration we have considered.

MT performance results

Before delving into the results for each method, we discuss our evaluation measures.

Conclusion and future work

We have shown that an independent model of morphology generation can be successfully integrated with an SMT system, making improvements in both phrasal and syntax-based MT.

Topics

MT system

Appears in 41 sentences as: MT system (35) MT systems (7) MT systems’ (1)
In Applying Morphology Generation Models to Machine Translation
  1. We investigate different ways of combining the inflection prediction component with the SMT system by training the base MT system on fully inflected forms or on word stems.
    Page 1, “Abstract”
  2. (Finkel et al., 2006), and in some cases, to factor the translation problem so that the baseline MT system can take advantage of the reduction in sparsity by being able to work on word stems.
    Page 1, “Introduction”
  3. In recent work, Koehn and Hoang (2007) proposed a general framework for including morphological features in a phrase-based SMT system by factoring the representation of words into a vector of morphological features and allowing a phrase-based MT system to work on any of the factored representations, which is implemented in the Moses system.
    Page 2, “Related work”
  4. This also makes the model portable and applicable to different types of MT systems.
    Page 2, “Related work”
  5. In contrast, we focus on methods of integration of an inflection prediction model with an MT system, and on evaluation of the model’s impact on translation.
    Page 2, “Related work”
  6. The dependency structure on the Russian side, indicated by solid arcs, is given by a treelet MT system (see Section 4.1), projected from the word dependency structure of English and word alignment information.
    Page 3, “Inflection prediction models”
  7. This is a syntactically-informed MT system, designed following (Quirk et al., 2005).
    Page 4, “Machine translation systems and data”
  8. For each language pair, we used a set of parallel sentences (train) for training the MT system sub-models (e.g., phrase tables, language model), a set of parallel sentences (lambda) for training the combination weights with max-BLEU training, a set of parallel sentences (dev) for training a small number of combination parameters for our integration methods (see Section 5), and a set of parallel sentences (test) for final evaluation.
    Page 4, “Machine translation systems and data”
  9. All MT systems for a given language pair used the same datasets.
    Page 4, “Machine translation systems and data”
  10. The methods differ in the extent to which the factoring of the problem into two subproblems, predicting stems and predicting inflections, is reflected in the base MT systems.
    Page 4, “Integration of inflection models with MT systems”
  11. In the first method, the MT system is trained to produce fully inflected target words and the inflection model can change the inflections.
    Page 4, “Integration of inflection models with MT systems”
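
The last two excerpts describe Method 1, where the base system outputs fully inflected translations and the inflection model may change the inflections. A minimal Python sketch of such a re-inflection pass, under assumed interfaces (the stemmer, candidate generator, and scorer below are hypothetical stand-ins, not the paper's components):

    # Hypothetical Method-1-style re-inflection pass (a sketch, not the
    # authors' implementation): stem the fully inflected MT output, then
    # let an inflection model re-choose each word's surface form.
    from typing import Callable, List

    def reinflect(
        translation: List[str],
        stem: Callable[[str], str],                     # morphological stemmer
        candidates: Callable[[str], List[str]],         # inflections of a stem
        score: Callable[[List[str], int, str], float],  # inflection-model score
    ) -> List[str]:
        """Replace each word with the best-scoring inflection of its stem."""
        stems = [stem(w) for w in translation]
        output = []
        for i, s in enumerate(stems):
            forms = candidates(s) or [s]  # fall back to the bare stem
            best = max(forms, key=lambda f: score(stems, i, f))
            output.append(best)
        return output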

BLEU

Appears in 26 sentences as: BLEU (32)
In Applying Morphology Generation Models to Machine Translation
  1. We applied our inflection generation models in translating English into two morphologically complex languages, Russian and Arabic, and show that our model improves the quality of SMT over both phrasal and syntax-based SMT systems according to BLEU and human judgements.
    Page 1, “Abstract”
  2. We performed a grid search on the values of λ and n, to maximize the BLEU score of the final system on a development set (dev) of 1000 sentences (Table 2).
    Page 5, “Integration of inflection models with MT systems” (see the tuning sketch after this list)
  3. For automatically measuring performance, we used 4-gram BLEU against a single reference translation.
    Page 6, “MT performance results”
  4. We also report oracle BLEU scores which incorporate two kinds of oracle knowledge.
    Page 6, “MT performance results”
  5. For the methods using n=1 translation from a base MT system, the oracle BLEU score is the BLEU score of the stemmed translation compared to the stemmed reference, which represents the upper bound achievable by changing only the inflected forms (but not stems) of the words in a translation.
    Page 6, “MT performance results”
  6. Model: Base MT (n=1) | BLEU: 29.24 | Oracle BLEU: - (header and first row of Table 3)
    Page 7, “MT performance results”
  7. Table 3: Test set performance for English-to-Russian MT (BLEU) results by model using a treelet MT system.
    Page 7, “MT performance results”
  8. We can see that Method 1 results in a good improvement of 1.2 BLEU points, even when using only the best (n = 1) translation from the baseline.
    Page 7, “MT performance results”
  9. The oracle improvement achievable by predicting inflections is quite substantial: more than 7 BLEU points.
    Page 7, “MT performance results”
  10. Propagating the uncertainty of the baseline system by using more input hypotheses consistently improves performance across the different methods, with an additional improvement of between .2 and .4 BLEU points.
    Page 7, “MT performance results”
  11. Both the oracle BLEU of the first hypothesis and the achieved performance of the model improved; the best performance achieved by Method 2 is .63 points higher than the performance of Method 1.
    Page 7, “MT performance results”
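
Excerpts 2 and 3 describe the tuning and evaluation loop: a grid search over the interpolation weight λ and the n-best size n, maximizing 4-gram BLEU against a single reference on the dev set. A minimal sketch assuming a hypothetical `rerank_translate(src, lam, n)` wrapper around the combined base-MT-plus-inflection system; the grid values are illustrative, not the paper's:

    # Grid-search lambda and n to maximize 4-gram dev-set BLEU (sketch).
    # BLEU is NLTK's corpus_bleu with a single reference per sentence;
    # hypotheses and references are token lists.
    from itertools import product
    from nltk.translate.bleu_score import corpus_bleu

    def dev_bleu(hypotheses, references):
        # 4-gram BLEU, single reference per sentence
        return corpus_bleu([[ref] for ref in references], hypotheses,
                           weights=(0.25, 0.25, 0.25, 0.25))

    def grid_search(rerank_translate, dev_source, dev_refs,
                    lambdas=(0.1, 0.2, 0.5, 1.0, 2.0), ns=(1, 2, 5, 10)):
        best_lam, best_n, best_score = None, None, -1.0
        for lam, n in product(lambdas, ns):
            hyps = [rerank_translate(src, lam=lam, n=n) for src in dev_source]
            score = dev_bleu(hyps, dev_refs)
            if score > best_score:
                best_lam, best_n, best_score = lam, n, score
        return best_lam, best_n, best_score

The oracle score of excerpt 5 can be computed with the same dev_bleu function by stemming both hypotheses and references before scoring.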

word alignment

Appears in 13 sentences as: word alignment (14) word alignments (1)
In Applying Morphology Generation Models to Machine Translation
  1. Evidence for this difficulty is the fact that there has been very little work investigating the use of such independent sub-components, though we started to see some successful cases in the literature, for example in word alignment (Fraser and Marcu, 2007), target language capitalization (Wang et al., 2006) and case marker generation (Toutanova and Suzuki, 2007).
    Page 1, “Introduction”
  2. The dependency structure on the Russian side, indicated by solid arcs, is given by a treelet MT system (see Section 4.1), projected from the word dependency structure of English and word alignment information.
    Page 3, “Inflection prediction models”
  3. It uses the same lexicalized-HMM model for word alignment as the treelet system, and uses the standard extraction heuristics to extract phrase pairs using forward and backward alignments.
    Page 4, “Machine translation systems and data”
  4. Stemming the target sentences is expected to be helpful for word alignment, especially when the stemming operation is defined so that the word alignment becomes more one-to-one (Goldwater and McClosky, 2005).
    Page 5, “Integration of inflection models with MT systems”
  5. However, for some language pairs, stemming one language can make word alignment worse, if it leads to more violations in the assumptions of current word alignment models, rather than making the source look more like the target.
    Page 5, “Integration of inflection models with MT systems”
  6. Note that it may be better to use the word alignment maintained as part of the translation hypotheses during search, but our solution is more suitable to situations where these cannot be easily obtained.
    Page 6, “Integration of inflection models with MT systems”
  7. In Method 2, word alignment is performed using fully inflected target language sentences.
    Page 6, “Integration of inflection models with MT systems”
  8. In addition to this word-aligned corpus the MT systems use another product of word alignment: the IBM model 1 translation tables.
    Page 6, “Integration of inflection models with MT systems”
  9. In this method, stemming can impact word alignment in addition to the translation models.
    Page 6, “Integration of inflection models with MT systems”
  10. Finally, we can see that using stemming at the word alignment stage further improved both the oracle and the achieved results.
    Page 7, “MT performance results”
  11. …expressed in English at the word level, these results are consistent with previous results using stemming to improve word alignment.
    Page 7, “MT performance results”

BLEU score

Appears in 10 sentences as: BLEU score (9) BLEU scores (2)
In Applying Morphology Generation Models to Machine Translation
  1. We performed a grid search on the values of λ and n, to maximize the BLEU score of the final system on a development set (dev) of 1000 sentences (Table 2).
    Page 5, “Integration of inflection models with MT systems”
  2. We also report oracle BLEU scores which incorporate two kinds of oracle knowledge.
    Page 6, “MT performance results”
  3. For the methods using n=1 translation from a base MT system, the oracle BLEU score is the BLEU score of the stemmed translation compared to the stemmed reference, which represents the upper bound achievable by changing only the inflected forms (but not stems) of the words in a translation.
    Page 6, “MT performance results”
  4. This system achieves a substantially better BLEU score (by 6.76) than the treelet system.
    Page 7, “MT performance results”
  5. The oracle BLEU score achievable by Method 1 using n=1 translation, though, is still 6.3 BLEU points higher than the achieved BLEU.
    Page 7, “MT performance results”
  6. The Arabic system also improves with the use of our model: the best system (Method 1, n=2) achieves a BLEU score of 37.41, a 1.87 point improvement over the baseline.
    Page 8, “MT performance results”
  7. Unlike the case of Russian, Methods 2 and 3 do not achieve better results than Method 1, though the oracle BLEU score improves in these models (54.74 and 54.90 as opposed to 52.21 of Method 1).
    Page 8, “MT performance results”
  8. In this section we briefly report the results of human evaluation on the output of our inflection prediction system, as the correlation between BLEU scores and human evaluation results is not always obvious.
    Page 8, “MT performance results”
  9. Table 6 shows the results of the averaged, aggregated score across two judges per evaluation scenario, along with the BLEU score improvements achieved by applying our model.
    Page 8, “MT performance results”
  10. We also note that in Russian, the human evaluation scores are similar for Methods 1 and 3 (0.255 and 0.26), though the BLEU score gains are quite different (1.2 vs 2.6).
    Page 8, “MT performance results”

language model

Appears in 8 sentences as: language model (8)
In Applying Morphology Generation Models to Machine Translation
  1. (Goldwater and McClosky, 2005), while the application of a target language model has almost solely been responsible for addressing the second aspect.
    Page 1, “Introduction”
  2. We stemmed the reference translations, predicted the inflection for each stem, and measured the accuracy of prediction, using a set of sentences that were not part of the training data (1K sentences were used for Arabic and 5K for Russian). Our model performs significantly better than both the random and trigram language model baselines, and achieves an accuracy of over 91%, which suggests that the model is effective when its input is clean in its stem choice and order.
    Page 3, “Inflection prediction models”
  3. (2003), a trigram target language model, two order models, word count, phrase count, and average phrase size functions.
    Page 4, “Machine translation systems and data”
  4. The features include log-probabilities according to inverted and direct channel models estimated by relative frequency, lexical weighting channel models, a trigram target language model, distortion, word count and phrase count.
    Page 4, “Machine translation systems and data”
  5. For each language pair, we used a set of parallel sentences (train) for training the MT system sub-models (e.g., phrase tables, language model), a set of parallel sentences (lambda) for training the combination weights with max-BLEU training, a set of parallel sentences (dev) for training a small number of combination parameters for our integration methods (see Section 5), and a set of parallel sentences (test) for final evaluation.
    Page 4, “Machine translation systems and data”
  6. Given such a list of candidate stem sequences, the base MT model together with the inflection model and a language model choose a translation Y* as follows:
    Page 5, “Integration of inflection models with MT systems”
  7. P_LM is the joint probability of the sequence of inflected words according to a trigram language model (LM).
    Page 5, “Integration of inflection models with MT systems” (see the reconstruction after this list)
  8. In addition, stemming the target sentences reduces the sparsity in the translation tables and language model, and is likely to impact positively the performance of an MT system in terms of its ability to recover correct sequences of stems in the target.
    Page 5, “Integration of inflection models with MT systems”
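
Excerpts 6 and 7 refer to equation (1), which this index does not reproduce. A hedged LaTeX reconstruction from the surrounding description (for each of the n stem-sequence hypotheses S_i the inflection model and the LM pick the best inflected sequence, and the tuned weight λ trades that score off against the base MT score; the exact parameterization in the paper may differ):

    Y^{*} = \operatorname*{argmax}_{1 \le i \le n}
            \Big[ \log P_{\mathrm{MT}}(S_i)
                  + \lambda \max_{Y \in \mathrm{Infl}(S_i)}
                    \big( \log P_{\mathrm{Infl}}(Y \mid S_i)
                          + \log P_{\mathrm{LM}}(Y) \big) \Big]

Here Infl(S_i) denotes the set of inflected word sequences whose stems are S_i; with n=1 this reduces to re-inflecting the single best translation.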

SMT system

Appears in 6 sentences as: SMT system (4) SMT systems (2)
In Applying Morphology Generation Models to Machine Translation
  1. Our inflection generation models are trained independently of the SMT system.
    Page 1, “Abstract”
  2. We investigate different ways of combining the inflection prediction component with the SMT system by training the base MT system on fully inflected forms or on word stems.
    Page 1, “Abstract”
  3. We applied our inflection generation models in translating English into two morphologically complex languages, Russian and Arabic, and show that our model improves the quality of SMT over both phrasal and syntax-based SMT systems according to BLEU and human judgements.
    Page 1, “Abstract”
  4. We also demonstrate that our independently trained models are portable, showing that they can improve both syntactic and phrasal SMT systems .
    Page 1, “Introduction”
  5. In recent work, Koehn and Hoang (2007) proposed a general framework for including morphological features in a phrase-based SMT system by factoring the representation of words into a vector of morphological features and allowing a phrase-based MT system to work on any of the factored representations, which is implemented in the Moses system.
    Page 2, “Related work”
  6. We have shown that an independent model of morphology generation can be successfully integrated with an SMT system, making improvements in both phrasal and syntax-based MT.
    Page 8, “Conclusion and future work”

translation system

Appears in 6 sentences as: translation system (4) translation systems (2)
In Applying Morphology Generation Models to Machine Translation
  1. This second problem is very difficult to address with word-based translation systems, when the relevant morphological information in the target language is either nonexistent or implicitly encoded in the source language.
    Page 1, “Introduction”
  2. Other work closely related to ours is (Toutanova and Suzuki, 2007), which uses an independently trained case marker prediction model in an English-Japanese translation system, but it focuses on the problem of generating a small set of closed class words rather
    Page 2, “Related work”
  3. We integrated the inflection prediction model with two types of machine translation systems: systems that make use of syntax and surface phrase-based systems.
    Page 4, “Machine translation systems and data”
  4. 4.1 Treelet translation system
    Page 4, “Machine translation systems and data”
  5. 4.2 Phrasal translation system
    Page 4, “Machine translation systems and data”
  6. This is a re-implementation of the Pharaoh translation system (Koehn, 2004).
    Page 4, “Machine translation systems and data”

BLEU points

Appears in 5 sentences as: BLEU point (1) BLEU points (4)
In Applying Morphology Generation Models to Machine Translation
  1. We can see that Method 1 results in a good improvement of 1.2 BLEU points, even when using only the best (n = 1) translation from the baseline.
    Page 7, “MT performance results”
  2. The oracle improvement achievable by predicting inflections is quite substantial: more than 7 BLEU points.
    Page 7, “MT performance results”
  3. Propagating the uncertainty of the baseline system by using more input hypotheses consistently improves performance across the different methods, with an additional improvement of between .2 and .4 BLEU points.
    Page 7, “MT performance results”
  4. The performance of the best model is 2.56 BLEU points better than the baseline.
    Page 7, “MT performance results”
  5. The oracle BLEU score achievable by Method 1 using n=1 translation, though, is still 6.3 BLEU points higher than the achieved BLEU.
    Page 7, “MT performance results”

LM

Appears in 5 sentences as: LM (8)
In Applying Morphology Generation Models to Machine Translation
  1. Table fragment (inflection prediction accuracy and average number of inflections per stem; columns are likely Russian, Arabic): LM 81.0 / 69.4; Model 91.6 / 91.0; Avg |I| 13.9 / 24.1.
    Page 4, “Inflection prediction models”
  2. P_LM is the joint probability of the sequence of inflected words according to a trigram language model (LM).
    Page 5, “Integration of inflection models with MT systems”
  3. The LM used for the integration is the same LM used in the base MT system that is trained on fully inflected word forms (the base MT system trained on stems uses an LM trained on a stem sequence).
    Page 5, “Integration of inflection models with MT systems”
  4. Equation (1) shows that the model first selects the best sequence of inflected forms for each MT hypothesis Si according to the LM and the inflection model.
    Page 5, “Integration of inflection models with MT systems”
  5. In addition, using a trigram LM on stems may lead to larger violations of the Markov independence assumptions than using a trigram LM on fully inflected words.
    Page 5, “Integration of inflection models with MT systems”
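
The excerpts above use a trigram LM both over fully inflected words and over stems. As a concrete reference point, a minimal sketch of trigram log-probability scoring, with a hypothetical `trigram_prob` lookup (e.g., a smoothed count table) standing in for the actual LM:

    # Log joint probability of a word sequence under a trigram LM (sketch).
    import math
    from typing import Callable, List

    def lm_logprob(words: List[str],
                   trigram_prob: Callable[[str, str, str], float]) -> float:
        padded = ["<s>", "<s>"] + words + ["</s>"]
        return sum(math.log(trigram_prob(padded[i - 2], padded[i - 1], padded[i]))
                   for i in range(2, len(padded)))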

morphological analysis

Appears in 5 sentences as: morphological analyser (1) morphological analyses (1) Morphological analysis (1) morphological analysis (4)
In Applying Morphology Generation Models to Machine Translation
  1. Work in this area is motivated by two advantages offered by morphological analysis: (1) it provides linguistically motivated clustering of words and makes the data less sparse; (2) it captures morphological constraints applicable on the target side, such as agreement phenomena.
    Page 1, “Introduction”
  2. Morphological analysis: returns the set of possible morphological analyses Aw = {a1, ..., an} for w. A morphological analysis a is a vector of categorical values, where each dimension and its possible values are defined by L.
    Page 2, “Inflection prediction models” (see the sketch after this list)
  3. For the morphological analysis operation, we used the same set of morphological features described in (Minkov et al., 2007), that is, seven features for Russian (POS, Person, Number, Gender, Tense, Mood and Case) and 12 for Arabic (POS, Person, Number, Gender, Tense, Mood, Negation, Determiner, Conjunction, Preposition, Object and Possessive pronouns).
    Page 2, “Inflection prediction models”
  4. The same is true with the operation of morphological analysis.
    Page 2, “Inflection prediction models”
  5. The Russian lexicon was obtained by intersecting a general domain lexicon with our training data (Table 2), and the Arabic lexicon was obtained by running the Buckwalter morphological analyser (Buckwalter, 2004) on the training data.
    Page 3, “Inflection prediction models”
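
Excerpt 2 defines morphological analysis as a mapping from a word to a set of feature vectors whose dimensions are fixed by the lexicon L. A minimal sketch of that interface (the feature inventory follows excerpt 3 for Russian; the lexicon contents are illustrative, not the paper's data):

    # Morphological analysis as a lexicon lookup returning feature vectors
    # (sketch). Each analysis assigns one categorical value per dimension.
    from typing import Dict, FrozenSet, Tuple

    RUSSIAN_FEATURES = ("POS", "Person", "Number", "Gender",
                        "Tense", "Mood", "Case")  # 7 features, per excerpt 3

    Analysis = Tuple[str, ...]  # one value per feature dimension

    def analyses(word: str,
                 lexicon: Dict[str, FrozenSet[Analysis]]) -> FrozenSet[Analysis]:
        """Return the set A_w = {a_1, ..., a_n} of possible analyses of w."""
        return lexicon.get(word, frozenset())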

sentence pair

Appears in 5 sentences as: sent pairs (1) sentence pair (3) sentence pairs (1)
In Applying Morphology Generation Models to Machine Translation
  1. Figure 1 shows an example of an aligned English-Russian sentence pair: on the source (English) side, POS tags and word dependency structure are indicated by solid arcs.
    Page 3, “Inflection prediction models”
  2. Figure 1: Aligned English-Russian sentence pair with syntactic and morphological annotation.
    Page 3, “Inflection prediction models”
  3. These aligned sentence pairs form the training data of the inflection models as well.
    Page 4, “Machine translation systems and data”
  4. Dataset | sent pairs | word tokens (avg/sent) (header row of the corpus statistics table, followed by its English-Russian section)
    Page 4, “Machine translation systems and data”
  5. The judges were given the reference translations but not the source sentences, and were asked to classify each sentence pair into three categories: (1) the baseline system is better (score=-1), (2) the output of our model is better (score=1), or (3) they are of the same quality (score=0).
    Page 8, “MT performance results”
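
The last excerpt specifies the human-evaluation protocol: each judge labels a sentence pair -1 (baseline better), +1 (our model better), or 0 (same quality), and the reported number is the averaged, aggregated score across judges. A small sketch of that aggregation:

    # Average the -1/0/+1 judgements across judges and sentence pairs (sketch).
    from statistics import mean
    from typing import List

    def aggregate(judgements: List[List[int]]) -> float:
        """judgements[j][s] is judge j's score for sentence pair s."""
        return mean(mean(per_judge) for per_judge in judgements)

    # Example: two judges over three sentence pairs
    print(aggregate([[1, 0, -1], [1, 1, 0]]))  # ≈ 0.33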

translation model

Appears in 5 sentences as: translation model (2) translation modeling (1) translation models (2)
In Applying Morphology Generation Models to Machine Translation
  1. Though our motivation is similar to that of Koehn and Hoang (2007), we chose to build an independent component for inflection prediction in isolation rather than folding morphological information into the main translation model.
    Page 2, “Related work”
  2. The treelet translation model is estimated using a parallel corpus.
    Page 4, “Machine translation systems and data”
  3. In this method, stemming can impact word alignment in addition to the translation models.
    Page 6, “Integration of inflection models with MT systems”
  4. From the results of Method 2 we can see that reducing sparsity at translation modeling is advantageous.
    Page 7, “MT performance results”
  5. From these results, we also see that about half of the gain from using stemming in the base MT system came from improving word alignment, and half came from using translation models operating at the less sparse stem level.
    Page 7, “MT performance results”

baseline system

Appears in 4 sentences as: baseline system (4)
In Applying Morphology Generation Models to Machine Translation
  1. Propagating the uncertainty of the baseline system by using more input hypotheses consistently improves performance across the different methods, with an additional improvement of between .2 and .4 BLEU points.
    Page 7, “MT performance results”
  2. In all scenarios, two human judges (native speakers of these languages) evaluated 100 sentences that had different translations by the baseline system and our model.
    Page 8, “MT performance results”
  3. The judges were given the reference translations but not the source sentences, and were asked to classify each sentence pair into three categories: (1) the baseline system is better (score=-1), (2) the output of our model is better (score=1), or (3) they are of the same quality (score=0).
    Page 8, “MT performance results”
  4. We see that in all cases, the human evaluation scores are positive, indicating that our models produce translations that are better than those produced by the baseline system.
    Page 8, “MT performance results”

language pair

Appears in 4 sentences as: language pair (3) language pairs (1)
In Applying Morphology Generation Models to Machine Translation
  1. For each language pair, we used a set of parallel sentences (train) for training the MT system sub-models (e.g., phrase tables, language model), a set of parallel sentences (lambda) for training the combination weights with max-BLEU training, a set of parallel sentences (dev) for training a small number of combination parameters for our integration methods (see Section 5), and a set of parallel sentences (test) for final evaluation.
    Page 4, “Machine translation systems and data”
  2. All MT systems for a given language pair used the same datasets.
    Page 4, “Machine translation systems and data”
  3. However, for some language pairs, stemming one language can make word alignment worse, if it leads to more violations in the assumptions of current word alignment models, rather than making the source look more like the target.
    Page 5, “Integration of inflection models with MT systems”
  4. In the future, we would like to include more sophistication in the design of a lexicon for a particular language pair based on error analysis, and extend our preprocessing to include other operations such as word segmentation.
    Page 8, “Conclusion and future work”

machine translation

Appears in 4 sentences as: machine translation (4)
In Applying Morphology Generation Models to Machine Translation
  1. We improve the quality of statistical machine translation (SMT) by applying models that predict word forms from their stems using extensive morphological and syntactic information from both the source and target languages.
    Page 1, “Abstract”
  2. One of the outstanding problems for further improving machine translation (MT) systems is the difficulty of dividing the MT problem into subproblems and tackling each subproblem in isolation to improve the overall quality of MT.
    Page 1, “Introduction”
  3. This paper describes a successful attempt to integrate a subcomponent for generating word inflections into a statistical machine translation (SMT) system.
    Page 1, “Introduction”
  4. We integrated the inflection prediction model with two types of machine translation systems: systems that make use of syntax and surface phrase-based systems.
    Page 4, “Machine translation systems and data”

phrase-based

Appears in 3 sentences as: phrase-based (4)
In Applying Morphology Generation Models to Machine Translation
  1. In recent work, Koehn and Hoang (2007) proposed a general framework for including morphological features in a phrase-based SMT system by factoring the representation of words into a vector of morphological features and allowing a phrase-based MT system to work on any of the factored representations, which is implemented in the Moses system.
    Page 2, “Related work”
  2. We integrated the inflection prediction model with two types of machine translation systems: systems that make use of syntax and surface phrase-based systems.
    Page 4, “Machine translation systems and data”
  3. For the phrase-based system, we generated the annotations needed by first parsing the source sentence e, aligning the source and candidate translations with the word-alignment model used in training, and projected the dependency tree to the target using the algorithm of (Quirk et al., 2005).
    Page 6, “Integration of inflection models with MT systems”
