Better Alignments = Better Translations?
Kuzman Ganchev, João V. Graça, and Ben Taskar

Article Structure

Abstract

Automatic word alignment is a key step in training statistical machine translation systems.

Introduction

The typical pipeline for a machine translation (MT) system starts with a parallel sentence-aligned corpus and proceeds to align the words in every sentence pair.

Statistical word alignment

Statistical word alignment (Brown et al., 1994) is the task of identifying which words are translations of each other in a bilingual sentence corpus.

Adding agreement constraints

Graça et al.

Word alignment results

We evaluated the agreement HMM model on two corpora for which hand-aligned data are widely available: the Hansards corpus (Och and Ney, 2000) of English/French parliamentary proceedings and the Europarl corpus (Koehn, 2002) with EPPS annotation (Lambert et al., 2005) of English/Spanish.

Phrase-based machine translation

In this section we attempt to investigate whether our improved alignments produce improved machine translation.

Conclusions

In this work we have evaluated agreement-constrained EM training for statistical word alignment models.

Topics

word alignment

Appears in 22 sentences as: word alignment (18) word alignments (4) word aligns (1)
In Better Alignments = Better Translations?
  1. Automatic word alignment is a key step in training statistical machine translation systems.
    Page 1, “Abstract”
  2. Despite much recent work on word alignment methods, alignment accuracy increases often produce little or no improvements in machine translation quality.
    Page 1, “Abstract”
  3. The word alignment problem has received much recent attention, but improvements in standard measures of word alignment performance often do not result in better translations.
    Page 1, “Introduction”
  4. In this work, we show that by changing the way the word alignment models are trained and used, we can get not only improvements in alignment performance, but also in the performance of the MT system that uses those alignments.
    Page 1, “Introduction”
  5. We present extensive experimental results evaluating a new training scheme for unsupervised word alignment models: an extension of the Expectation Maximization algorithm that allows effective injection of additional information about the desired alignments into the unsupervised training process.
    Page 1, “Introduction”
  6. Our contribution is a large scale evaluation of this methodology for word alignments, an investigation of how the produced alignments differ and how they can be used to consistently improve machine translation performance (as measured by BLEU score) across many languages on training corpora with up to a hundred thousand sentences.
    Page 1, “Introduction”
  7. After presenting the models and the algorithm in Sections 2 and 3, in Section 4 we examine how the new alignments differ from standard models, and find that the new method consistently improves word alignment performance, measured either as alignment error rate or weighted F-score.
    Page 1, “Introduction”
  8. Statistical word alignment (Brown et al., 1994) is the task of identifying which words are translations of each other in a bilingual sentence corpus.
    Page 2, “Statistical word alignment”
  9. Figure 2 shows two examples of word alignment of a sentence pair.
    Page 2, “Statistical word alignment”
  10. Due to the ambiguity of the word alignment task, it is common to distinguish two kinds of alignments (Och and Ney, 2003).
    Page 2, “Statistical word alignment”
  11. 2.1 Baseline word alignment models
    Page 2, “Statistical word alignment”

See all papers in Proc. ACL 2008 that mention word alignment.

See all papers in Proc. ACL that mention word alignment.

Back to top.

Viterbi

Appears in 21 sentences as: Viterbi (22)
In Better Alignments = Better Translations?
  1. Section 5 explores how the new alignments lead to consistent and significant improvement in a state of the art phrase-based machine translation system by using posterior decoding rather than Viterbi decoding.
    Page 1, “Introduction”
  2. The standard way to do this is to find the most probable alignment, using the Viterbi algorithm.
    Page 4, “Adding agreement constraints”
  3. There are two main differences between posterior decoding and Viterbi decoding.
    Page 4, “Adding agreement constraints”
  4. Viterbi, by contrast, must commit to one high-scoring alignment.
    Page 4, “Adding agreement constraints”
  5. use Viterbi decoding, with larger improvements for small amounts of training data.
    Page 5, “Word alignment results”
  6. As described above, an alternative to Viterbi decoding is to accept all alignments that have probability above some threshold.
    Page 5, “Word alignment results”
  7. Figure 7 shows an example of the same sentence, using the same model, where in one case Viterbi decoding was used and in the other case posterior decoding tuned to minimize AER on a development set.
    Page 5, “Word alignment results”
  8. Left: Viterbi
    Page 5, “Phrase-based machine translation”
  9. [Figure legend: Baseline Viterbi]
    Page 6, “Phrase-based machine translation”
  10. Our initial experiments with Viterbi decoding and posterior decoding showed that for our agreement model posterior decoding could provide better alignment quality.
    Page 6, “Phrase-based machine translation”
  11. tune the threshold: we choose a threshold that gives the same number of aligned points as Viterbi decoding produces.
    Page 7, “Phrase-based machine translation”

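The Viterbi-versus-posterior distinction running through the excerpts above can be sketched on a toy HMM. This is not the paper's model or code; the transition and emission numbers and the function names are invented for illustration. The sketch shows the two decoding schemes: Viterbi commits to one best state sequence, while posterior decoding keeps every link whose forward-backward posterior clears a threshold.

```python
import numpy as np

# Toy HMM alignment: states are target positions, observations are source
# words. All probabilities below are invented for illustration.
init = np.array([0.5, 0.5])
trans = np.array([[0.7, 0.3],            # trans[prev, cur] = p(a_j | a_{j-1})
                  [0.4, 0.6]])
emit = np.array([[0.8, 0.1, 0.1],        # emit[state, word] = p(word | state)
                 [0.2, 0.3, 0.5]])
obs = [0, 2, 1]                          # indices of the source words

def viterbi(init, trans, emit, obs):
    """Single most probable state sequence (Viterbi decoding)."""
    delta = init * emit[:, obs[0]]
    backptr = []
    for o in obs[1:]:
        scores = delta[:, None] * trans              # scores[prev, cur]
        backptr.append(scores.argmax(axis=0))
        delta = scores.max(axis=0) * emit[:, o]
    path = [int(delta.argmax())]
    for bp in reversed(backptr):
        path.append(int(bp[path[-1]]))
    return path[::-1]

def posteriors(init, trans, emit, obs):
    """Per-position posteriors p(a_j = i | obs) via forward-backward."""
    T, S = len(obs), len(init)
    alpha, beta = np.zeros((T, S)), np.zeros((T, S))
    alpha[0] = init * emit[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ trans) * emit[:, obs[t]]
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = trans @ (emit[:, obs[t + 1]] * beta[t + 1])
    post = alpha * beta
    return post / post.sum(axis=1, keepdims=True)

def posterior_decode(post, threshold):
    """Keep every (source, target) link whose posterior clears the
    threshold; unlike Viterbi this can emit zero or several links
    per source position."""
    return [(j, i) for j, row in enumerate(post)
            for i, p in enumerate(row) if p >= threshold]
```

Excerpt 11 above describes the paper's heuristic for picking the threshold without labeled data: choose it so that posterior decoding emits the same number of links as Viterbi decoding does.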

alignment models

Appears in 17 sentences as: alignment model (6) alignment models (11)
In Better Alignments = Better Translations?
  1. In this work we analyze a recently proposed agreement-constrained EM algorithm for unsupervised alignment models.
    Page 1, “Abstract”
  2. We propose and extensively evaluate a simple method for using alignment models to produce alignments better-suited for phrase-based MT systems, and show significant gains (as measured by BLEU score) in end-to-end translation systems for six languages pairs used in recent MT competitions.
    Page 1, “Abstract”
  3. In this work, we show that by changing the way the word alignment models are trained and used, we can get not only improvements in alignment performance, but also in the performance of the MT system that uses those alignments.
    Page 1, “Introduction”
  4. We present extensive experimental results evaluating a new training scheme for unsupervised word alignment models: an extension of the Expectation Maximization algorithm that allows effective injection of additional information about the desired alignments into the unsupervised training process.
    Page 1, “Introduction”
  5. 2.1 Baseline word alignment models
    Page 2, “Statistical word alignment”
  6. Figure 1 illustrates the mapping between the usual HMM notation and the HMM alignment model.
    Page 2, “Statistical word alignment”
  7. All word alignment models we consider are normally trained using the Expectation Maximization
    Page 2, “Statistical word alignment”
  8. For the HMM alignment model, these posteriors can be efficiently calculated by the Forward-Backward algorithm.
    Page 2, “Statistical word alignment”
  9. They suggest how this framework can be used to encourage two word alignment models to agree during training.
    Page 3, “Adding agreement constraints”
  10. Most MT systems train an alignment model in each direction and then heuristically combine their predictions.
    Page 3, “Adding agreement constraints”
  11. We then train the competing alignment models and compute competing alignments using different decoding schemes.
    Page 6, “Phrase-based machine translation”

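The excerpts above note that all the alignment models considered are trained with EM. As a concrete, much simpler instance than the paper's HMM, here is a minimal sketch of EM for IBM Model 1 translation probabilities; the tiny corpus and all variable names are invented for illustration, and a real trainer would add smoothing and work in log space.

```python
from collections import defaultdict

# Tiny English/French parallel corpus, invented for illustration.
corpus = [(["the", "house"], ["la", "maison"]),
          (["the", "book"],  ["le", "livre"]),
          (["a", "house"],   ["une", "maison"])]

src_vocab = {w for e, _ in corpus for w in e}
tgt_vocab = {w for _, f in corpus for w in f}
# t_prob[f, e] = p(target word f | source word e), initialized uniformly.
t_prob = {(f, e): 1.0 / len(tgt_vocab) for e in src_vocab for f in tgt_vocab}

for _ in range(10):                      # EM iterations
    count = defaultdict(float)
    total = defaultdict(float)
    for e_sent, f_sent in corpus:
        for f in f_sent:                 # E-step: expected alignment counts
            norm = sum(t_prob[f, e] for e in e_sent)
            for e in e_sent:
                p = t_prob[f, e] / norm
                count[f, e] += p
                total[e] += p
    for (f, e), c in count.items():      # M-step: renormalize the counts
        t_prob[f, e] = c / total[e]
```

After a few iterations the expected counts concentrate on the consistent pairs, e.g. p(maison | house) grows while p(maison | the) shrinks, which is the same EM mechanism the paper's HMM training builds on.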

BLEU

Appears in 12 sentences as: BLEU (13)
In Better Alignments = Better Translations?
  1. We propose and extensively evaluate a simple method for using alignment models to produce alignments better-suited for phrase-based MT systems, and show significant gains (as measured by BLEU score) in end-to-end translation systems for six languages pairs used in recent MT competitions.
    Page 1, “Abstract”
  2. Our contribution is a large scale evaluation of this methodology for word alignments, an investigation of how the produced alignments differ and how they can be used to consistently improve machine translation performance (as measured by BLEU score) across many languages on training corpora with up to a hundred thousand sentences.
    Page 1, “Introduction”
  3. In 10 out of 12 cases we improve BLEU score by at least ¼ point and by more than 1 point in 4 out of 12 cases.
    Page 1, “Introduction”
  4. Unfortunately, as was shown by Fraser and Marcu (2007), AER can have weak correlation with translation performance as measured by BLEU score (Papineni et al., 2002), when the alignments are used to train a phrase-based translation system.
    Page 4, “Word alignment results”
  5. We report BLEU scores using a script available with the baseline system.
    Page 6, “Phrase-based machine translation”
  6. Figure 8: BLEU score as the amount of training data is increased on the Hansards corpus for the best decoding method for each alignment model.
    Page 6, “Phrase-based machine translation”
  7. In principle, we would like to tune the threshold by optimizing BLEU score on a development set, but that is impractical for experiments with many pairs of languages.
    Page 7, “Phrase-based machine translation”
  8. Figure 8 shows the BLEU score for the baseline HMM, our agreement model and GIZA Model 4 as we vary the amount of training data from 10⁴ to 10⁶ sentences.
    Page 7, “Phrase-based machine translation”
  9. Table 2: BLEU scores for all language pairs using up to 100k sentences.
    Page 7, “Phrase-based machine translation”
  10. The marks (++) and (+) denote that agreement with posterior decoding is better by 1 BLEU point and 0.25 BLEU points respectively than the best baseline HMM model; analogously for (−−), (−); while (≈) denotes smaller differences.
    Page 7, “Phrase-based machine translation”
  11. Table 3: BLEU scores for all language pairs using all available data.
    Page 8, “Conclusions”

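BLEU is the evaluation metric throughout the excerpts above. As a reminder of what it measures, here is a minimal sentence-level sketch: the geometric mean of modified n-gram precisions times a brevity penalty. The add-one smoothing is an ad-hoc choice for this illustration; the paper itself reports corpus-level BLEU from a standard script, which this sketch does not reproduce.

```python
import math
from collections import Counter

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU sketch against a single reference.
    Shows the ingredients only: modified n-gram precision and the
    brevity penalty, with ad-hoc add-one smoothing for short inputs."""
    precisions = []
    for n in range(1, max_n + 1):
        cand = Counter(tuple(candidate[i:i + n])
                       for i in range(len(candidate) - n + 1))
        ref = Counter(tuple(reference[i:i + n])
                      for i in range(len(reference) - n + 1))
        # Clipped counts: each candidate n-gram credited at most as
        # often as it appears in the reference.
        overlap = sum(min(c, ref[g]) for g, c in cand.items())
        total = max(sum(cand.values()), 1)
        precisions.append((overlap + 1) / (total + 1))
    # Brevity penalty discourages translations shorter than the reference.
    bp = min(1.0, math.exp(1 - len(reference) / max(len(candidate), 1)))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

A perfect match of four or more words scores 1.0, while a candidate sharing no n-grams with the reference scores near zero.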

machine translation

Appears in 11 sentences as: machine translation (11)
In Better Alignments = Better Translations?
  1. Automatic word alignment is a key step in training statistical machine translation systems.
    Page 1, “Abstract”
  2. Despite much recent work on word alignment methods, alignment accuracy increases often produce little or no improvements in machine translation quality.
    Page 1, “Abstract”
  3. The typical pipeline for a machine translation (MT) system starts with a parallel sentence-aligned corpus and proceeds to align the words in every sentence pair.
    Page 1, “Introduction”
  4. Our contribution is a large scale evaluation of this methodology for word alignments, an investigation of how the produced alignments differ and how they can be used to consistently improve machine translation performance (as measured by BLEU score) across many languages on training corpora with up to a hundred thousand sentences.
    Page 1, “Introduction”
  5. Section 5 explores how the new alignments lead to consistent and significant improvement in a state of the art phrase-based machine translation system by using posterior decoding rather than Viterbi decoding.
    Page 1, “Introduction”
  6. A natural question is how to tune the threshold in order to improve machine translation quality.
    Page 5, “Word alignment results”
  7. In the next section we evaluate and compare the effects of the different alignments in a phrase-based machine translation system.
    Page 5, “Word alignment results”
  8. In particular we fix a state of the art machine translation system and measure its performance when we vary the supplied word alignments.
    Page 6, “Phrase-based machine translation”
  9. The baseline system uses GIZA model 4 alignments and the open source Moses phrase-based machine translation toolkit, and performed close to the best at the competition last year.
    Page 6, “Phrase-based machine translation”
  10. In addition to the Hansards corpus and the Europarl English-Spanish corpus, we used four other corpora for the machine translation experiments.
    Page 6, “Phrase-based machine translation”
  11. To our knowledge this is the first extensive evaluation where improvements in alignment accuracy lead to improvements in machine translation performance.
    Page 8, “Conclusions”


BLEU score

Appears in 10 sentences as: BLEU score (7) BLEU scores (3)
In Better Alignments = Better Translations?
  1. We propose and extensively evaluate a simple method for using alignment models to produce alignments better-suited for phrase-based MT systems, and show significant gains (as measured by BLEU score) in end-to-end translation systems for six languages pairs used in recent MT competitions.
    Page 1, “Abstract”
  2. Our contribution is a large scale evaluation of this methodology for word alignments, an investigation of how the produced alignments differ and how they can be used to consistently improve machine translation performance (as measured by BLEU score) across many languages on training corpora with up to a hundred thousand sentences.
    Page 1, “Introduction”
  3. In 10 out of 12 cases we improve BLEU score by at least ¼ point and by more than 1 point in 4 out of 12 cases.
    Page 1, “Introduction”
  4. Unfortunately, as was shown by Fraser and Marcu (2007), AER can have weak correlation with translation performance as measured by BLEU score (Papineni et al., 2002), when the alignments are used to train a phrase-based translation system.
    Page 4, “Word alignment results”
  5. We report BLEU scores using a script available with the baseline system.
    Page 6, “Phrase-based machine translation”
  6. Figure 8: BLEU score as the amount of training data is increased on the Hansards corpus for the best decoding method for each alignment model.
    Page 6, “Phrase-based machine translation”
  7. In principle, we would like to tune the threshold by optimizing BLEU score on a development set, but that is impractical for experiments with many pairs of languages.
    Page 7, “Phrase-based machine translation”
  8. Figure 8 shows the BLEU score for the baseline HMM, our agreement model and GIZA Model 4 as we vary the amount of training data from 10⁴ to 10⁶ sentences.
    Page 7, “Phrase-based machine translation”
  9. Table 2: BLEU scores for all language pairs using up to 100k sentences.
    Page 7, “Phrase-based machine translation”
  10. Table 3: BLEU scores for all language pairs using all available data.
    Page 8, “Conclusions”


precision and recall

Appears in 8 sentences as: Precision and recall (2) precision and recall (6)
In Better Alignments = Better Translations?
  1. We attempt to tease apart the effects that this simple but effective modification has on alignment precision and recall tradeoffs, and how rare and common words are affected across several language pairs.
    Page 1, “Abstract”
  2. Consequently, in addition to AER, we focus on precision and recall.
    Page 4, “Word alignment results”
  3. Figure 3 shows the change in precision and recall with the amount of provided training data for the Hansards corpus.
    Page 4, “Word alignment results”
  4. We see that agreement constraints improve both precision and recall when we
    Page 4, “Word alignment results”
  5. Figure 4 shows precision and recall learning curves for rare and common words.
    Page 5, “Word alignment results”
  6. By changing the threshold, we can trade off precision and recall.
    Page 5, “Word alignment results”
  7. Figure 5: Precision and recall tradeoff for posterior decoding with varying threshold.
    Page 5, “Word alignment results”
  8. Figure 6: Precision and recall tradeoff for posterior on Hansards En→Fr.
    Page 5, “Word alignment results”

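The tradeoff the excerpts above describe, sweeping the posterior threshold to trade precision against recall, can be sketched directly. The link posteriors and gold set below are invented for illustration, and the helper collapses the paper's sure/possible gold distinction into a single gold set.

```python
def precision_recall(predicted, gold):
    """Alignment precision and recall of predicted links against a
    single gold link set (sure/possible links are not distinguished
    in this sketch)."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 1.0
    recall = tp / len(gold) if gold else 1.0
    return precision, recall

# Invented posteriors over (source, target) links and an invented gold set.
posteriors = {(0, 0): 0.9, (1, 1): 0.8, (1, 2): 0.4, (2, 2): 0.2}
gold = {(0, 0), (1, 1), (2, 2)}

curve = []
for thresh in (0.1, 0.5, 0.85):
    pred = [link for link, p in posteriors.items() if p >= thresh]
    curve.append((thresh, *precision_recall(pred, gold)))
# Raising the threshold moves along the tradeoff: fewer, surer links,
# so precision rises while recall falls.
```

Plotting `curve` for many thresholds gives exactly the kind of precision/recall tradeoff shown in the paper's Figures 5 and 6.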

language pairs

Appears in 7 sentences as: language pairs (6) languages pairs (1)
In Better Alignments = Better Translations?
  1. We attempt to tease apart the effects that this simple but effective modification has on alignment precision and recall tradeoffs, and how rare and common words are affected across several language pairs.
    Page 1, “Abstract”
  2. We propose and extensively evaluate a simple method for using alignment models to produce alignments better-suited for phrase-based MT systems, and show significant gains (as measured by BLEU score) in end-to-end translation systems for six languages pairs used in recent MT competitions.
    Page 1, “Abstract”
  3. language pairs used in recent MT competitions.
    Page 2, “Introduction”
  4. Our next set of experiments looks at our performance in both directions across our 6 corpora, when we have small to moderate amounts of training data: for the language pairs with more than 100,000 sentences, we use only the first 100,000 sentences.
    Page 7, “Phrase-based machine translation”
  5. Table 2: BLEU scores for all language pairs using up to 100k sentences.
    Page 7, “Phrase-based machine translation”
  6. Table 3: BLEU scores for all language pairs using all available data.
    Page 8, “Conclusions”
  7. We tested this hypothesis on six different language pairs from three different domains, and found that the new alignment scheme not only performs better than the baseline, but also improves over a more complicated, intractable model.
    Page 8, “Conclusions”


sentence pair

Appears in 5 sentences as: sentence pair (5)
In Better Alignments = Better Translations?
  1. The typical pipeline for a machine translation (MT) system starts with a parallel sentence-aligned corpus and proceeds to align the words in every sentence pair.
    Page 1, “Introduction”
  2. Figure 2 shows two examples of word alignment of a sentence pair.
    Page 2, “Statistical word alignment”
  3. For each sentence pair instance x = (s, t), we find the posterior
    Page 2, “Adding agreement constraints”
  4. Figure 2 shows two machine-generated alignments of a sentence pair.
    Page 4, “Word alignment results”
  5. The figure illustrates a problem known as garbage collection (Brown et al., 1993), where rare source words tend to align to many target words, since the probability mass of the rare word translations can be hijacked to fit the sentence pair.
    Page 4, “Word alignment results”


development set

Appears in 4 sentences as: development set (3) development sets (1)
In Better Alignments = Better Translations?
  1. Figure 7 shows an example of the same sentence, using the same model, where in one case Viterbi decoding was used and in the other case posterior decoding tuned to minimize AER on a development set.
    Page 5, “Word alignment results”
  2. For each alignment model and decoding type we train Moses and use MERT optimization to tune its parameters on a development set.
    Page 6, “Phrase-based machine translation”
  3. For Hansards we randomly chose 1000 and 500 sentences from test 1 and test 2 to be testing and development sets respectively.
    Page 6, “Phrase-based machine translation”
  4. In principle, we would like to tune the threshold by optimizing BLEU score on a development set, but that is impractical for experiments with many pairs of languages.
    Page 7, “Phrase-based machine translation”


MT system

Appears in 4 sentences as: MT system (2) MT systems (2)
In Better Alignments = Better Translations?
  1. We propose and extensively evaluate a simple method for using alignment models to produce alignments better-suited for phrase-based MT systems, and show significant gains (as measured by BLEU score) in end-to-end translation systems for six languages pairs used in recent MT competitions.
    Page 1, “Abstract”
  2. In this work, we show that by changing the way the word alignment models are trained and used, we can get not only improvements in alignment performance, but also in the performance of the MT system that uses those alignments.
    Page 1, “Introduction”
  3. Most MT systems train an alignment model in each direction and then heuristically combine their predictions.
    Page 3, “Adding agreement constraints”
  4. The nature of the complicated relationship between word alignments, the corresponding extracted phrases and the effects on the final MT system still begs for better explanations and metrics.
    Page 8, “Conclusions”


translation system

Appears in 4 sentences as: translation system (2) translation systems (2)
In Better Alignments = Better Translations?
  1. Automatic word alignment is a key step in training statistical machine translation systems.
    Page 1, “Abstract”
  2. We propose and extensively evaluate a simple method for using alignment models to produce alignments better-suited for phrase-based MT systems, and show significant gains (as measured by BLEU score) in end-to-end translation systems for six languages pairs used in recent MT competitions.
    Page 1, “Abstract”
  3. Unfortunately, as was shown by Fraser and Marcu (2007), AER can have weak correlation with translation performance as measured by BLEU score (Papineni et al., 2002), when the alignments are used to train a phrase-based translation system.
    Page 4, “Word alignment results”
  4. In the next section we evaluate and compare the effects of the different alignments in a phrase-based machine translation system.
    Page 5, “Word alignment results”


baseline system

Appears in 3 sentences as: baseline system (2) baseline systems (1)
In Better Alignments = Better Translations?
  1. We propose a heuristic for tuning posterior decoding in the absence of annotated alignment data and show improvements over baseline systems for six different
    Page 1, “Introduction”
  2. The baseline system uses GIZA model 4 alignments and the open source Moses phrase-based machine translation toolkit, and performed close to the best at the competition last year.
    Page 6, “Phrase-based machine translation”
  3. We report BLEU scores using a script available with the baseline system.
    Page 6, “Phrase-based machine translation”


error rate

Appears in 3 sentences as: error rate (3)
In Better Alignments = Better Translations?
  1. Fraser and Marcu (2007) note that none of the tens of papers published over the last five years has shown that significant decreases in alignment error rate (AER) result in significant increases in translation performance.
    Page 1, “Introduction”
  2. After presenting the models and the algorithm in Sections 2 and 3, in Section 4 we examine how the new alignments differ from standard models, and find that the new method consistently improves word alignment performance, measured either as alignment error rate or weighted F-score.
    Page 1, “Introduction”
  3. (2008) show that alignment error rate (Och and Ney, 2003) can be improved with agreement constraints.
    Page 4, “Word alignment results”

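The alignment error rate discussed above (Och and Ney, 2003) scores predicted links A against gold annotations that distinguish sure links S from possible links P, with S ⊆ P: AER = 1 − (|A∩S| + |A∩P|) / (|A| + |S|). A minimal sketch, with link sets invented for illustration:

```python
def aer(predicted, sure, possible):
    """Alignment Error Rate (Och and Ney, 2003). `sure` must be a
    subset of `possible`; lower is better and 0 means a perfect
    alignment with respect to the gold annotation."""
    A, S, P = set(predicted), set(sure), set(possible)
    assert S <= P, "sure links must also be possible links"
    return 1.0 - (len(A & S) + len(A & P)) / (len(A) + len(S))

# Invented gold annotation for a three-word sentence pair.
sure = {(0, 0), (1, 1)}
possible = sure | {(2, 2)}
```

Predicting all sure links plus any subset of the merely possible ones yields AER 0, which is one reason (as the excerpts note) AER improvements do not automatically translate into BLEU improvements.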

phrase-based

Appears in 3 sentences as: phrase-based (3)
In Better Alignments = Better Translations?
  1. We propose and extensively evaluate a simple method for using alignment models to produce alignments better-suited for phrase-based MT systems, and show significant gains (as measured by BLEU score) in end-to-end translation systems for six languages pairs used in recent MT competitions.
    Page 1, “Abstract”
  2. Unfortunately, as was shown by Fraser and Marcu (2007) AER can have weak correlation with translation performance as measured by BLEU score (Pa-pineni et al., 2002), when the alignments are used to train a phrase-based translation system.
    Page 4, “Word alignment results”
  3. The baseline system uses GIZA model 4 alignments and the open source Moses phrase-based machine translation toolkit2, and performed close to the best at the competition last year.
    Page 6, “Phrase-based machine translation”


state of the art

Appears in 3 sentences as: state of the art (3)
In Better Alignments = Better Translations?
  1. Section 5 explores how the new alignments lead to consistent and significant improvement in a state of the art phrase-based machine translation system by using posterior decoding rather than Viterbi decoding.
    Page 1, “Introduction”
  2. These values are competitive with other state of the art systems (Liang et al., 2006).
    Page 4, “Word alignment results”
  3. In particular we fix a state of the art machine translation system and measure its performance when we vary the supplied word alignments.
    Page 6, “Phrase-based machine translation”
