Training Phrase Translation Models with Leaving-One-Out
Wuebker, Joern and Mauser, Arne and Ney, Hermann

Article Structure

Abstract

Several attempts have been made to learn phrase translation probabilities for phrase-based statistical machine translation that go beyond pure counting of phrases in word-aligned training data.

Introduction

A phrase-based SMT system takes a source sentence and produces a translation by segmenting the sentence into phrases and translating those phrases separately (Koehn et al., 2003).

Related Work

It has been pointed out in the literature that training phrase models poses some difficulties.

Alignment

The training process is divided into three parts.

Phrase Model Training

The produced phrase alignment can be given as a single best alignment, as the n-best alignments or as an alignment graph representing all alignments considered by the decoder.

Experimental Evaluation

5.1 Experimental Setup

Conclusion

We have shown that training phrase models can improve translation performance on a state-of-the-art phrase-based translation model.

Topics

phrase table

Appears in 33 sentences as: Phrase Table (1) Phrase table (1) phrase table (28) phrase tables (5)
In Training Phrase Translation Models with Leaving-One-Out
  1. As a side effect, the phrase table size is reduced by more than 80%.
    Page 1, “Abstract”
  2. The most common method for obtaining the phrase table is heuristic extraction from automatically word-aligned bilingual training data (Och et al., 1999).
    Page 1, “Introduction”
  3. Their results show that it cannot reach performance competitive with extracting a phrase table from word alignment by heuristics (Och et al., 1999).
    Page 2, “Related Work”
  4. In addition, (DeNero et al., 2006) found that the trained phrase table shows a highly peaked distribution, in contrast to the flatter distribution resulting from heuristic extraction, leaving the decoder only a few translation options at decoding time.
    Page 2, “Related Work”
  5. They observe that due to several constraints and pruning steps, the trained phrase table is much smaller than the heuristically extracted one, while preserving translation quality.
    Page 3, “Related Work”
  6. To be able to perform the re-computation in an efficient way, we store the source and target phrase marginal counts for each phrase in the phrase table .
    Page 5, “Alignment”
  7. It is then straightforward to compute the phrase counts after leaving-one-out using the phrase probabilities and marginal counts stored in the phrase table .
    Page 5, “Alignment”
  8. From the initial phrase table , each of these blocks only loads the phrases that are required for alignment.
    Page 5, “Alignment”
  9. 4.3 Phrase Table Interpolation
    Page 6, “Phrase Model Training”
  10. As (DeNero et al., 2006) have reported improvements in translation quality by interpolation of phrase tables produced by the generative and the heuristic model, we adopt this method and also report results using log-linear interpolation of the estimated model with the original model.
    Page 6, “Phrase Model Training”
  11. When interpolating phrase tables containing different sets of phrase pairs, we retain the intersection of the two.
    Page 6, “Phrase Model Training”
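Items 10 and 11 above describe log-linearly interpolating the trained and heuristic phrase tables while retaining only their common phrase pairs. A minimal sketch of that combination (the table layout, the example phrases, and the weight `w` are illustrative assumptions, not taken from the paper):

```python
import math

def interpolate_phrase_tables(trained, heuristic, w=0.5):
    """Log-linearly interpolate two phrase tables.

    Each table maps (source_phrase, target_phrase) -> probability.
    As in the excerpt, only the intersection of the two phrase
    sets is retained."""
    common = trained.keys() & heuristic.keys()
    interpolated = {}
    for pair in common:
        # log-linear interpolation: p_int = p_trained^w * p_heuristic^(1 - w)
        log_p = w * math.log(trained[pair]) + (1.0 - w) * math.log(heuristic[pair])
        interpolated[pair] = math.exp(log_p)
    return interpolated

trained = {("das haus", "the house"): 0.8, ("haus", "house"): 0.6}
heuristic = {("das haus", "the house"): 0.4, ("haus", "home"): 0.2}
table = interpolate_phrase_tables(trained, heuristic, w=0.5)
# only the pair present in both tables survives
```

With w = 0.5 this is a geometric mean of the two probabilities; in practice the weight would be tuned on development data.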

See all papers in Proc. ACL 2010 that mention phrase table.

See all papers in Proc. ACL that mention phrase table.

BLEU

Appears in 19 sentences as: BLEU (23)
In Training Phrase Translation Models with Leaving-One-Out
  1. Using this consistent training of phrase models we are able to achieve improvements of up to 1.4 points in BLEU .
    Page 1, “Abstract”
  2. Our results show that the proposed phrase model training improves translation quality on the test set by 0.9 BLEU points over our baseline.
    Page 2, “Introduction”
  3. We find that by interpolation with the heuristically extracted phrases translation performance can reach up to 1.4 BLEU improvement over the baseline on the test set.
    Page 2, “Introduction”
  4. We perform minimum error rate training with the downhill simplex algorithm (Nelder and Mead, 1965) on the development data to obtain a set of scaling factors that achieve a good BLEU score.
    Page 3, “Alignment”
  5. The scaling factors of the translation models have been optimized for BLEU on the DEV data.
    Page 7, “Experimental Evaluation”
  6. [table header: BLEU | TER]
    Page 7, “Experimental Evaluation”
  7. The metrics used for evaluation are the case-sensitive BLEU (Papineni et al., 2002) score and the translation edit rate (TER) (Snover et al., 2006) with one reference translation.
    Page 7, “Experimental Evaluation”
  8. Our final results show an improvement of 1.4 BLEU over the heuristically extracted phrase model on the test data set.
    Page 7, “Experimental Evaluation”
  9. [figure label: BLEU]
    Page 8, “Experimental Evaluation”
  10. Figure 3: Performance on DEV in BLEU of the count model plotted against size n of n-best list on a logarithmic scale.
    Page 8, “Experimental Evaluation”
  11. The variations in performance with sizes between n = 10 and n = 10000 are less than 0.2 BLEU .
    Page 8, “Experimental Evaluation”
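The excerpts above report case-sensitive BLEU with one reference translation. As a reminder of what the metric measures, here is a minimal single-reference, sentence-level BLEU sketch (geometric mean of modified n-gram precisions times the brevity penalty); the paper's evaluation uses a standard corpus-level implementation (Papineni et al., 2002), not this toy version:

```python
import math
from collections import Counter

def bleu(candidate, reference, max_n=4):
    """Minimal single-reference BLEU: geometric mean of modified
    n-gram precisions (n = 1..max_n) times the brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    log_prec_sum = 0.0
    for n in range(1, max_n + 1):
        cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
        ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        # clipped counts: each candidate n-gram is credited at most
        # as often as it occurs in the reference
        clipped = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
        total = max(sum(cand_ngrams.values()), 1)
        if clipped == 0:
            return 0.0
        log_prec_sum += math.log(clipped / total)
    # brevity penalty for candidates shorter than the reference
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(log_prec_sum / max_n)
```

Note that, like real BLEU-4, this returns 0 for any candidate with no matching 4-gram, which is why corpus-level aggregation (or smoothing) is used in practice.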


translation probabilities

Appears in 15 sentences as: translation probabilities (14) translation probability (1)
In Training Phrase Translation Models with Leaving-One-Out
  1. Several attempts have been made to learn phrase translation probabilities for phrase-based statistical machine translation that go beyond pure counting of phrases in word-aligned training data.
    Page 1, “Abstract”
  2. The phrase translation table, which contains the bilingual phrase pairs and the corresponding translation probabilities , is one of the main components of an SMT system.
    Page 1, “Introduction”
  3. In this work, we propose to directly train our phrase models by applying a forced alignment procedure where we use the decoder to find a phrase alignment between source and target sentences of the training data and then updating phrase translation probabilities based on this alignment.
    Page 1, “Introduction”
  4. That in turn leads to over-fitting, which shows in overly determinized estimates of the phrase translation probabilities.
    Page 2, “Related Work”
  5. They show that by applying a prior distribution over the phrase translation probabilities they can prevent over-fitting.
    Page 3, “Related Work”
  6. Therefore those long phrases are trained to fit only a few sentence pairs, strongly overestimating their translation probabilities and failing to generalize.
    Page 4, “Alignment”
  7. When using leaving-one-out, we modify the phrase translation probabilities for each sentence pair.
    Page 4, “Alignment”
  8. We have developed two different models for phrase translation probabilities which make use of the force-aligned training data.
    Page 6, “Phrase Model Training”
  9. The simplest of our generative phrase models estimates phrase translation probabilities by their relative frequencies in the Viterbi alignment of the data, similar to the heuristic model but with counts from the phrase-aligned data produced in training rather than computed on the basis of a word alignment.
    Page 6, “Phrase Model Training”
  10. The translation probability of a phrase pair (f̃, ẽ) is estimated as
    Page 6, “Phrase Model Training”
  11. While this might not cover all phrase translation probabilities , it allows the search space and translation times to be feasible and still contains the most probable alignments.
    Page 6, “Phrase Model Training”
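Excerpt 9 describes the count model: phrase translation probabilities estimated as relative frequencies of the phrase pairs in the Viterbi alignment of the force-aligned training data. A small sketch of that estimation (the data and phrase names are invented for illustration):

```python
from collections import Counter

def count_model(phrase_alignments):
    """Estimate p(f|e) for phrases as relative frequencies of the
    phrase pairs occurring in phrase-aligned training data.

    phrase_alignments: iterable of (source_phrase, target_phrase)
    pairs read off the Viterbi phrase alignment of each sentence pair."""
    pair_counts = Counter(phrase_alignments)
    target_marginals = Counter()
    for (f, e), c in pair_counts.items():
        target_marginals[e] += c
    # relative frequency: count of the pair over the target marginal
    return {(f, e): c / target_marginals[e] for (f, e), c in pair_counts.items()}

aligned = [("haus", "house"), ("haus", "house"), ("heim", "house"), ("das", "the")]
probs = count_model(aligned)
# p(haus|house) = 2/3, p(heim|house) = 1/3, p(das|the) = 1
```

The only difference from the heuristic model, per the excerpt, is where the counts come from: phrase alignments produced in training rather than phrases extracted from a word alignment.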


sentence pair

Appears in 13 sentences as: sentence pair (10) sentence pairs (4)
In Training Phrase Translation Models with Leaving-One-Out
  1. In this method, all phrases of the sentence pair that match constraints given by the alignment are extracted.
    Page 1, “Introduction”
  2. When given a bilingual sentence pair , we can usually assume there are a number of equally correct phrase segmentations and corresponding alignments.
    Page 2, “Related Work”
  3. The idea of forced alignment is to perform a phrase segmentation and alignment of each sentence pair of the training data using the full translation system as in decoding.
    Page 4, “Alignment”
  4. Consequently, we can modify Equation 2 to define the best segmentation of a sentence pair as:
    Page 4, “Alignment”
  5. The training data consists of N parallel sentence pairs f_n and e_n for n = 1, …, N.
    Page 4, “Alignment”
  6. All phrases extracted from a specific sentence pair fn, en can be used for the alignment of this sentence pair .
    Page 4, “Alignment”
  7. Therefore those long phrases are trained to fit only a few sentence pairs , strongly overestimating their translation probabilities and failing to generalize.
    Page 4, “Alignment”
  8. However, (DeNero et al., 2006) experienced similar over-fitting with short phrases due to the fact that the same word sequence can be segmented in different ways, leading to specific segmentations being learned for specific training sentence pairs .
    Page 4, “Alignment”
  9. When using leaving-one-out, we modify the phrase translation probabilities for each sentence pair .
    Page 4, “Alignment”
  10. For a training example f_n, e_n, we have to remove all phrase counts C_n(f̃, ẽ) that were extracted from this sentence pair from the phrase counts that
    Page 4, “Alignment”
  11. The same holds for the marginal counts C_n(ẽ) and C_n(f̃). Starting from Equation 1, the leaving-one-out phrase probability for training sentence pair n
    Page 5, “Alignment”
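Excerpts 10 and 11 describe the leaving-one-out computation: the counts C_n(f̃, ẽ) and marginals C_n(ẽ) extracted from sentence pair n are subtracted from the global counts before the probability is computed, with a fixed fallback probability for singletons (mentioned elsewhere in the paper). A sketch under those assumptions (the function signature and the value of `alpha` are illustrative):

```python
def leave_one_out_prob(pair_counts, marginal_counts,
                       local_pair_counts, local_marginal_counts,
                       f, e, alpha=1e-5):
    """Leaving-one-out phrase probability for one training sentence pair:
    the counts extracted from that sentence pair are subtracted from the
    global counts before renormalizing.  Pairs whose count drops to zero
    (singletons) fall back to the fixed probability alpha."""
    c_pair = pair_counts.get((f, e), 0) - local_pair_counts.get((f, e), 0)
    c_marg = marginal_counts.get(e, 0) - local_marginal_counts.get(e, 0)
    if c_pair <= 0 or c_marg <= 0:
        return alpha
    return c_pair / c_marg

pair_counts = {("haus", "house"): 3, ("sah", "saw"): 1}
marginal_counts = {"house": 4, "saw": 1}
local_pairs = {("haus", "house"): 1, ("sah", "saw"): 1}
local_margs = {"house": 1, "saw": 1}
p = leave_one_out_prob(pair_counts, marginal_counts, local_pairs, local_margs,
                       "haus", "house")
# (3 - 1) / (4 - 1) = 2/3; the singleton ("sah", "saw") would get alpha
```

Storing the global and per-sentence counts separately is what makes the excerpt's "efficient re-computation" possible: no pass over the rest of the corpus is needed per sentence.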


Viterbi

Appears in 11 sentences as: Viterbi (11) viterbi (1)
In Training Phrase Translation Models with Leaving-One-Out
  1. [Figure text: Viterbi word alignment and phrase alignment; word and phrase translation models trained by the EM algorithm; heuristic phrase counts and phrase translation probabilities; phrase translation tables]
    Page 2, “Introduction”
  2. 1. the single-best Viterbi alignment, 2. the n-best alignments,
    Page 2, “Introduction”
  3. For the hierarchical phrase-based approach, (Blunsom et al., 2008) present a discriminative rule model and show the difference between using only the viterbi alignment in training and using the full sum over all possible derivations.
    Page 3, “Related Work”
  4. It is shown in (Ferrer and Juan, 2009), that Viterbi training produces almost the same results as full Baum-Welch training.
    Page 3, “Related Work”
  5. Our approach is similar to the one presented in (Ferrer and Juan, 2009) in that we compare Viterbi and a training method based on the Forward-Backward algorithm.
    Page 3, “Related Work”
  6. In training, they use a greedy algorithm to produce the Viterbi phrase alignment and then apply a hill-climbing technique that modifies the Viterbi alignment by merge, move, split, and swap operations to find an alignment with a better probability in each iteration.
    Page 3, “Related Work”
  7. 4.1 Viterbi
    Page 6, “Phrase Model Training”
  8. The simplest of our generative phrase models estimates phrase translation probabilities by their relative frequencies in the Viterbi alignment of the data, similar to the heuristic model but with counts from the phrase-aligned data produced in training rather than computed on the basis of a word alignment.
    Page 6, “Phrase Model Training”
  9. This can be applied to either the Viterbi phrase alignment or an n-best list.
    Page 6, “Phrase Model Training”
  10. We can see in Figure 3 that translation performance improves by moving from the Viterbi alignment to n-best alignments.
    Page 8, “Experimental Evaluation”
  11. While models trained from Viterbi alignments already lead to good results, we have demonstrated
    Page 9, “Conclusion”
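Excerpts 9 and 10 note that counts can be collected from the single-best Viterbi alignment or from an n-best list of phrase alignments. One simple way to use an n-best list is sketched here, with normalized weights standing in for the posterior weighting mentioned in the paper (the exact weighting scheme is an assumption):

```python
from collections import defaultdict

def nbest_phrase_counts(nbest):
    """Accumulate fractional phrase-pair counts from an n-best list of
    phrase alignments, each alignment contributing its normalized weight.

    nbest: list of (weight, [(source_phrase, target_phrase), ...])"""
    total = sum(w for w, _ in nbest)
    counts = defaultdict(float)
    for weight, alignment in nbest:
        for pair in alignment:
            counts[pair] += weight / total  # posterior-style soft count
    return dict(counts)

nbest = [(2.0, [("a", "x")]), (1.0, [("a", "x"), ("b", "y")])]
counts = nbest_phrase_counts(nbest)
# ("a", "x") gets 2/3 + 1/3 = 1.0, ("b", "y") gets 1/3
```

Setting n = 1 recovers plain Viterbi counting; the excerpts report that moving to n-best alignments improves performance, with little sensitivity between n = 10 and n = 10000.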


log-linear

Appears in 10 sentences as: Log-linear (1) log-linear (9)
In Training Phrase Translation Models with Leaving-One-Out
  1. The translation process is implemented as a weighted log-linear combination of several models h_m(e_1^I, s_1^K, f_1^J), including the logarithm of the phrase probability in source-to-target as well as in target-to-source direction.
    Page 1, “Introduction”
  2. The log-linear interpolations p_int(f̃|ẽ) of the phrase translation probabilities are estimated as
    Page 6, “Phrase Model Training”
  3. As a generalization of the fixed interpolation of the two phrase tables we also experimented with adding the two trained phrase probabilities as additional features to the log-linear framework.
    Page 6, “Phrase Model Training”
  4. With good log-linear feature weights, feature-wise combination should perform at least as well as fixed interpolation.
    Page 6, “Phrase Model Training”
  5. This illustrates that a higher number of features results in a less reliable optimization of the log-linear parameters.
    Page 7, “Phrase Model Training”
  6. The features are combined in a log-linear way.
    Page 7, “Experimental Evaluation”
  7. In Table 5 we can see that the performance of the heuristic phrase model can be increased by 0.6 BLEU on TEST by filtering the phrase table to contain the same phrases as the count model and reoptimizing the log-linear model weights.
    Page 8, “Experimental Evaluation”
  8. Log-linear interpolation of the count model with the heuristic yields a further increase, showing an improvement of 1.3 BLEU on DEV and 1.4 BLEU on TEST over the baseline.
    Page 8, “Experimental Evaluation”
  9. Further, scores for fixed log-linear interpolation of the count model trained with leaving-one-out with the heuristic as well as a feature-wise combination are shown.
    Page 9, “Experimental Evaluation”
  10. Integrating both models into the log-linear framework (feat.
    Page 9, “Experimental Evaluation”
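Excerpt 1 describes the decoder's weighted log-linear combination of models h_m. A toy sketch of scoring one hypothesis this way (the feature names, values, and weights are invented for illustration):

```python
import math

def loglinear_score(features, weights):
    """Score of one translation hypothesis: the weighted sum of its
    feature values h_m (log-probabilities, penalties, ...)."""
    return sum(weights[name] * value for name, value in features.items())

features = {
    "log_p_f_given_e": math.log(0.4),  # phrase model, source-to-target
    "log_p_e_given_f": math.log(0.3),  # phrase model, target-to-source
    "phrase_penalty": -2.0,            # e.g. two phrases used
}
weights = {"log_p_f_given_e": 1.0, "log_p_e_given_f": 0.8, "phrase_penalty": 0.2}
score = loglinear_score(features, weights)
```

The feature-wise combination discussed in excerpts 3 to 5 simply adds the trained phrase probabilities as extra entries in `features`, at the cost of more weights to optimize.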


word alignment

Appears in 9 sentences as: Word Alignment (1) word alignment (6) word alignments (2)
In Training Phrase Translation Models with Leaving-One-Out
  1. [Figure text: Viterbi word alignment and phrase alignment; word and phrase translation models trained by the EM algorithm; heuristic phrase counts and phrase translation probabilities; phrase translation tables]
    Page 2, “Introduction”
  2. Their results show that it cannot reach performance competitive with extracting a phrase table from word alignment by heuristics (Och et al., 1999).
    Page 2, “Related Work”
  3. In addition, we do not restrict the training to phrases consistent with the word alignment , as was done in (DeNero et al., 2006).
    Page 3, “Related Work”
  4. This allows us to recover from flawed word alignments .
    Page 3, “Related Work”
  5. The joint model by (Marcu and Wong, 2002) is refined by (Birch et al., 2006) who use high-confidence word alignments to constrain the search space in training.
    Page 3, “Related Work”
  6. A phrase extraction is performed for each training sentence pair separately using the same word alignment as for the initialization.
    Page 5, “Alignment”
  7. The simplest of our generative phrase models estimates phrase translation probabilities by their relative frequencies in the Viterbi alignment of the data, similar to the heuristic model but with counts from the phrase-aligned data produced in training rather than computed on the basis of a word alignment .
    Page 6, “Phrase Model Training”
  8. For the heuristic phrase model, we first use GIZA++ (Och and Ney, 2003) to compute the word alignment on TRAIN.
    Page 7, “Experimental Evaluation”
  9. Next we obtain a phrase table by extraction of phrases from the word alignment .
    Page 7, “Experimental Evaluation”


model training

Appears in 9 sentences as: model trained (2) model training (5) models trained (2)
In Training Phrase Translation Models with Leaving-One-Out
  1. [Figure text: Viterbi word alignment and phrase alignment; word and phrase translation models trained by the EM algorithm; heuristic phrase counts and phrase translation probabilities; phrase translation tables]
    Page 2, “Introduction”
  2. Our results show that the proposed phrase model training improves translation quality on the test set by 0.9 BLEU points over our baseline.
    Page 2, “Introduction”
  3. Sentences for which the decoder cannot find an alignment are discarded for the phrase model training.
    Page 4, “Alignment”
  4. phrase model training .
    Page 4, “Alignment”
  5. …, N is used for both the initialization of the translation model p(f̃|ẽ) and the phrase model training.
    Page 4, “Alignment”
  6. To cope with the runtime and memory requirements of phrase model training that was pointed out by previous work (Marcu and Wong, 2002; Birch et al., 2006), we parallelized the forced alignment by splitting the training corpus into blocks of 10k sentence pairs.
    Page 5, “Alignment”
  7. …, the count model trained with leaving-one-out (l1o) and cross-validation (cv), the weighted count model and the full model.
    Page 9, “Experimental Evaluation”
  8. Further, scores for fixed log-linear interpolation of the count model trained with leaving-one-out with the heuristic as well as a feature-wise combination are shown.
    Page 9, “Experimental Evaluation”
  9. While models trained from Viterbi alignments already lead to good results, we have demonstrated
    Page 9, “Conclusion”


phrase-based

Appears in 9 sentences as: phrase-based (9)
In Training Phrase Translation Models with Leaving-One-Out
  1. Several attempts have been made to learn phrase translation probabilities for phrase-based statistical machine translation that go beyond pure counting of phrases in word-aligned training data.
    Page 1, “Abstract”
  2. A phrase-based SMT system takes a source sentence and produces a translation by segmenting the sentence into phrases and translating those phrases separately (Koehn et al., 2003).
    Page 1, “Introduction”
  3. We use a modified version of a phrase-based decoder to perform the forced alignment.
    Page 1, “Introduction”
  4. For the hierarchical phrase-based approach, (Blunsom et al., 2008) present a discriminative rule model and show the difference between using only the viterbi alignment in training and using the full sum over all possible derivations.
    Page 3, “Related Work”
  5. We also include these word lexica, as they are standard components of the phrase-based system.
    Page 3, “Related Work”
  6. They report improvements over a phrase-based model that uses an inverse phrase model and a language model.
    Page 3, “Related Work”
  7. We apply our normal phrase-based decoder on the source side of the training data and constrain the translations to the corresponding target sentences from the training data.
    Page 4, “Alignment”
  8. The baseline system is a standard phrase-based SMT system with eight features: phrase translation and word lexicon probabilities in both translation directions, phrase penalty, word penalty, language model score and a simple distance-based reordering model.
    Page 7, “Experimental Evaluation”
  9. We have shown that training phrase models can improve translation performance on a state-of-the-art phrase-based translation model.
    Page 9, “Conclusion”


phrase pairs

Appears in 8 sentences as: phrase pair (3) phrase pairs (5)
In Training Phrase Translation Models with Leaving-One-Out
  1. The phrase translation table, which contains the bilingual phrase pairs and the corresponding translation probabilities, is one of the main components of an SMT system.
    Page 1, “Introduction”
  2. We refer to singleton phrases as phrase pairs that occur only in one sentence.
    Page 5, “Alignment”
  3. For these sentences, the decoder needs the singleton phrase pairs to produce an alignment.
    Page 5, “Alignment”
  4. Standard leaving-one-out assigns a fixed probability α to singleton phrase pairs.
    Page 5, “Alignment”
  5. The translation probability of a phrase pair (f̃, ẽ) is estimated as
    Page 6, “Phrase Model Training”
  6. where C_FA(f̃, ẽ) is the count of the phrase pair (f̃, ẽ) in the phrase-aligned training data.
    Page 6, “Phrase Model Training”
  7. We will refer to this model as the count model as we simply count the number of occurrences of a phrase pair .
    Page 6, “Phrase Model Training”
  8. When interpolating phrase tables containing different sets of phrase pairs , we retain the intersection of the two.
    Page 6, “Phrase Model Training”


language model

Appears in 6 sentences as: language model (6) language modeling (1)
In Training Phrase Translation Models with Leaving-One-Out
  1. The phrase model is combined with a language model, word lexicon models, word and phrase penalty, and many others.
    Page 1, “Introduction”
  2. They report improvements over a phrase-based model that uses an inverse phrase model and a language model .
    Page 3, “Related Work”
  3. A language model is not used in this case, as the system is constrained to the given target sentence and thus the language model score has no effect on the alignment.
    Page 4, “Alignment”
  4. To deal with this problem, instead of simple phrase length restriction, we propose to apply the leaving-one-out method, which is also used for language modeling techniques (Kneser and Ney, 1995).
    Page 4, “Alignment”
  5. The baseline system is a standard phrase-based SMT system with eight features: phrase translation and word lexicon probabilities in both translation directions, phrase penalty, word penalty, language model score and a simple distance-based reordering model.
    Page 7, “Experimental Evaluation”
  6. We used a 4-gram language model with modified Kneser-Ney discounting for all experiments.
    Page 7, “Experimental Evaluation”


TER

Appears in 6 sentences as: TER (7)
In Training Phrase Translation Models with Leaving-One-Out
  1. [table header: BLEU | TER]
    Page 7, “Experimental Evaluation”
  2. The metrics used for evaluation are the case-sensitive BLEU (Papineni et al., 2002) score and the translation edit rate ( TER ) (Snover et al., 2006) with one reference translation.
    Page 7, “Experimental Evaluation”
  3. A second iteration of the training algorithm shows nearly no changes in BLEU score, but a small improvement in TER .
    Page 8, “Experimental Evaluation”
  4. [table excerpt: baseline scores, DEV 25.7 BLEU / 61.1 TER and TEST 26.3 BLEU / 60.9 TER; the next row, "baseline filt.", is truncated]
    Page 9, “Experimental Evaluation”
  5. The full model, where we extract all phrases from the search graph, weighted with their posterior probability, performs comparably to the count model, with a slightly worse BLEU and a slightly better TER.
    Page 9, “Experimental Evaluation”
  6. In TER , improvements are 0.4 and 1.7 points.
    Page 9, “Conclusion”
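The excerpts report the translation edit rate (TER) of Snover et al. (2006). As a rough illustration only: the sketch below computes a simplified edit rate, word-level edit distance divided by reference length, and omits the block-shift edits that full TER also allows.

```python
def ter_no_shifts(hypothesis, reference):
    """Simplified translation edit rate: word-level edit distance
    divided by reference length.  Full TER additionally counts block
    shifts as single edits; this sketch omits them."""
    hyp, ref = hypothesis.split(), reference.split()
    # standard dynamic-programming edit distance over words
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, 1):
        cur = [i]
        for j, r in enumerate(ref, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (h != r)))   # substitution
        prev = cur
    return prev[-1] / len(ref)
```

Lower is better, so the improvements of 0.4 and 1.7 points cited in the Conclusion correspond to fewer edits per reference word.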


translation models

Appears in 6 sentences as: translation model (2) translation models (5)
In Training Phrase Translation Models with Leaving-One-Out
  1. [Figure text: Viterbi word alignment and phrase alignment; word and phrase translation models trained by the EM algorithm; heuristic phrase counts and phrase translation probabilities; phrase translation tables]
    Page 2, “Introduction”
  2. This is different from word-based translation models , where a typical assumption is that each target word corresponds to only one source word.
    Page 2, “Related Work”
  3. …, N is used for both the initialization of the translation model p(f̃|ẽ) and the phrase model training.
    Page 4, “Alignment”
  4. The scaling factors of the translation models have been optimized for BLEU on the DEV data.
    Page 7, “Experimental Evaluation”
  5. We will focus on the proposed leaving-one-out technique and show that it helps in finding good phrasal alignments on the training data that lead to improved translation models .
    Page 7, “Experimental Evaluation”
  6. We have shown that training phrase models can improve translation performance on a state-of-the-art phrase-based translation model .
    Page 9, “Conclusion”


translation system

Appears in 5 sentences as: translation system (3) translation systems (1) translations system (1)
In Training Phrase Translation Models with Leaving-One-Out
  1. In (Liang et al., 2006) a discriminative translation system is described.
    Page 3, “Related Work”
  2. full and competitive translation system as starting point with reordering and all models included.
    Page 3, “Related Work”
  3. First we obtain all models needed for a normal translation system.
    Page 3, “Alignment”
  4. The idea of forced alignment is to perform a phrase segmentation and alignment of each sentence pair of the training data using the full translation system as in decoding.
    Page 4, “Alignment”
  5. In addition to the improved performance, the trained models are smaller leading to faster and smaller translation systems .
    Page 9, “Conclusion”


translation quality

Appears in 5 sentences as: translation quality (5)
In Training Phrase Translation Models with Leaving-One-Out
  1. Our results show that this leads to better translation quality.
    Page 2, “Introduction”
  2. Our results show that the proposed phrase model training improves translation quality on the test set by 0.9 BLEU points over our baseline.
    Page 2, “Introduction”
  3. The model shows improvements in translation quality over the single-word-based IBM Model 4 (Brown et al., 1993) on a subset of the Canadian Hansards corpus.
    Page 3, “Related Work”
  4. They observe that due to several constraints and pruning steps, the trained phrase table is much smaller than the heuristically extracted one, while preserving translation quality .
    Page 3, “Related Work”
  5. As (DeNero et al., 2006) have reported improvements in translation quality by interpolation of phrase tables produced by the generative and the heuristic model, we adopt this method and also report results using log-linear interpolation of the estimated model with the original model.
    Page 6, “Phrase Model Training”


segmentations

Appears in 4 sentences as: segmentations (4)
In Training Phrase Translation Models with Leaving-One-Out
  1. Ideally, we would produce all possible segmentations and alignments during training.
    Page 2, “Introduction”
  2. When given a bilingual sentence pair, we can usually assume there are a number of equally correct phrase segmentations and corresponding alignments.
    Page 2, “Related Work”
  3. As a result of this ambiguity, different segmentations are recruited for different examples during training.
    Page 2, “Related Work”
  4. However, (DeNero et al., 2006) experienced similar over-fitting with short phrases due to the fact that the same word sequence can be segmented in different ways, leading to specific segmentations being learned for specific training sentence pairs.
    Page 4, “Alignment”


Statistical Machine Translation

Appears in 3 sentences as: Statistical Machine Translation (2) statistical machine translation (1)
In Training Phrase Translation Models with Leaving-One-Out
  1. Several attempts have been made to learn phrase translation probabilities for phrase-based statistical machine translation that go beyond pure counting of phrases in word-aligned training data.
    Page 1, “Abstract”
  2. Europarl task from the ACL 2008 Workshop on Statistical Machine Translation (WMT08).
    Page 2, “Introduction”
  3. We conducted our experiments on the German-English data published for the ACL 2008 Workshop on Statistical Machine Translation (WMT08).
    Page 7, “Experimental Evaluation”


SMT system

Appears in 3 sentences as: SMT system (3)
In Training Phrase Translation Models with Leaving-One-Out
  1. A phrase-based SMT system takes a source sentence and produces a translation by segmenting the sentence into phrases and translating those phrases separately (Koehn et al., 2003).
    Page 1, “Introduction”
  2. The phrase translation table, which contains the bilingual phrase pairs and the corresponding translation probabilities, is one of the main components of an SMT system .
    Page 1, “Introduction”
  3. The baseline system is a standard phrase-based SMT system with eight features: phrase translation and word lexicon probabilities in both translation directions, phrase penalty, word penalty, language model score and a simple distance-based reordering model.
    Page 7, “Experimental Evaluation”


Machine Translation

Appears in 3 sentences as: Machine Translation (2) machine translation (1)
In Training Phrase Translation Models with Leaving-One-Out
  1. Several attempts have been made to learn phrase translation probabilities for phrase-based statistical machine translation that go beyond pure counting of phrases in word-aligned training data.
    Page 1, “Abstract”
  2. Europarl task from the ACL 2008 Workshop on Statistical Machine Translation (WMT08).
    Page 2, “Introduction”
  3. We conducted our experiments on the German-English data published for the ACL 2008 Workshop on Statistical Machine Translation (WMT08).
    Page 7, “Experimental Evaluation”


generative model

Appears in 3 sentences as: generative model (2) generative models (1)
In Training Phrase Translation Models with Leaving-One-Out
  1. For a generative model , (DeNero et al., 2006) gave a detailed analysis of the challenges and arising problems.
    Page 2, “Related Work”
  2. Additionally we consider smoothing by different kinds of interpolation of the generative model with the state-of-the-art heuristics.
    Page 6, “Phrase Model Training”
  3. To investigate the generative models , we replace the two phrase translation probabilities and keep the other features identical to the baseline.
    Page 7, “Experimental Evaluation”


BLEU score

Appears in 3 sentences as: BLEU score (3)
In Training Phrase Translation Models with Leaving-One-Out
  1. We perform minimum error rate training with the downhill simplex algorithm (Nelder and Mead, 1965) on the development data to obtain a set of scaling factors that achieve a good BLEU score .
    Page 3, “Alignment”
  2. A second iteration of the training algorithm shows nearly no changes in BLEU score , but a small improvement in TER.
    Page 8, “Experimental Evaluation”
  3. yields a BLEU score slightly lower than with fixed interpolation on both DEV and TEST.
    Page 9, “Experimental Evaluation”
