Active Learning for Multilingual Statistical Machine Translation
Haffari, Gholamreza and Sarkar, Anoop

Article Structure

Abstract

Statistical machine translation (SMT) models require bilingual corpora for training, and these corpora are often multilingual with parallel text in multiple languages simultaneously.

Introduction

The main source of training data for statistical machine translation (SMT) models is a parallel corpus.

AL-SMT: Multilingual Setting

Consider a multilingual parallel corpus, such as EuroParl, which contains parallel sentences for several languages.

Sentence Selection: Multiple Language Pairs

The goal is to optimize the objective function (1) with minimum human effort in providing the translations.

Sentence Selection: Single Language Pair

Phrases are basic units of translation in phrase-based SMT models.

Experiments

Corpora.

Related Work

(Haffari et al., 2009) provides results for active learning for MT using a single language pair.

Conclusion

This paper introduced the novel active learning task of adding a new language to an existing multilingual set of parallel text.

Topics

language pair

Appears in 26 sentences as: language pair (24) language pairs (4)
In Active Learning for Multilingual Statistical Machine Translation
  1. We also provide new highly effective sentence selection methods that improve AL for phrase-based SMT in the multilingual and single language pair setting.
    Page 1, “Abstract”
  2. The multilingual setting provides new opportunities for AL over and above a single language pair .
    Page 1, “Introduction”
  3. In our case, the multiple tasks are individual machine translation tasks for several language pairs .
    Page 1, “Introduction”
  4. languages to the new language depending on the characteristics of each source-target language pair , hence these tasks are competing for annotating the same resource.
    Page 1, “Introduction”
  5. However it may be that in a single language pair , AL would pick a particular sentence for annotation, but in a multilingual setting, a different source language might be able to provide a good translation, thus saving annotation effort.
    Page 1, “Introduction”
  6. 0 We introduce new highly effective sentence selection methods that improve phrase-based SMT in the multilingual and single language pair setting.
    Page 2, “Introduction”
  7. The nonnegative weights α_d reflect the importance of the different translation tasks, and ∑_d α_d = 1. The AL-SMT formulation for a single language pair is a special case of this formulation where only one of the α_d's in the objective function (1) is one and the rest are zero.
    Page 2, “AL-SMT: Multilingual Setting”
  8. 2.1 for AL in the multilingual setting includes the single language pair setting as a special case (Haffari et al., 2009).
    Page 2, “AL-SMT: Multilingual Setting”
  9. For a single language pair we use U and L.
    Page 2, “AL-SMT: Multilingual Setting”
  10. Since each entry in U+ has multiple translations, there are two options when building the auxiliary table for a particular language pair (Fd, E): (i) to use the corresponding translation ed of the source language in a self-training setting, or (ii) to use the consensus translation among all the translation candidates (e1, ..., eD) in a co-training setting (sharing information between multiple SMT models).
    Page 2, “AL-SMT: Multilingual Setting”
  11. A whole range of methods exist in the literature for combining the output translations of multiple MT systems for a single language pair , operating either at the sentence, phrase, or word level (He et al., 2008; Rosti et al., 2007; Matusov et al., 2006).
    Page 2, “AL-SMT: Multilingual Setting”

See all papers in Proc. ACL 2009 that mention language pair.

See all papers in Proc. ACL that mention language pair.


MT systems

Appears in 18 sentences as: MT system (7) MT systems (13)
In Active Learning for Multilingual Statistical Machine Translation
  1. We introduce an active learning task of adding a new language to an existing multilingual set of parallel text and constructing high quality MT systems , from each language in the collection into this new target language.
    Page 1, “Abstract”
  2. In this paper, we consider how to use active learning (AL) in order to add a new language to such a multilingual parallel corpus and at the same time we construct an MT system from each language in the original corpus into this new target language.
    Page 1, “Introduction”
  3. In this paper, we explore how multiple MT systems can be used to effectively pick instances that are more likely to improve training quality.
    Page 1, “Introduction”
  4. When we build multiple MT systems from multiple source languages to the new target language, each MT system can be seen as a different ‘view’ on the desired output translation.
    Page 1, “Introduction”
  5. Thus, we can train our multiple MT systems using either self-training or co-training (Blum and Mitchell, 1998).
    Page 1, “Introduction”
  6. In self-training each MT system is retrained using human labeled data plus its own noisy translation output on the unlabeled data.
    Page 1, “Introduction”
  7. In co-training each MT system is retrained using human labeled data plus noisy translation output from the other MT systems in the ensemble.
    Page 1, “Introduction”
  8. We use consensus translations (He et al., 2008; Rosti et al., 2007; Matusov et al., 2006) as an effective method for co-training between multiple MT systems .
    Page 1, “Introduction”
  9. MT, in which we build multiple MT systems and add a new language to an existing multilingual parallel corpus.
    Page 1, “Introduction”
  10. 0 We describe a novel co-training based active learning framework that exploits consensus translations to effectively select only those sentences that are difficult to translate for all MT systems , thus sharing annotation cost.
    Page 2, “Introduction”
  11. Our goal is to add a new language to this corpus, and at the same time to construct high quality MT systems from the existing languages (in the multilingual corpus) to the new language.
    Page 2, “AL-SMT: Multilingual Setting”
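The self-training vs. co-training distinction in the excerpts above can be sketched as follows; this is an illustrative Python sketch in which the function name and the simple majority-vote consensus are assumptions (the paper uses consensus decoding over candidate translations, following He et al. 2008 and related work):

```python
from collections import Counter

def pseudo_labels(candidates, mode):
    """Pick the pseudo-translation each MT system is retrained on.

    candidates: candidate translations (e_1, ..., e_D), one per system.
    mode: 'self' -> each system keeps its own noisy output;
          'co'   -> every system uses the consensus of the ensemble
                    (here approximated by a simple majority vote).
    """
    if mode == "self":
        return list(candidates)
    consensus, _ = Counter(candidates).most_common(1)[0]
    return [consensus] * len(candidates)
```

In co-training mode, information flows between the D views: a system whose own output was poor is retrained on the ensemble's consensus instead.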

BLEU

Appears in 9 sentences as: BLEU (9)
In Active Learning for Multilingual Statistical Machine Translation
  1. The translation quality is measured by TQ for individual systems M_{Fd→E}; it can be the BLEU score or WER/PER (word error rate and position-independent word error rate), which induces a maximization or minimization problem, respectively.
    Page 2, “AL-SMT: Multilingual Setting”
  2. This process is continued iteratively until a certain level of translation quality is met (we use the BLEU score, WER and PER) (Papineni et al., 2002).
    Page 2, “AL-SMT: Multilingual Setting”
  3. Let e_c be the consensus among all the candidate translations, then define the disagreement as ∑_d α_d (1 − BLEU(e_c, e_d)).
    Page 3, “Sentence Selection: Multiple Language Pairs”
  4. The number of weights λ_i is 3 plus the number of source languages, and they are trained using minimum error-rate training (MERT) to maximize the BLEU score (Och, 2003) on a development set.
    Page 6, “Experiments”
  5. Avg BLEU Score
    Page 6, “Experiments”
  6. Avg BLEU Score
    Page 7, “Experiments”
  7. Avg BLEU Score
    Page 7, “Experiments”
  8. It shows that the co-training mode outperforms the self-training mode by almost 1 BLEU point.
    Page 7, “Experiments”
  9. Avg BLEU Score
    Page 7, “Experiments”
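Excerpt 3 above defines a sentence-difficulty score from consensus disagreement. A minimal sketch, assuming a toy clipped-unigram-precision stand-in for sentence-level BLEU (the actual measure is BLEU itself; the function names are hypothetical):

```python
def overlap(hyp, ref):
    """Toy stand-in for sentence-level BLEU: clipped unigram precision."""
    h, r = hyp.split(), ref.split()
    if not h:
        return 0.0
    return sum(min(h.count(w), r.count(w)) for w in set(h)) / len(h)

def disagreement(consensus, candidates, alphas):
    """sum_d alpha_d * (1 - score(e_c, e_d)): high when the systems'
    candidates all differ from the consensus, i.e. the sentence is
    hard for every view and worth sending to the human annotator."""
    return sum(a * (1.0 - overlap(e, consensus))
               for a, e in zip(alphas, candidates))
```

A sentence on which all D systems already agree scores 0 and is skipped, sharing the annotation budget across tasks.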

labeled data

Appears in 9 sentences as: labeled data (9)
In Active Learning for Multilingual Statistical Machine Translation
  1. However, if we start with only a small amount of initial parallel data for the new target language, then translation quality is very poor and requires a very large injection of human labeled data to be effective.
    Page 1, “Introduction”
  2. In self-training each MT system is retrained using human labeled data plus its own noisy translation output on the unlabeled data.
    Page 1, “Introduction”
  3. In co-training each MT system is retrained using human labeled data plus noisy translation output from the other MT systems in the ensemble.
    Page 1, “Introduction”
  4. When (re-)training the models, two phrase tables are learned for each SMT model: one from the labeled data L and the other one from the pseudo-labeled data U+ (which we call the main and auxiliary phrase tables, respectively).
    Page 2, “AL-SMT: Multilingual Setting”
  5. The more frequent a phrase is in the labeled data, the less important it is, since we have probably observed most of its translations.
    Page 4, “Sentence Selection: Single Language Pair”
  6. In the labeled data L, phrases are the ones which are extracted by the SMT models; but what are the candidate phrases in the unlabeled data U?
    Page 4, “Sentence Selection: Single Language Pair”
  7. two multinomials, one for labeled data and the other one for unlabeled data.
    Page 4, “Sentence Selection: Single Language Pair”
  8. The first term is the log probability ratio of regular phrases under phrase models corresponding to unlabeled and labeled data , and the second term is the expected log probability ratio (ELPR) under the two models.
    Page 5, “Sentence Selection: Single Language Pair”
  9. We subsampled 5,000 sentences as the labeled data L and 20,000 sentences as U for the pool of untranslated sentences (while hiding the English part).
    Page 5, “Experiments”

unlabeled data

Appears in 9 sentences as: unlabeled data (9)
In Active Learning for Multilingual Statistical Machine Translation
  1. In self-training each MT system is retrained using human labeled data plus its own noisy translation output on the unlabeled data .
    Page 1, “Introduction”
  2. (Ueffing et al., 2007; Haffari et al., 2009) show that treating U+ as a source for a new feature function in a log-linear model for SMT (Och and Ney, 2004) allows us to maximally take advantage of unlabeled data by finding a weight for this feature using minimum error-rate training (MERT) (Och, 2003).
    Page 2, “AL-SMT: Multilingual Setting”
  3. Using this method, we rank the entries in unlabeled data U for each translation task defined by a language pair (Fd, E). This results in several ranking lists, each of which represents the importance of entries with respect to a particular translation task.
    Page 3, “Sentence Selection: Multiple Language Pairs”
  4. The more frequent a phrase (not a phrase pair) is in the unlabeled data , the more important it is to
    Page 3, “Sentence Selection: Single Language Pair”
  5. know its translation; since it is more likely to see it in test data (especially when the test data is in-domain with respect to unlabeled data).
    Page 4, “Sentence Selection: Single Language Pair”
  6. In the labeled data L, phrases are the ones which are extracted by the SMT models; but what are the candidate phrases in the unlabeled data U?
    Page 4, “Sentence Selection: Single Language Pair”
  7. two multinomials, one for labeled data and the other one for unlabeled data .
    Page 4, “Sentence Selection: Single Language Pair”
  8. In the second model, we consider a mixture model of two multinomials responsible for generating phrases in each of the labeled and unlabeled data sets.
    Page 5, “Sentence Selection: Single Language Pair”
  9. (Reichart et al., 2008) introduces multitask active learning where unlabeled data require annotations for multiple tasks, e.g., they consider named entities and parse trees, and showed that multiple tasks help selection compared to individual tasks.
    Page 8, “Related Work”
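The log probability ratio idea running through excerpts 7-8 can be sketched as follows. The add-one-style smoothing and the function names are assumptions, not the paper's exact estimator (the paper fits two multinomials, plus a mixture-model variant, over phrases in the labeled and unlabeled sets):

```python
import math
from collections import Counter

def phrase_model(phrases, smoothing=1.0):
    """Smoothed multinomial over phrases; reserves one extra slot of
    probability mass for unseen phrases."""
    counts = Counter(phrases)
    total = sum(counts.values()) + smoothing * (len(counts) + 1)
    return lambda ph: (counts.get(ph, 0) + smoothing) / total

def lpr_score(sentence_phrases, p_unlabeled, p_labeled):
    """Log probability ratio: phrases frequent in U but rare in L push
    the score up, marking the sentence as informative to translate."""
    return sum(math.log(p_unlabeled(ph) / p_labeled(ph))
               for ph in sentence_phrases)
```

Sentences are then ranked by this score, so annotation effort concentrates on phrases the current models have barely seen.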

BLEU Score

Appears in 7 sentences as: BLEU Score (4) BLEU score (3)
In Active Learning for Multilingual Statistical Machine Translation
  1. The translation quality is measured by TQ for individual systems M_{Fd→E}; it can be the BLEU score or WER/PER (word error rate and position-independent word error rate), which induces a maximization or minimization problem, respectively.
    Page 2, “AL-SMT: Multilingual Setting”
  2. This process is continued iteratively until a certain level of translation quality is met (we use the BLEU score, WER and PER) (Papineni et al., 2002).
    Page 2, “AL-SMT: Multilingual Setting”
  3. The number of weights λ_i is 3 plus the number of source languages, and they are trained using minimum error-rate training (MERT) to maximize the BLEU score (Och, 2003) on a development set.
    Page 6, “Experiments”
  4. Avg BLEU Score
    Page 6, “Experiments”
  5. Avg BLEU Score
    Page 7, “Experiments”
  6. Avg BLEU Score
    Page 7, “Experiments”
  7. Avg BLEU Score
    Page 7, “Experiments”

parallel corpus

Appears in 7 sentences as: parallel corpus (7)
In Active Learning for Multilingual Statistical Machine Translation
  1. The main source of training data for statistical machine translation (SMT) models is a parallel corpus .
    Page 1, “Introduction”
  2. In many cases, the same information is available in multiple languages simultaneously as a multilingual parallel corpus , e. g., European Parliament (EuroParl) and UN.
    Page 1, “Introduction”
  3. In this paper, we consider how to use active learning (AL) in order to add a new language to such a multilingual parallel corpus and at the same time we construct an MT system from each language in the original corpus into this new target language.
    Page 1, “Introduction”
  4. MT, in which we build multiple MT systems and add a new language to an existing multilingual parallel corpus .
    Page 1, “Introduction”
  5. Consider a multilingual parallel corpus , such as EuroParl, which contains parallel sentences for several languages.
    Page 2, “AL-SMT: Multilingual Setting”
  6. We preprocessed the EuroParl corpus (http://www.statmt.org/europarl) (Koehn, 2005) and built a multilingual parallel corpus with 653,513 sentences, excluding the Q4/2000 portion of the data (2000-10 to 2000-12), which is reserved as the test set.
    Page 5, “Experiments”
  7. The test set consists of 2,000 multi-language sentences and comes from the multilingual parallel corpus built from Q4/2000 portion of the data.
    Page 5, “Experiments”

translation task

Appears in 7 sentences as: translation task (4) translation tasks (4)
In Active Learning for Multilingual Statistical Machine Translation
  1. In our case, the multiple tasks are individual machine translation tasks for several language pairs.
    Page 1, “Introduction”
  2. The nonnegative weights α_d reflect the importance of the different translation tasks, and ∑_d α_d = 1. The AL-SMT formulation for a single language pair is a special case of this formulation where only one of the α_d's in the objective function (1) is one and the rest are zero.
    Page 2, “AL-SMT: Multilingual Setting”
  3. Using this method, we rank the entries in unlabeled data U for each translation task defined by a language pair (Fd, E). This results in several ranking lists, each of which represents the importance of entries with respect to a particular translation task.
    Page 3, “Sentence Selection: Multiple Language Pairs”
  4. is the ranking of a sentence in the list for the dth translation task (Reichart et al., 2008).
    Page 3, “Sentence Selection: Multiple Language Pairs”
  5. For the multilingual experiments (which involve four source languages) we set α_d = 0.25 to make the importance of individual translation tasks equal.
    Page 6, “Experiments”
  6. Having noticed that Model 1 with ELPR performs well in the single language pair setting, we use it to rank entries for individual translation tasks.
    Page 7, “Experiments”
  7. Figure 5: The log-log Zipf plots representing the true and estimated probabilities of a (source) phrase vs the rank of that phrase in the German to English translation task .
    Page 8, “Experiments”
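Excerpt 3's per-task ranking lists then have to be merged into one selection order. A plausible sketch, in the spirit of the combined ranking the paper borrows from Reichart et al. (2008); the α-weighted rank sum and the tie-breaking rule are assumptions:

```python
def combined_rank(rank_lists, alphas):
    """Merge per-task rankings into one selection order.

    rank_lists: one ranked list of sentence ids per translation task
                (position 0 = most informative for that task).
    A sentence's combined score is the alpha-weighted sum of its ranks
    across the D lists; lower is better. Ties are broken by sentence id
    (an arbitrary choice for determinism).
    """
    ids = set(rank_lists[0])
    score = {s: sum(a * lst.index(s) for a, lst in zip(alphas, rank_lists))
             for s in ids}
    return sorted(ids, key=lambda s: (score[s], s))
```

A sentence that only one task finds informative is pulled down by its poor ranks elsewhere, so the shared annotation budget goes to sentences useful across tasks.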

machine translation

Appears in 5 sentences as: machine translation (4) Machine Translation* (1)
In Active Learning for Multilingual Statistical Machine Translation
  1. Statistical machine translation (SMT) models require bilingual corpora for training, and these corpora are often multilingual with parallel text in multiple languages simultaneously.
    Page 1, “Abstract”
  2. The main source of training data for statistical machine translation (SMT) models is a parallel corpus.
    Page 1, “Introduction”
  3. In our case, the multiple tasks are individual machine translation tasks for several language pairs.
    Page 1, “Introduction”
  4. 11 Statistical Machine Translation*
    Page 1, “Introduction”
  5. For the single language pair setting, (Haffari et al., 2009) presents and compares several sentence selection methods for statistical phrase-based machine translation .
    Page 3, “Sentence Selection: Multiple Language Pairs”

segmentations

Appears in 5 sentences as: segmentations (6)
In Active Learning for Multilingual Statistical Machine Translation
  1. where Hx is the space of all possible segmentations for the OOV fragment x
    Page 4, “Sentence Selection: Single Language Pair”
  2. We let Hx be all possible segmentations of the fragment x for which the resulting phrase lengths are not greater than the maximum length constraint for phrase extraction in the underlying SMT model.
    Page 4, “Sentence Selection: Single Language Pair”
  3. Since we do not know anything about the segmentations a priori, we have put a uniform distribution over such segmentations .
    Page 4, “Sentence Selection: Single Language Pair”
  4. Let ρ_k(x, i, j) be the number of possible segmentations from position i to position j of an OOV fragment x, where k is the maximum phrase length;
    Page 4, “Sentence Selection: Single Language Pair”
  5. We have used the fact that the number of occurrences of a phrase spanning the indices [i, j] is the product of the number of segmentations of the left and the right sub-fragments, which are
    Page 5, “Sentence Selection: Single Language Pair”
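The counting argument in excerpts 4-5 reduces to counting compositions of the fragment length into parts of size at most k. A minimal sketch (function names are assumptions):

```python
from functools import lru_cache

def num_segmentations(n, k):
    """Number of ways to segment an n-word fragment into phrases of
    length at most k, i.e. compositions of n into parts <= k."""
    @lru_cache(maxsize=None)
    def f(m):
        if m == 0:
            return 1
        # Last phrase has length j, leaving a fragment of length m - j.
        return sum(f(m - j) for j in range(1, min(k, m) + 1))
    return f(n)

def span_occurrences(n, i, j, k):
    """Number of segmentations of an n-word fragment that contain the
    phrase spanning words [i, j] (0-based, inclusive, length <= k):
    the product of the counts for the left and right sub-fragments,
    mirroring excerpt 5."""
    return num_segmentations(i, k) * num_segmentations(n - j - 1, k)
```

With a uniform distribution over segmentations (excerpt 3), dividing `span_occurrences` by `num_segmentations(n, k)` gives the probability that a candidate phrase appears in a random segmentation.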

translation quality

Appears in 5 sentences as: translation quality (5)
In Active Learning for Multilingual Statistical Machine Translation
  1. We introduce a novel combined measure of translation quality for multiple target language outputs (the same content from multiple source languages).
    Page 1, “Introduction”
  2. However, if we start with only a small amount of initial parallel data for the new target language, then translation quality is very poor and requires a very large injection of human labeled data to be effective.
    Page 1, “Introduction”
  3. The multilingual setting allows new features for active learning, which we exploit to improve translation quality while reducing annotation effort.
    Page 2, “Introduction”
  4. The translation quality is measured by TQ for individual systems M_{Fd→E}; it can be the BLEU score or WER/PER (word error rate and position-independent word error rate), which induces a maximization or minimization problem, respectively.
    Page 2, “AL-SMT: Multilingual Setting”
  5. This process is continued iteratively until a certain level of translation quality is met (we use the BLEU score, WER and PER) (Papineni et al., 2002).
    Page 2, “AL-SMT: Multilingual Setting”

phrase-based

Appears in 4 sentences as: phrase-based (4)
In Active Learning for Multilingual Statistical Machine Translation
  1. We also provide new highly effective sentence selection methods that improve AL for phrase-based SMT in the multilingual and single language pair setting.
    Page 1, “Abstract”
  2. 0 We introduce new highly effective sentence selection methods that improve phrase-based SMT in the multilingual and single language pair setting.
    Page 2, “Introduction”
  3. For the single language pair setting, (Haffari et al., 2009) presents and compares several sentence selection methods for statistical phrase-based machine translation.
    Page 3, “Sentence Selection: Multiple Language Pairs”
  4. Phrases are basic units of translation in phrase-based SMT models.
    Page 3, “Sentence Selection: Single Language Pair”

objective function

Appears in 3 sentences as: objective function (2) objective function: (1)
In Active Learning for Multilingual Statistical Machine Translation
  1. This goal is formalized by the following objective function:
    Page 2, “AL-SMT: Multilingual Setting”
  2. The nonnegative weights α_d reflect the importance of the different translation tasks, and ∑_d α_d = 1. The AL-SMT formulation for a single language pair is a special case of this formulation where only one of the α_d's in the objective function (1) is one and the rest are zero.
    Page 2, “AL-SMT: Multilingual Setting”
  3. The goal is to optimize the objective function (1) with minimum human effort in providing the translations.
    Page 3, “Sentence Selection: Multiple Language Pairs”
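Objective (1), referenced throughout these excerpts, is simply an α-weighted sum of per-pair translation quality; a minimal sketch (the function name is an assumption):

```python
def combined_tq(tq_scores, alphas):
    """Objective (1): sum_d alpha_d * TQ_d over the D translation tasks,
    with nonnegative weights alpha_d summing to one. Setting a single
    alpha_d to 1 recovers the single-language-pair special case."""
    assert all(a >= 0 for a in alphas)
    assert abs(sum(alphas) - 1.0) < 1e-9
    return sum(a * tq for a, tq in zip(alphas, tq_scores))
```

With TQ = BLEU this is maximized; with TQ = WER/PER the same combination is minimized instead.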

phrase table

Appears in 3 sentences as: phrase table (2) phrase tables (2)
In Active Learning for Multilingual Statistical Machine Translation
  1. When (re-)training the models, two phrase tables are learned for each SMT model: one from the labeled data L and the other one from the pseudo-labeled data U+ (which we call the main and auxiliary phrase tables, respectively).
    Page 2, “AL-SMT: Multilingual Setting”
  2. Some of these fragments are the source language part of a phrase pair available in the phrase table, which we call regular phrases and denote their set by X_s^reg for a sentence s.
    Page 4, “Sentence Selection: Single Language Pair”
  3. However, there are some fragments in the sentence which are not covered by the phrase table, possibly because of OOVs (out-of-vocabulary words) or the constraints imposed by the phrase extraction algorithm, called X_s^oov for a sentence s.
    Page 4, “Sentence Selection: Single Language Pair”
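Excerpt 1's main/auxiliary phrase-table split feeds the log-linear decoder as separate features; a sketch, where the feature names and the candidate structure are hypothetical:

```python
def best_translation(candidates, weights):
    """Score each candidate by sum_i lambda_i * h_i and keep the best.

    Each candidate carries feature values h_i, e.g. log-probabilities
    from the main phrase table (trained on L) and the auxiliary one
    (trained on the pseudo-labeled U+); MERT would tune `weights` on a
    development set.
    """
    def score(c):
        return sum(weights[name] * val for name, val in c["features"].items())
    return max(candidates, key=score)
```

Keeping the tables as separate features lets MERT learn how much to trust the noisy auxiliary table, rather than merging it into the main one.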

Statistical Machine Translation*

Appears in 3 sentences as: Statistical machine translation (1) statistical machine translation (1) Statistical Machine Translation* (1)
In Active Learning for Multilingual Statistical Machine Translation
  1. Statistical machine translation (SMT) models require bilingual corpora for training, and these corpora are often multilingual with parallel text in multiple languages simultaneously.
    Page 1, “Abstract”
  2. The main source of training data for statistical machine translation (SMT) models is a parallel corpus.
    Page 1, “Introduction”
  3. 11 Statistical Machine Translation*
    Page 1, “Introduction”

translation probabilities

Appears in 3 sentences as: translation probabilities (3)
In Active Learning for Multilingual Statistical Machine Translation
  1. Additionally, phrase translation probabilities need to be estimated accurately, which means sentences containing phrases that occur rarely in the corpus are informative.
    Page 3, “Sentence Selection: Single Language Pair”
  2. estimating accurately the phrase translation probabilities .
    Page 3, “Sentence Selection: Single Language Pair”
  3. Smoothing techniques partly handle accurate estimation of translation probabilities when the events occur rarely (indeed it is the main reason for smoothing).
    Page 3, “Sentence Selection: Single Language Pair”
