Topic Models for Dynamic Translation Model Adaptation
Eidelman, Vladimir and Boyd-Graber, Jordan and Resnik, Philip

Article Structure

Abstract

We propose an approach that biases machine translation systems toward relevant translations based on topic-specific contexts, where topics are induced in an unsupervised way using topic models; this can be thought of as inducing subcorpora for adaptation without any human annotation.

Introduction

The performance of a statistical machine translation (SMT) system on a translation task depends largely on the suitability of the available parallel training data.

Model Description

Lexical Weighting Lexical weighting features estimate the quality of a phrase pair by combining the lexical translation probabilities of the words in the phrase (Koehn et al., 2003).

Experiments

Setup To evaluate our approach, we performed experiments on Chinese to English MT in two settings.

Discussion and Conclusion

Applying SMT to new domains requires techniques to inform our algorithms how best to adapt.

Topics

topic distribution

Appears in 14 sentences as: topic distribution (12) topic distributions (2)
In Topic Models for Dynamic Translation Model Adaptation
  1. We use these topic distributions to compute topic-dependent lexical weighting probabilities and directly incorporate them into our translation model as features.
    Page 1, “Abstract”
  2. In our case, by building a topic distribution for the source side of the training data, we abstract the notion of domain to include automatically derived subcorpora with probabilistic membership.
    Page 2, “Introduction”
  3. This topic model infers the topic distribution of a test set and biases sentence translations to appropriate topics.
    Page 2, “Introduction”
  4. Given a parallel training corpus T composed of documents d_i, we build a source side topic model over T, which provides a topic distribution p(z_n|d_i) for z_n ∈ {1, ..., K}.
    Page 2, “Model Description”
  5. Then, we assign p(z_n|d_i) to be the topic distribution for every sentence s_j ∈ d_i, thus enforcing topic sharing across sentence pairs in the same document instead of treating them as unrelated.
    Page 2, “Model Description”
  6. Computing the topic distribution over a document and assigning it to the sentences serves to tie the sentences together in the document context.
    Page 2, “Model Description”
  7. To obtain the lexical probability conditioned on the topic distribution, we first compute the expected count e_{z_n}(e, f) of a word pair under topic z_n:
    Page 2, “Model Description”
  8. Thus, we will introduce 2·K new word translation tables, one for each p_{z_n}(e|f) and p_{z_n}(f|e), and as many new corresponding features f_{z_n}(ē|f̄) and f_{z_n}(f̄|ē). The actual feature values we compute will depend on the topic distribution of the document we are translating.
    Page 2, “Model Description”
  9. topic distribution of the sentence from which we are extracting the phrase.
    Page 3, “Model Description”
  10. with the topic distribution p(z_n|V) of the test sentence being translated provides a probability on how relevant that translation table is to the sentence.
    Page 3, “Model Description”
  11. are learning how useful knowledge of the topic distribution is, i.e., f_1 := p_{argmax_{z_n} p(z_n|V)}(ē|f̄) · p(argmax_{z_n} p(z_n|V) | V) (see the sketch after this list).
    Page 3, “Model Description”
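
The combination described in snippets 8-11 can be illustrated with a short sketch. This is not the paper's implementation; it is a minimal, hypothetical Python rendering (the names and the exact way the topic-conditional phrase probability is scaled are assumptions) of how per-topic feature values follow from the topic-conditional tables and the test sentence's topic distribution p(z_n|V).

```python
# Minimal sketch (hypothetical names): per-topic lexical features for one
# candidate phrase pair, plus the single "most probable topic" feature f_1
# reconstructed in snippet 11.

def topic_features(p_topic_lex, topic_dist):
    """p_topic_lex[k]: topic-conditional lexical probability p_k(e|f) of the
       phrase pair under topic k (from the 2*K topic-dependent tables).
       topic_dist[k] : inferred topic probability p(z_k|V) of the test
       sentence V being translated."""
    feats = {f"f_z{k}": topic_dist[k] * p_topic_lex[k]
             for k in range(len(topic_dist))}
    # f_1 uses only the most probable topic of the sentence.
    k_star = max(range(len(topic_dist)), key=lambda k: topic_dist[k])
    feats["f_1"] = p_topic_lex[k_star] * topic_dist[k_star]
    return feats

# Example: the phrase pair is far more probable under topic 0, but the test
# sentence is mostly about topic 1, so the topic-0 feature is damped.
print(topic_features(p_topic_lex=[0.4, 0.05], topic_dist=[0.2, 0.8]))
# -> approximately {'f_z0': 0.08, 'f_z1': 0.04, 'f_1': 0.04}
```

In the system itself these values enter the log-linear model as features whose weights are tuned discriminatively, so the decoder learns how much to trust each topic-conditional table.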

topic model

Appears in 10 sentences as: topic model (4) Topic Modeling (1) Topic modeling (2) topic modeling (1) topic models (3)
In Topic Models for Dynamic Translation Model Adaptation
  1. We propose an approach that biases machine translation systems toward relevant translations based on topic-specific contexts, where topics are induced in an unsupervised way using topic models; this can be thought of as inducing subcorpora for adaptation without any human annotation.
    Page 1, “Abstract”
  2. Topic modeling has received some use in SMT, for instance Bilingual LSA adaptation (Tam et al., 2007), and the BiTAM model (Zhao and Xing, 2006), which uses a bilingual topic model for learning alignment.
    Page 2, “Introduction”
  3. This topic model infers the topic distribution of a test set and biases sentence translations to appropriate topics.
    Page 2, “Introduction”
  4. Topic Modeling for MT We extend provenance to cover a set of automatically generated topics z_n.
    Page 2, “Model Description”
  5. Given a parallel training corpus T composed of documents d_i, we build a source side topic model over T, which provides a topic distribution p(z_n|d_i) for z_n ∈ {1, ..., K}.
    Page 2, “Model Description”
  6. Since FBIS has document delineations, we compare local topic modeling (LTM) with modeling at the document level (GTM).
    Page 3, “Experiments”
  7. Topic modeling was performed with Mallet (McCallum, 2002), a standard implementation of LDA, using a Chinese stoplist and setting the per-document Dirichlet parameter α = 0.01 (a stand-in sketch follows this list).
    Page 4, “Experiments”
  8. Although the performance on BLEU for both 20-topic models, LTM-20 and GTM-20, is suboptimal, the TER improvement is better.
    Page 4, “Experiments”
  9. We can construct a topic model once on the training data, and use it to infer topics on any test set to adapt the translation model.
    Page 4, “Discussion and Conclusion”
  10. Multilingual topic models (Boyd-Graber and Resnik, 2010) would provide a technique to use data from multiple languages to ensure consistent topics.
    Page 4, “Discussion and Conclusion”
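
As a concrete stand-in for the pipeline described above: the paper trains its topic model with Mallet, but the same source-side LDA construction and test-time inference can be sketched with gensim. Everything below (gensim instead of Mallet, the toy tokenized sentences, the list-valued alpha) is an illustrative assumption, not the authors' setup.

```python
# Stand-in sketch with gensim (the paper used Mallet): build a source-side
# LDA model over training documents, then infer p(z_n|V) for a test sentence.
from gensim.corpora import Dictionary
from gensim.models import LdaModel

# Source-side training "documents": for GTM these are real documents d_i;
# for LTM every training sentence would be its own document. Toy tokens here.
train_docs = [["经济", "增长", "出口"], ["比赛", "冠军", "球队"]]
num_topics = 10

vocab = Dictionary(train_docs)
bow_corpus = [vocab.doc2bow(doc) for doc in train_docs]
lda = LdaModel(bow_corpus, id2word=vocab, num_topics=num_topics,
               alpha=[0.01] * num_topics)  # small per-document Dirichlet prior

# At test time, infer the topic distribution of each source sentence V and
# hand it to the decoder's topic-dependent lexical features.
test_sentence = ["经济", "出口", "下降"]
topic_dist = lda.get_document_topics(vocab.doc2bow(test_sentence),
                                     minimum_probability=0.0)
print(topic_dist)  # [(0, p(z_0|V)), (1, p(z_1|V)), ...]
```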

BLEU

Appears in 7 sentences as: BLEU (7)
In Topic Models for Dynamic Translation Model Adaptation
  1. Conditioning lexical probabilities on the topic biases translations toward topic-relevant output, resulting in significant improvements of up to 1 BLEU and 3 TER on Chinese to English translation over a strong baseline.
    Page 1, “Abstract”
  2. Incorporating these features into our hierarchical phrase-based translation system significantly improved translation performance, by up to 1 BLEU and 3 TER over a strong Chinese to English baseline.
    Page 2, “Introduction”
  3. 2010) as our decoder, and tuned the parameters of the system to optimize BLEU (Papineni et al., 2002) on the NIST MT06 tuning corpus using the Margin Infused Relaxed Algorithm (MIRA) (Crammer et al., 2006; Eidelman, 2012).
    Page 4, “Experiments”
  4. On FBIS, we can see that both models achieve moderate but consistent gains over the baseline on both BLEU and TER.
    Page 4, “Experiments”
  5. The best model, LTM-10, achieves a gain of about 0.5 and 0.6 BLEU and 2 TER.
    Page 4, “Experiments”
  6. Although the performance on BLEU for both 20-topic models, LTM-20 and GTM-20, is suboptimal, the TER improvement is better.
    Page 4, “Experiments”
  7. On the NIST corpus, LTM-10 again achieves the best gain of approximately 1 BLEU and up to 3 TER.
    Page 4, “Experiments”

NIST

Appears in 6 sentences as: NIST (6)
In Topic Models for Dynamic Translation Model Adaptation
  1. The second setting uses the non-UN and non-HK Hansards portions of the NIST training corpora with LTM only.
    Page 3, “Experiments”
  2. Corpus   Sentences   En      Zh
     FBIS     269K        10.3M   7.9M
     NIST     1.6M        44.4M   40.4M
    Page 4, “Experiments”
  3. 2010) as our decoder, and tuned the parameters of the system to optimize BLEU (Papineni et al., 2002) on the NIST MT06 tuning corpus using the Margin Infused Relaxed Algorithm (MIRA) (Crammer et al., 2006; Eidelman, 2012).
    Page 4, “Experiments”
  4. On the NIST corpus, LTM-10 again achieves the best gain of approximately 1 BLEU and up to 3 TER.
    Page 4, “Experiments”
  5. LTM performs on par with or better than GTM, and provides significant gains even in the NIST data setting, showing that this method can be effectively applied directly on the sentence level to large training corpora.
    Page 4, “Experiments”
  6. Table 2: Performance using FBIS training corpus (top) and NIST corpus (bottom).
    Page 4, “Experiments”

TER

Appears in 6 sentences as: TER (6)
In Topic Models for Dynamic Translation Model Adaptation
  1. Conditioning lexical probabilities on the topic biases translations toward topic-relevant output, resulting in significant improvements of up to 1 BLEU and 3 TER on Chinese to English translation over a strong baseline.
    Page 1, “Abstract”
  2. Incorporating these features into our hierarchical phrase-based translation system significantly improved translation performance, by up to 1 BLEU and 3 TER over a strong Chinese to English baseline.
    Page 2, “Introduction”
  3. On FBIS, we can see that both models achieve moderate but consistent gains over the baseline on both BLEU and TER.
    Page 4, “Experiments”
  4. The best model, LTM-10, achieves a gain of about 0.5 and 0.6 BLEU and 2 TER.
    Page 4, “Experiments”
  5. Although the performance on BLEU for both 20-topic models, LTM-20 and GTM-20, is suboptimal, the TER improvement is better.
    Page 4, “Experiments”
  6. On the NIST corpus, LTM-10 again achieves the best gain of approximately 1 BLEU and up to 3 TER.
    Page 4, “Experiments”

translation model

Appears in 5 sentences as: translation model (5)
In Topic Models for Dynamic Translation Model Adaptation
  1. We use these topic distributions to compute topic-dependent lexical weighting probabilities and directly incorporate them into our translation model as features.
    Page 1, “Abstract”
  2. This problem has led to a substantial amount of recent work in trying to bias, or adapt, the translation model (TM) toward particular domains of interest (Axelrod et al., 2011; Foster et al., 2010; Snover et al., 2008). The intuition behind TM adaptation is to increase the likelihood of selecting relevant phrases for translation.
    Page 1, “Introduction”
  3. We induce unsupervised domains from large corpora, and we incorporate soft, probabilistic domain membership into a translation model .
    Page 2, “Introduction”
  4. We accomplish this by introducing topic-dependent lexical probabilities directly as features in the translation model, and interpolating them log-linearly with our other features, thus allowing us to discriminatively optimize their weights on an arbitrary objective function (a minimal sketch follows this list).
    Page 2, “Introduction”
  5. We can construct a topic model once on the training data, and use it to infer topics on any test set to adapt the translation model.
    Page 4, “Discussion and Conclusion”
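
A tiny sketch of the log-linear interpolation mentioned in item 4: the topic-dependent lexical features simply join the existing feature vector, and their weights are tuned like any other (the paper tunes them with MIRA). The feature names and values below are hypothetical.

```python
# Sketch: the derivation score is a weighted sum of feature values; the
# topic-dependent features f_z0, f_z1, ... are just additional components
# whose weights the tuner is free to raise or lower.
def loglinear_score(feature_values, weights):
    return sum(weights[name] * value for name, value in feature_values.items())

# Hypothetical example: baseline features (log probabilities) plus two
# topic-dependent lexical features for one candidate translation.
weights = {"lm": 0.30, "p_ef": 0.25, "lex_ef": 0.20, "f_z0": 0.15, "f_z1": 0.10}
h = {"lm": -12.4, "p_ef": -3.1, "lex_ef": -2.7, "f_z0": -2.5, "f_z1": -3.2}
print(loglinear_score(h, weights))
```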

word pair

Appears in 5 sentences as: word pair (4) word pairs (2)
In Topic Models for Dynamic Translation Model Adaptation
  1. First, since a sentence contributes its counts only to the translation table for the source it came from, many word pairs will be unobserved for a given table.
    Page 1, “Introduction”
  2. (2011) showed that it is beneficial to condition the lexical weighting features on provenance by assigning each sentence pair a set of features, f_s(ē|f̄), one for each domain s, which compute a new word translation table p_s(e|f) estimated from only those sentences which belong to s: c_s(f, e) / Σ_e c_s(f, e), where c_s(·) is the number of occurrences of the word pair in s.
    Page 2, “Model Description”
  3. To obtain the lexical probability conditioned on the topic distribution, we first compute the expected count e_{z_n}(e, f) of a word pair under topic z_n:
    Page 2, “Model Description”
  4. where c_j(·) denotes the number of occurrences of the word pair in sentence s_j, and then compute:
    Page 2, “Model Description”
  5. (2011) has to explicitly smooth the resulting p_s(e|f), since many word pairs will be unseen for a given domain s, we are already performing an implicit form of smoothing (when computing the expected counts), since each document has a distribution over all topics, and therefore we have some probability of observing each word pair in every topic (see the sketch after this list).
    Page 3, “Model Description”
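
The expected-count computation in snippets 3-5, together with the implicit smoothing argument, can be written out directly. A minimal sketch, assuming a simple in-memory representation of aligned word pairs; the real system computes this over the full training corpus.

```python
from collections import defaultdict

# Sketch of the expected counts e_{z_n}(e, f) and the normalized tables
# p_{z_n}(e|f). Because every sentence has nonzero probability under every
# topic, each observed word pair gets some expected count in every topic --
# the implicit smoothing noted in the last snippet above.
def topic_lexical_tables(sentences, num_topics):
    """sentences: list of (word_pairs, topic_dist), where word_pairs is the
       list of aligned (e, f) pairs in sentence s_j and topic_dist[n] is the
       topic probability p(z_n|s_j) inherited from its document."""
    expected = [defaultdict(float) for _ in range(num_topics)]  # e_{z_n}(e, f)
    for word_pairs, topic_dist in sentences:
        counts = defaultdict(int)                               # c_j(e, f)
        for pair in word_pairs:
            counts[pair] += 1
        for n in range(num_topics):
            for pair, c in counts.items():
                expected[n][pair] += topic_dist[n] * c
    # p_{z_n}(e|f) = e_{z_n}(e, f) / sum_e e_{z_n}(e, f)
    tables = []
    for n in range(num_topics):
        totals = defaultdict(float)
        for (e, f), v in expected[n].items():
            totals[f] += v
        tables.append({(e, f): v / totals[f] for (e, f), v in expected[n].items()})
    return tables
```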

LDA

Appears in 3 sentences as: LDA (3)
In Topic Models for Dynamic Translation Model Adaptation
  1. ..., K} over each document, using Latent Dirichlet Allocation (LDA) (Blei et al., 2003).
    Page 2, “Model Description”
  2. For this case, we also propose a local LDA model (LTM), which treats each sentence as a separate document (see the sketch after this list).
    Page 3, “Model Description”
  3. Topic modeling was performed with Mallet (McCallum, 2002), a standard implementation of LDA, using a Chinese stoplist and setting the per-document Dirichlet parameter α = 0.01.
    Page 4, “Experiments”
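
A brief illustration of the difference between the global and local document constructions above (toy data; the nested-list layout is just an assumption for the sketch): GTM keeps the corpus's document boundaries, while LTM treats every source sentence as its own document.

```python
# Toy source-side corpus: a list of documents, each a list of tokenized sentences.
corpus = [
    [["经济", "增长"], ["出口", "上升"]],   # document d_1
    [["球队", "夺冠"]],                     # document d_2
]

# GTM: one LDA document per real document (its sentences share the inferred topics).
gtm_docs = [[tok for sent in doc for tok in sent] for doc in corpus]
# LTM: one LDA document per sentence, so no document boundaries are needed.
ltm_docs = [sent for doc in corpus for sent in doc]

print(len(gtm_docs), len(ltm_docs))  # 2 3
```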

phrase pair

Appears in 3 sentences as: Phrase pair (1) phrase pair (3)
In Topic Models for Dynamic Translation Model Adaptation
  1. Lexical Weighting Lexical weighting features estimate the quality of a phrase pair by combining the lexical translation probabilities of the words in the phrase (Koehn et al., 2003).
    Page 2, “Model Description”
  2. Phrase pair probabilities p(ē|f̄) are computed from these as described in Koehn et al.
    Page 2, “Model Description”
  3. For example, if topic k is dominant in T, p_k(ē|f̄) may be quite large, but if p(k|V) is very small, then we should steer away from this phrase pair and select a competing phrase pair which may have a lower probability in T, but which is more relevant to the test sentence at hand (see the sketch after this list).
    Page 3, “Model Description”
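
For reference, a sketch of the standard lexical-weighting computation of Koehn et al. (2003) that these topic-conditional tables plug into: each target word's translation probability is averaged over the source words it is aligned to, and the results are multiplied across the phrase. The alignment format, the NULL handling, and all names below are assumptions of this sketch, not the paper's code.

```python
# Sketch of phrase-pair lexical weighting from word translation probabilities.
# Passing a topic-conditional table p_{z_n}(e|f) as `w` yields the per-topic
# value; scaling by p(z_n|V) (earlier sketch) then down-weights phrase pairs
# whose favorable topic is irrelevant to the test sentence.
def lexical_weight(tgt_words, src_words, alignment, w):
    """alignment: set of (i, j) pairs linking tgt_words[i] to src_words[j];
       w[(e, f)]: word translation probability p(e|f)."""
    prob = 1.0
    for i, e in enumerate(tgt_words):
        links = [j for (ti, j) in alignment if ti == i]
        if not links:                       # unaligned target word: align to NULL
            prob *= w.get((e, None), 1e-9)
        else:
            prob *= sum(w.get((e, src_words[j]), 0.0) for j in links) / len(links)
    return prob
```

Evaluating this once per topic-conditional table and weighting by p(k|V) realizes the steering effect described in the last snippet: a phrase pair that looks good only under a topic with small p(k|V) ends up with small feature values.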
