Polylingual Tree-Based Topic Models for Translation Domain Adaptation

Topic models, an unsupervised technique for inferring translation domains, improve machine translation quality.

Probabilistic topic models (Blei and Lafferty, 2009), exemplified by latent Dirichlet allocation (Blei et al., 2003, LDA), are one of the most popular statistical frameworks for navigating large unannotated document collections.

Before considering past approaches using topic models to improve SMT, we briefly review lexical weighting and domain adaptation for SMT.

In this section, we bring existing tree-based topic models (Boyd-Graber et al., 2007, tLDA) and polylingual topic models (Mimno et al., 2009, pLDA) together and create the polylingual tree-based topic model (ptLDA) that incorporates both word-level correlations and document-level alignment information.

Inference of probabilistic models discovers the posterior distribution over latent variables.

We evaluate our new topic model, ptLDA, and existing topic models—LDA, pLDA, and tLDA—on their ability to induce domains for machine translation and the resulting performance of the translations on standard machine translation metrics.

In this section, we qualitatively analyze the translation results and investigate how ptLDA and its cousins improve SMT.

Topic models generate great interest, but their use in “real world” applications still lags; this is particularly true for multilingual topic models.

Appears in 47 sentences as: topic model (8) Topic Models (5) Topic models (8) topic models (40)

In *Polylingual Tree-Based Topic Models for Translation Domain Adaptation*

- Topic models, an unsupervised technique for inferring translation domains, improve machine translation quality.Page 1, “Abstract”
- We propose new polylingual tree-based topic models to extract domain knowledge that considers both source and target languages and derive three different inference schemes.Page 1, “Abstract”
- Probabilistic topic models (Blei and Lafferty, 2009), exemplified by latent Dirichlet allocation (Blei et al., 2003, LDA), are one of the most popular statistical frameworks for navigating large unannotated document collections.Page 1, “Introduction”
- Topic models discover—without any supervision—the primary themes presented in a dataset: the namesake topics.Page 1, “Introduction”
- Topic models have two primary applications: to aid human exploration of corpora (Chang et al., 2009) or serve as a low-dimensional representation for downstream applications.Page 1, “Introduction”
- In particular, we use topic models to aid statistical machine translation (Koehn, 2009, SMT).Page 1, “Introduction”
- As we review in Section 2, topic models are a promising solution for automatically discovering domains in machine translation corpora.Page 1, “Introduction”
- In contrast, machine translation uses inherently multilingual data: an SMT system must translate a phrase or sentence from a source language to a different target language, so existing applications of topic models (Eidelman et al., 2012) are wilfully ignoring available information on the target side that could aid domain discovery.Page 1, “Introduction”
- This is not for a lack of multilingual topic models.Page 1, “Introduction”
- Topic models bridge the chasm between languages using document connections (Mimno et al., 2009), dictionaries (Boyd-Graber and Resnik, 2010), and word alignments (Zhao and Xing, 2006).Page 1, “Introduction”
- In Section 3, we create a model—the polylingual tree-based topic models (ptLDA)—that uses information from both external dictionaries and document alignments simultaneously.Page 1, “Introduction”

Appears in 20 sentences as: Machine Translation (1) machine translation (22)

- Topic models, an unsupervised technique for inferring translation domains, improve machine translation quality.Page 1, “Abstract”
- In particular, we use topic models to aid statistical machine translation (Koehn, 2009, SMT).Page 1, “Introduction”
- Modern machine translation systems use millions of examples of translations to learn translation rules.Page 1, “Introduction”
- As we review in Section 2, topic models are a promising solution for automatically discovering domains in machine translation corpora.Page 1, “Introduction”
- In contrast, machine translation uses inherently multilingual data: an SMT system must translate a phrase or sentence from a source language to a different target language, so existing applications of topic models (Eidelman et al., 2012) are wilfully ignoring available information on the target side that could aid domain discovery.Page 1, “Introduction”
- We show that ptLDA offers better domain adaptation than other topic models for machine translation.Page 1, “Introduction”
- 2.1 Statistical Machine TranslationPage 2, “Topic Models for Machine Translation”
- Statistical machine translation casts machine translation as a probabilistic process (Koehn, 2009).Page 2, “Topic Models for Machine Translation”
- (2012) ignore a wealth of information that could improve topic models and help machine translation.Page 3, “Topic Models for Machine Translation”
- We compare these models’ machine translation performance in Section 5.Page 4, “Polylingual Tree-based Topic Models”
- We evaluate our new topic model, ptLDA, and existing topic models—LDA, pLDA, and tLDA—on their ability to induce domains for machine translation and the resulting performance of the translations on standard machine translation metrics.Page 6, “Experiments”

Appears in 18 sentences as: LDA (16) LDA’s (1)

- Probabilistic topic models (Blei and Lafferty, 2009), exemplified by latent Dirichlet allocation (Blei et al., 2003, LDA), are one of the most popular statistical frameworks for navigating large unannotated document collections.Page 1, “Introduction”
- While vanilla topic models (LDA) can only be applied to monolingual data, there are a number of topic models for parallel corpora: Zhao and Xing (2006) assume aligned word pairs share the same topics; Mimno et al.Page 3, “Topic Models for Machine Translation”
- Generative Process As in LDA, each word token is associated with a topic.Page 4, “Polylingual Tree-based Topic Models”
- With these correlated topics in hand, the generation of documents is very similar to LDA.Page 4, “Polylingual Tree-based Topic Models”
- $p(z_{dn}=k, y_{dn}=s \mid \mathbf{z}_{-dn}, \mathbf{y}_{-dn}, \alpha, \beta) \propto \frac{N_{k|d}+\alpha}{\sum_{k'}(N_{k'|d}+\alpha)} \, \mathbb{1}\left[s \rightarrow w_{dn}\right] \prod_{(i \rightarrow j) \in s} \frac{N_{i \rightarrow j|k}+\beta_{i \rightarrow j}}{\sum_{j'}\left(N_{i \rightarrow j'|k}+\beta_{i \rightarrow j'}\right)}$Page 5, “Inference”
- Topic Models Configuration We compare our polylingual tree-based topic model (ptLDA) against tree-based topic models (tLDA), polylingual topic models (pLDA), and vanilla topic models (LDA). We also examine the effects of different inference algorithms—Gibbs sampling (gibbs), variational inference (variational), and a hybrid approach (variational-hybrid)—on SMT performance.Page 6, “Experiments”
- We refer to the SMT model without domain adaptation as baseline. LDA marginally improves machine translation (less than half a BLEU point).Page 6, “Experiments”
- For Gibbs sampling, we use implementations available in Hu and Boyd-Graber (2012) for tLDA, and Mallet (McCallum, 2002) for LDA and pLDA.Page 6, “Experiments”
- Polylingual topic models pLDA and tree-based topic models tLDA-dict are consistently better than LDA, suggesting that incorporating additional bilingual knowledge improves topic models.Page 7, “Experiments”
- While ptLDA-align performs better than baseline SMT and LDA, it is worse than ptLDA-dict, possibly because of errors in the word alignments, making the tree priors less effective.Page 7, “Experiments”
- The first example shows both LDA and ptLDA improve the baseline.Page 7, “Discussion”

Appears in 10 sentences as: Domain Adaptation (2) Domain adaptation (1) domain adaptation (9)

- Systems that are robust to systematic variation in the training set are said to exhibit domain adaptation.Page 1, “Introduction”
- We show that ptLDA offers better domain adaptation than other topic models for machine translation.Page 1, “Introduction”
- Before considering past approaches using topic models to improve SMT, we briefly review lexical weighting and domain adaptation for SMT.Page 1, “Topic Models for Machine Translation”
- Domain Adaptation for SMT Training an SMT system using diverse data requires domain adaptation.Page 2, “Topic Models for Machine Translation”
- This obviates the explicit smoothing used in other domain adaptation systems (Chiang et al., 2011).Page 2, “Topic Models for Machine Translation”
- Domain Adaptation using Topic Models We examine the effectiveness of using topic models for domain adaptation on standard SMT evaluation metrics—BLEU (Papineni et al., 2002) and TER (Snover et al., 2006).Page 6, “Experiments”
- We refer to the SMT model without domain adaptation as baseline. LDA marginally improves machine translation (less than half a BLEU point).Page 6, “Experiments”
- We also discuss other approaches to improve unsupervised domain adaptation for SMT.Page 7, “Discussion”
- To our knowledge, however, this is the first work to use multilingual topic models for domain adaptation in machine translation.Page 8, “Discussion”
- Domain adaptation for language models (Bellegarda, 2004; Wood and Teh, 2009) is an important avenue for improving machine translation.Page 8, “Discussion”

Appears in 7 sentences as: word pair (1) word pairs (7)

- The phrase pair probabilities $p_w(e \mid f)$ are the normalized product of lexical probabilities of the aligned word pairs within that phrase pair (Koehn et al., 2003).Page 2, “Topic Models for Machine Translation”
- where $c_d(\cdot)$ is the number of occurrences of the word pair in document d. The lexical probability conditioned on topic k is the unsmoothed probability estimate of those expected countsPage 2, “Topic Models for Machine Translation”
- While vanilla topic models (LDA) can only be applied to monolingual data, there are a number of topic models for parallel corpora: Zhao and Xing (2006) assume aligned word pairs share the same topics; Mimno et al.Page 3, “Topic Models for Machine Translation”
- Figure 1: An example of constructing a prior tree from a bilingual dictionary: word pairs with the same meaning but in different languages are concepts; we create a common parent node to group words in a concept, and then connect it to the root; uncorrelated words are connected to the root directly.Page 4, “Polylingual Tree-based Topic Models”
- The word pairs define concepts for the prior tree (align).Page 4, “Polylingual Tree-based Topic Models”
- The prior tree has about 1000 word pairs (dict).Page 6, “Experiments”
- We then remove the word pairs appearing more than 50K times or fewer than 500 times and construct a second prior tree with about 2500 word pairs (align).Page 6, “Experiments”
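
The tree construction described in the Figure 1 caption above can be sketched in a few lines. This is a minimal illustration under assumed data structures (plain dicts for nodes); the function name and node layout are hypothetical, not the authors' implementation.

```python
def build_prior_tree(word_pairs, vocabulary):
    """Sketch of prior-tree construction: dictionary word pairs become
    "concepts" under a shared parent node; uncorrelated words attach
    directly to the root.

    word_pairs: (chinese_word, english_word) tuples from a bilingual
    dictionary; vocabulary: every word (both languages) in the corpus."""
    root = {"word": None, "children": []}
    paired = set()
    for zh, en in word_pairs:
        # A common parent node groups the translations of one concept.
        concept = {"word": None,
                   "children": [{"word": zh, "children": []},
                                {"word": en, "children": []}]}
        root["children"].append(concept)
        paired.update((zh, en))
    for w in vocabulary:
        if w not in paired:  # uncorrelated words hang off the root
            root["children"].append({"word": w, "children": []})
    return root

# Tiny example: one dictionary pair plus one unpaired English word.
tree = build_prior_tree([("猫", "cat")], {"猫", "cat", "the"})
```

The resulting root has one concept node (grouping 猫 and cat) and one direct leaf for the unpaired word.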

Appears in 5 sentences as: language model (3) Language Models (1) language models (1)

- We train a modified Kneser-Ney trigram language model on English (Chen and Goodman, 1996).Page 6, “Experiments”
- 6.3 Improving Language ModelsPage 8, “Discussion”
- Topic models capture document-level properties of language, but a critical component of machine translation systems is the language model, which provides local constraints and preferences.Page 8, “Discussion”
- Domain adaptation for language models (Bellegarda, 2004; Wood and Teh, 2009) is an important avenue for improving machine translation.Page 8, “Discussion”
- Further improvement is possible by incorporating topic models deeper in the decoding process and adding domain knowledge to the language model.Page 9, “Conclusion”

Appears in 5 sentences as: latent variables (7)

- Inference of probabilistic models discovers the posterior distribution over latent variables.Page 4, “Inference”
- For a collection of D documents, each of which contains $N_d$ words, the latent variables of ptLDA are: transition distributions $\pi_{k,i}$ for every topic k and internal node i in the prior tree structure; multinomial distributions over topics $\theta_d$ for every document d; topic assignments $z_{dn}$ and paths $y_{dn}$ for the nth word $w_{dn}$ in document d. The joint distribution of polylingual tree-based topic models isPage 4, “Inference”
- …approximate posterior inference to discover the latent variables that best explain our data.Page 5, “Inference”
- In practice, we sample the latent variables using efficient sparse updates (Yao et al., 2009; Hu and Boyd-Graber, 2012).Page 5, “Inference”
- Variational Bayesian inference approximates the posterior distribution with a simplified variational distribution q over the latent variables: document topic proportions $\theta$, transition probabilities $\pi$, topic assignments $z$, and path assignments $y$. Variational distributions typically assume a mean-field distribution over these latent variables, removing all dependencies between the latent variables.Page 5, “Inference”

Appears in 5 sentences as: NIST (5)

- Dataset and SMT Pipeline We use the NIST MT Chinese-English parallel corpus (NIST), excluding non-UN and non-HK Hansards portions, as our training dataset.Page 6, “Experiments”
- To optimize the SMT system, we tune the parameters on NIST MT06, and report results on three test sets: MT02, MT03, and MT05.Page 6, “Experiments”
- Resources for Prior Tree To build the tree for tLDA and ptLDA, we extract the word correlations from a Chinese-English bilingual dictionary (Denisowski, 1997). We filter the dictionary using the NIST vocabulary, and keep entries mapping single Chinese and single English words.Page 6, “Experiments”
- The NIST datasets contain 878, 919, 1082, and 1664 sentences for MT02, MT03, MT05, and MT06, respectively.Page 6, “Experiments”
- With 1.6M NISTPage 7, “Experiments”

Appears in 5 sentences as: BLEU (5)

- We evaluate our model on a Chinese to English translation task and obtain up to 1.2 BLEU improvement over strong baselines.Page 1, “Abstract”
- We refer to the SMT model without domain adaptation as baseline. LDA marginally improves machine translation (less than half a BLEU point).Page 6, “Experiments”
- These improvements are not redundant: our new ptLDA-dict model, which has aspects of both models, yields the best performance among these approaches—up to a 1.2 BLEU point gain (higher is better) and a 2.6-point TER improvement (lower is better).Page 7, “Experiments”
- The BLEU improvement is significant (Koehn, 2004) at p = 0.01, except on MT03 with variational and variational-hybrid inference.Page 7, “Experiments”
- Because we have multiple runs of each topic model (and thus different translation models), we select the run closest to the average BLEU for the translation significance test.Page 7, “Experiments”

Appears in 5 sentences as: topic distribution (6)

- If each topic defines a SMT domain, the document’s topic distribution is a soft domain assignment for that document.Page 2, “Topic Models for Machine Translation”
- The topics come from source documents only and create topic-specific lexical weights from the per-document topic distribution $p(k \mid d)$.Page 2, “Topic Models for Machine Translation”
- For a test document d, the document topic distribution $p(k \mid d)$ is inferred based on the topics learned from training data.Page 2, “Topic Models for Machine Translation”
- a combination of the topic-dependent lexical weight and the topic distribution of the document, from which we extract the phrase.Page 2, “Topic Models for Machine Translation”
- Polylingual topic models (Mimno et al., 2009) assume that the aligned documents in different languages share the same topic distribution and each language has a unique topic distribution over its word types.Page 3, “Polylingual Tree-based Topic Models”
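
The "combination of the topic-dependent lexical weight and the topic distribution of the document" quoted above reduces to a sum over topics weighted by $p(k \mid d)$. A minimal sketch, with hypothetical names rather than the paper's code:

```python
def adapted_weight(weight_by_topic, doc_topic_dist):
    """Mix topic-dependent lexical weights by the document's inferred
    topic distribution p(k | d).

    weight_by_topic[k]: topic-dependent lexical weight for topic k;
    doc_topic_dist[k]: p(k | d) for the document holding the phrase."""
    return sum(weight_by_topic[k] * p_k for k, p_k in doc_topic_dist.items())
```

For instance, a phrase with per-topic weights {0: 0.5, 1: 0.1} in a document with topic distribution {0: 0.8, 1: 0.2} gets the adapted weight 0.5·0.8 + 0.1·0.2 = 0.42.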

Appears in 4 sentences as: phrase pair (5)

- Lexical Weighting In phrase-based SMT, lexical weighting features estimate the phrase pair quality by combining lexical translation probabilities of words in a phrase (Koehn et al., 2003).Page 2, “Topic Models for Machine Translation”
- The phrase pair probabilities $p_w(e \mid f)$ are the normalized product of lexical probabilities of the aligned word pairs within that phrase pair (Koehn et al., 2003).Page 2, “Topic Models for Machine Translation”
- from which we can compute the phrase pair probabilities $p_w(e \mid f; k)$ by multiplying the lexical probabilities and normalizing as in Koehn et al.Page 2, “Topic Models for Machine Translation”
- phrase pair $(e, f)$ isPage 2, “Topic Models for Machine Translation”
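
The computation quoted in this section (unsmoothed per-topic lexical probabilities from expected counts, multiplied across a phrase pair's aligned word pairs) can be sketched as follows. The container shapes and function names are assumptions for illustration, not the authors' code.

```python
from collections import defaultdict

def lexical_probs(expected_counts):
    """expected_counts[k][(e, f)]: expected count of word pair (e, f)
    under topic k. Returns unsmoothed estimates of p(e | f; k)."""
    probs = {}
    for k, counts in expected_counts.items():
        totals = defaultdict(float)
        for (e, f), c in counts.items():
            totals[f] += c  # normalizer: total count of source word f
        probs[k] = {(e, f): c / totals[f] for (e, f), c in counts.items()}
    return probs

def phrase_weight(aligned_pairs, probs_k):
    """p_w(e | f; k) for one phrase pair: product of lexical probabilities
    of its aligned word pairs (after Koehn et al., 2003)."""
    w = 1.0
    for e, f in aligned_pairs:
        w *= probs_k[(e, f)]
    return w
```

With equal expected counts for ("cat", f) and ("feline", f) under topic 0, each lexical probability is 0.5, and a phrase pair aligning both word pairs gets weight 0.25.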

Appears in 4 sentences as: word alignments (4)

- Topic models bridge the chasm between languages using document connections (Mimno et al., 2009), dictionaries (Boyd-Graber and Resnik, 2010), and word alignments (Zhao and Xing, 2006).Page 1, “Introduction”
- In addition, we extract the word alignments from aligned sentences in a parallel corpus.Page 4, “Polylingual Tree-based Topic Models”
- We also extract the bidirectional word alignments between Chinese and English using GIZA++ (Och and Ney, 2003).Page 6, “Experiments”
- While ptLDA-align performs better than baseline SMT and LDA, it is worse than ptLDA-dict, possibly because of errors in the word alignments, making the tree priors less effective.Page 7, “Experiments”

Appears in 4 sentences as: SMT system (4)

- In contrast, machine translation uses inherently multilingual data: an SMT system must translate a phrase or sentence from a source language to a different target language, so existing applications of topic models (Eidelman et al., 2012) are wilfully ignoring available information on the target side that could aid domain discovery.Page 1, “Introduction”
- Cross-Domain SMT An SMT system is usually trained on documents with the same genre (e.g., sports, business) from a similar style (e.g., newswire, blog posts).Page 2, “Topic Models for Machine Translation”
- Domain Adaptation for SMT Training an SMT system using diverse data requires domain adaptation.Page 2, “Topic Models for Machine Translation”
- To optimize the SMT system, we tune the parameters on NIST MT06, and report results on three test sets: MT02, MT03, and MT05.Page 6, “Experiments”

Appears in 4 sentences as: parallel corpus (4)

- For a parallel corpus of aligned source and target sentences $(\mathcal{F}, \mathcal{E})$, a phrase $f \in \mathcal{F}$ is translated to a phrase $e \in \mathcal{E}$ according to a distribution $p_w(e \mid f)$. One popular method to estimate the probabilityPage 2, “Topic Models for Machine Translation”
- Our contributions are topics that capture multilingual information and thus better capture the domains in the parallel corpus.Page 3, “Topic Models for Machine Translation”
- In addition, we extract the word alignments from aligned sentences in a parallel corpus.Page 4, “Polylingual Tree-based Topic Models”
- Dataset and SMT Pipeline We use the NIST MT Chinese-English parallel corpus (NIST), excluding non-UN and non-HK Hansards portions, as our training dataset.Page 6, “Experiments”

Appears in 4 sentences as: Gibbs sampler (2) Gibbs sampling (2)

- We use a collapsed Gibbs sampler for tree-based topic models to sample the path $y_{dn}$ and topic assignment $z_{dn}$ for word $w_{dn}$,Page 5, “Inference”
- For topic $z$ and path $y$, instead of variational updates, we use a Gibbs sampler within a document.Page 5, “Inference”
- This equation embodies how this is a hybrid algorithm: the first term resembles the Gibbs sampling term encoding how much a document prefers a topic, while the second term encodes the expectation under the variational distribution of how much a path is preferred by this topic,Page 6, “Inference”
- For Gibbs sampling, we use implementations available in Hu and Boyd-Graber (2012) for tLDA, and Mallet (McCallum, 2002) for LDA and pLDA.Page 6, “Experiments”
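
The collapsed Gibbs step described above, which jointly samples a topic and a path through the prior tree, multiplies a document-topic term by per-edge transition terms along the path. The sketch below illustrates that structure; the count containers, per-node fan-out, and symmetric prior are simplifying assumptions, not the authors' implementation.

```python
import random

def sample_topic_path(doc_topic_counts, edge_counts, fanout,
                      paths_to_word, alpha, beta, rng=random):
    """doc_topic_counts[k]: N_{k|d} for the current document;
    edge_counts[k][(i, j)]: N_{i->j|k}, counts of tree edge i->j under k;
    fanout[i]: number of children of internal node i;
    paths_to_word: candidate paths (edge lists) ending at the observed word.
    Returns one sampled (topic, path) pair."""
    candidates, weights = [], []
    for k, n_kd in doc_topic_counts.items():
        for path in paths_to_word:
            w = n_kd + alpha  # document-topic term: N_{k|d} + alpha
            for (i, j) in path:
                n_edge = edge_counts[k].get((i, j), 0.0)
                # total outgoing count from node i under topic k
                n_out = sum(c for (a, _), c in edge_counts[k].items()
                            if a == i)
                # per-edge transition term with a symmetric prior beta
                w *= (n_edge + beta) / (n_out + fanout[i] * beta)
            candidates.append((k, path))
            weights.append(w)
    return rng.choices(candidates, weights=weights, k=1)[0]
```

In practice the real samplers use sparse updates (Yao et al., 2009; Hu and Boyd-Graber, 2012) rather than this dense enumeration.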

Appears in 3 sentences as: Statistical Machine Translation (1) Statistical machine translation (1) statistical machine translation (1)

- In particular, we use topic models to aid statistical machine translation (Koehn, 2009, SMT).Page 1, “Introduction”
- 2.1 Statistical Machine TranslationPage 2, “Topic Models for Machine Translation”
- Statistical machine translation casts machine translation as a probabilistic process (Koehn, 2009).Page 2, “Topic Models for Machine Translation”

Appears in 3 sentences as: translation task (3)

- We evaluate our model on a Chinese to English translation task and obtain up to 1.2 BLEU improvement over strong baselines.Page 1, “Abstract”
- We explore multiple inference schemes because, while all of these methods optimize likelihood, they might give different results on the translation task.Page 5, “Inference”
- This paper contributes to the deeper integration of topic models into critical applications by presenting a new multilingual topic model, ptLDA, comparing it with other multilingual topic models on a machine translation task, and showing that these topic models improve machine translation.Page 9, “Conclusion”

Appears in 3 sentences as: Word-level (1) word-level (2)

- In this section, we bring existing tree-based topic models (Boyd-Graber et al., 2007, tLDA) and polylingual topic models (Mimno et al., 2009, pLDA) together and create the polylingual tree-based topic model (ptLDA) that incorporates both word-level correlations and document-level alignment information.Page 3, “Polylingual Tree-based Topic Models”
- Word-level Correlations Tree-based topic models incorporate the correlations between words byPage 3, “Polylingual Tree-based Topic Models”
- Build Prior Tree Structures One remaining question is the source of the word-level connections across languages for the tree prior.Page 4, “Polylingual Tree-based Topic Models”
