A Topic Similarity Model for Hierarchical Phrase-based Translation
Xinyan Xiao, Deyi Xiong, Min Zhang, Qun Liu and Shouxun Lin

Article Structure

Abstract

Previous work using topic models for statistical machine translation (SMT) explores topic information at the word level.

Introduction

A topic model (Hofmann, 1999; Blei et al., 2003) is a popular technique for discovering the underlying topic structure of documents.

Background: Topic Model

A topic model is used for discovering the topics that occur in a collection of documents.

Topic Similarity Model

Sentences should be translated consistently with their topics (Zhao and Xing, 2006; Zhao and Xing, 2007; Tam et al., 2007).

Estimation

Unlike the document-topic distribution, which can be learned directly by LDA tools, the rule-topic distribution must be estimated to fit our requirements.

Decoding

We incorporate our topic similarity model as a new feature into a traditional hiero system (Chiang, 2007) under the discriminative framework (Och and Ney, 2002).

Experiments

Our experiments are designed to answer the following questions:

Related Work

In addition to the topic-specific lexicon translation method mentioned in the previous sections, researchers have also explored topic models for machine translation in other ways.

Conclusion and Future Work

We have presented a topic similarity model which incorporates the rule-topic distributions on both the source and target sides into a traditional hierarchical phrase-based system.

Topics

topic model

Appears in 32 sentences as: Topic model (1) topic model (27) topic models (8)
In A Topic Similarity Model for Hierarchical Phrase-based Translation
  1. Previous work using topic models for statistical machine translation (SMT) explores topic information at the word level.
    Page 1, “Abstract”
  2. A topic model (Hofmann, 1999; Blei et al., 2003) is a popular technique for discovering the underlying topic structure of documents.
    Page 1, “Introduction”
  3. Since a synchronous rule is rarely factorized into individual words, we believe that it is more reasonable to incorporate the topic model directly at the rule level rather than the word level.
    Page 1, “Introduction”
  4. We estimate the topic distribution for a rule based on both the source-side and target-side topic models (Section 4.1).
    Page 1, “Introduction”
  5. … the target-side topic distributions of rules into the space of the source-side topic model by one-to-many projection (Section 4.2).
    Page 2, “Introduction”
  6. A topic model is used for discovering the topics that occur in a collection of documents.
    Page 2, “Background: Topic Model”
  7. Both Latent Dirichlet Allocation (LDA) (Blei et al., 2003) and Probabilistic Latent Semantic Analysis (PLSA) (Hofmann, 1999) are types of topic models.
    Page 2, “Background: Topic Model”
  8. LDA is the most common topic model currently in use; we therefore exploit it for mining topics in this paper.
    Page 2, “Background: Topic Model”
  9. The Hellinger function is used to calculate the distance between distributions and is popular in topic modeling (Blei and Lafferty, 2007). By topic similarity, we aim to encourage or penalize the application of a rule to a given document according to their topic distributions, which then helps the SMT system make better translation decisions (a short sketch of this distance follows this list).
    Page 3, “Topic Similarity Model”
  10. To achieve this goal, we use both source-side and target-side monolingual topic models, and learn the correspondence between the two topic models from a word-aligned bilingual corpus.
    Page 4, “Estimation”
  11. These two rule-topic distributions are estimated by the corresponding topic models in the same way (Section 4.1).
    Page 4, “Estimation”
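
The Hellinger distance in item 9 is simple to compute. Below is a minimal Python sketch of the similarity computation implied by items 9 and 11; the Hellinger formula itself is standard, but turning the distance into a similarity score with 1 - distance is our assumption, since this index does not quote the paper's exact feature definition.

    import math

    def hellinger_distance(p, q):
        # Hellinger distance between two discrete distributions:
        # H(p, q) = sqrt(0.5 * sum((sqrt(p_k) - sqrt(q_k))^2)),
        # ranging from 0 (identical) to 1 (disjoint support).
        assert len(p) == len(q)
        s = sum((math.sqrt(pk) - math.sqrt(qk)) ** 2 for pk, qk in zip(p, q))
        return math.sqrt(s / 2.0)

    def topic_similarity(rule_dist, doc_dist):
        # Encourage rules whose topic distribution is close to the
        # document's; the 1 - distance transformation is hypothetical.
        return 1.0 - hellinger_distance(rule_dist, doc_dist)

    # Toy example with K = 4 topics:
    rule = [0.7, 0.1, 0.1, 0.1]   # rule-topic distribution P(z|r)
    doc = [0.6, 0.2, 0.1, 0.1]    # document-topic distribution P(z|d)
    print(topic_similarity(rule, doc))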

topic distribution

Appears in 30 sentences as: Topic Distribution (2) topic distribution (19) topic distributions (14)
In A Topic Similarity Model for Hierarchical Phrase-based Translation
  1. We associate each synchronous rule with a topic distribution, and select desirable rules according to the similarity of their topic distributions with given documents.
    Page 1, “Abstract”
  2. Consequently, we propose a topic similarity model for hierarchical phrase-based translation (Chiang, 2007), where each synchronous rule is associated with a topic distribution.
    Page 1, “Introduction”
  3. Given a document to be translated, we calculate the topic similarity between a rule and the document based on their topic distributions.
    Page 1, “Introduction”
  4. We estimate the topic distribution for a rule based on both the source-side and target-side topic models (Section 4.1).
    Page 1, “Introduction”
  5. In order to calculate similarities between target-side topic distributions of rules and source-side topic distributions of given documents during decoding, we project …
    Page 1, “Introduction”
  6. … the target-side topic distributions of rules into the space of the source-side topic model by one-to-many projection (Section 4.2).
    Page 2, “Introduction”
  7. We further show that both the source-side and target-side topic distributions improve translation quality and their improvements are complementary to each other.
    Page 2, “Introduction”
  8. The first one relates to the document-topic distribution, which records the topic distribution of each document.
    Page 2, “Background: Topic Model”
  9. Therefore, we use a topic distribution to describe the relatedness of rules to topics (see the estimation sketch after this list).
    Page 2, “Topic Similarity Model”
  10. Figure 1 shows four synchronous rules (Chiang, 2007) with topic distributions, some of which contain nonterminals.
    Page 2, “Topic Similarity Model”
  11. We can see that, although the source parts of rules (b) and (c) are identical, their topic distributions are quite different.
    Page 2, “Topic Similarity Model”
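
Items 1 and 9-11 associate each rule with a topic distribution, and Section 4.1 of the paper estimates it analogously to relative-frequency rule probabilities. Below is a minimal sketch of one plausible reading, where P(z|r) is the normalized sum of the document-topic distributions P(z|d) over all occurrences of rule r; the data layout and names are hypothetical.

    from collections import defaultdict

    def estimate_rule_topic_distributions(occurrences, num_topics):
        # occurrences: iterable of (rule_id, doc_topic_dist) pairs, one per
        # extracted occurrence of a rule, where doc_topic_dist is P(z|d)
        # for the document the occurrence came from.
        sums = defaultdict(lambda: [0.0] * num_topics)
        counts = defaultdict(int)
        for rule_id, doc_dist in occurrences:
            for k, p in enumerate(doc_dist):
                sums[rule_id][k] += p
            counts[rule_id] += 1
        # Normalizing by the occurrence count yields a distribution, since
        # each P(z|d) already sums to one.
        return {r: [v / counts[r] for v in vec] for r, vec in sums.items()}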

LDA

Appears in 12 sentences as: LDA (13)
In A Topic Similarity Model for Hierarchical Phrase-based Translation
  1. Both Latent Dirichlet Allocation (LDA) (Blei et al., 2003) and Probabilistic Latent Semantic Analysis (PLSA) (Hofmann, 1999) are types of topic models.
    Page 2, “Background: Topic Model”
  2. LDA is the most common topic model currently in use; we therefore exploit it for mining topics in this paper.
    Page 2, “Background: Topic Model”
  3. Here, we first give a brief description of LDA.
    Page 2, “Background: Topic Model”
  4. LDA views each document as a mixture of various topics, and generates each word from a multinomial distribution conditioned on a topic.
    Page 2, “Background: Topic Model”
  5. More specifically, as a generative process, LDA first samples a document-topic distribution for each document.
    Page 2, “Background: Topic Model”
  6. Generally speaking, LDA contains two types of parameters.
    Page 2, “Background: Topic Model”
  7. Based on these parameters (and some hyper-parameters), LDA can infer a topic assignment for each word in the documents.
    Page 2, “Background: Topic Model”
  8. The k-th dimension P(z = k | d) is the probability of topic k given document d. Unlike the rule-topic distribution, the document-topic distribution can be inferred directly by an off-the-shelf LDA tool (see the gensim sketch after this list).
    Page 3, “Topic Similarity Model”
  9. Unlike the document-topic distribution, which can be learned directly by LDA tools, the rule-topic distribution must be estimated to fit our requirements.
    Page 3, “Estimation”
  10. …bution of every document inferred by the LDA tool.
    Page 4, “Estimation”
  11. The topic assignments are output by the LDA tool.
    Page 4, “Estimation”
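
Items 4-8 summarize LDA, and item 8 notes that the document-topic distribution P(z|d) comes from an off-the-shelf tool. A minimal sketch using gensim follows; the choice of gensim and the toy corpus are ours, as the paper only says "LDA tool".

    from gensim import corpora, models

    docs = [["topic", "model", "translation"],
            ["phrase", "rule", "translation", "decoder"]]
    dictionary = corpora.Dictionary(docs)
    bows = [dictionary.doc2bow(d) for d in docs]

    # Train LDA: each document becomes a mixture over K topics, and each
    # topic a multinomial over words (the two parameter types in item 6).
    lda = models.LdaModel(bows, num_topics=2, id2word=dictionary, passes=10)

    # Document-topic distribution P(z|d) for the first document (item 8).
    print(lda.get_document_topics(bows[0], minimum_probability=0.0))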

BLEU

Appears in 6 sentences as: BLEU (6)
In A Topic Similarity Model for Hierarchical Phrase-based Translation
  1. Experiments on Chinese-English translation tasks (Section 6) show that our method outperforms the baseline hierarchical phrase-based system by +0.9 BLEU points.
    Page 2, “Introduction”
  2. Is our topic similarity model able to improve translation quality in terms of BLEU?
    Page 5, “Experiments”
  3. Case-insensitive NIST BLEU (Papineni et al., 2002) was used to measure …
    Page 6, “Experiments”
  4. By using all the features (last line in the table), we improve the translation performance over the baseline system by 0.87 BLEU points on average.
    Page 7, “Experiments”
  5. We clearly find that the two rule-topic distributions improve the performance by 0.48 and 0.38 BLEU points over the baseline respectively.
    Page 7, “Experiments”
  6. Our topic similarity method on monotone rules achieves the largest improvement, 0.6 BLEU points, while the improvement on reordering rules is the smallest among the three types.
    Page 8, “Experiments”

phrase-based

Appears in 6 sentences as: phrase-based (6)
In A Topic Similarity Model for Hierarchical Phrase-based Translation
  1. We therefore propose a topic similarity model to exploit topic information at the synchronous rule level for hierarchical phrase-based translation.
    Page 1, “Abstract”
  2. Consequently, we propose a topic similarity model for hierarchical phrase-based translation (Chiang, 2007), where each synchronous rule is associated with a topic distribution.
    Page 1, “Introduction”
  3. We augment the hierarchical phrase-based system by integrating the proposed topic similarity model as a new feature (Section 3.1).
    Page 1, “Introduction”
  4. Experiments on Chinese-English translation tasks (Section 6) show that our method outperforms the baseline hierarchical phrase-based system by +0.9 BLEU points.
    Page 2, “Introduction”
  5. We divide the rules into three types: phrase rules, which contain only terminals and are the same as the phrase pairs in a phrase-based system; monotone rules, which contain non-terminals and produce monotone translations; and reordering rules, which also contain non-terminals but change the order of translations.
    Page 8, “Experiments”
  6. We have presented a topic similarity model which incorporates the rule-topic distributions on both the source and target sides into a traditional hierarchical phrase-based system.
    Page 8, “Conclusion and Future Work”

translation model

Appears in 6 sentences as: translation model (4) translation models (3)
In A Topic Similarity Model for Hierarchical Phrase-based Translation
  1. To exploit topic information for statistical machine translation (SMT), researchers have proposed various topic-specific lexicon translation models (Zhao and Xing, 2006; Zhao and Xing, 2007; Tam et al., 2007) to improve translation quality.
    Page 1, “Introduction”
  2. Topic-specific lexicon translation models focus on word-level translations.
    Page 1, “Introduction”
  3. In the topic-specific lexicon translation model, given a source document, the model first calculates the topic-specific translation probability by normalizing the entire lexicon translation table, and then adapts the lexical weights of rules accordingly.
    Page 5, “Decoding”
  4. The adapted lexicon translation model is added as a new feature under the discriminative framework.
    Page 7, “Experiments”
  5. … combine a specific-domain translation model with a general-domain translation model depending on various text distances.
    Page 8, “Related Work”
  6. Finally, we hope to apply our method to other translation models , especially syntax-based models.
    Page 8, “Conclusion and Future Work”

BLEU points

Appears in 4 sentences as: BLEU point (1) BLEU points (3)
In A Topic Similarity Model for Hierarchical Phrase-based Translation
  1. Experiments on Chinese-English translation tasks (Section 6) show that our method outperforms the baseline hierarchical phrase-based system by +0.9 BLEU points.
    Page 2, “Introduction”
  2. By using all the features (last line in the table), we improve the translation performance over the baseline system by 0.87 BLEU points on average.
    Page 7, “Experiments”
  3. We clearly find that the two rule-topic distributions improve the performance by 0.48 and 0.38 BLEU points over the baseline respectively.
    Page 7, “Experiments”
  4. Our topic similarity method on monotone rules achieves the largest improvement, 0.6 BLEU points, while the improvement on reordering rules is the smallest among the three types.
    Page 8, “Experiments”

language model

Appears in 4 sentences as: language model (4)
In A Topic Similarity Model for Hierarchical Phrase-based Translation
  1. The monolingual data for training the English language model includes the Xinhua portion of the GIGAWORD corpus, which contains 238M English words.
    Page 6, “Experiments”
  2. A 4-gram language model was trained on the monolingual data by the SRILM toolkit (Stolcke, 2002).
    Page 6, “Experiments”
  3. Researchers have also introduced topic models for cross-lingual language model adaptation (Tam et al., 2007; Ruiz and Federico, 2011).
    Page 8, “Related Work”
  4. Based on the bilingual topic model, they apply the source-side topic weights to the target-side topic model, and adapt the target-side n-gram language model.
    Page 8, “Related Work”

NIST

Appears in 4 sentences as: NIST (4)
In A Topic Similarity Model for Hierarchical Phrase-based Translation
  1. We show that our model significantly improves the translation performance over the baseline on NIST Chinese-to-English translation experiments.
    Page 1, “Abstract”
  2. We present our experiments on the NIST Chinese-English translation tasks.
    Page 6, “Experiments”
  3. We used the NIST evaluation set of 2005 (MT05) as our development set, and the MT06 and MT08 sets as test sets.
    Page 6, “Experiments”
  4. Case-insensitive NIST BLEU (Papineni et al., 2002) was used to measure …
    Page 6, “Experiments”

phrase pair

Appears in 4 sentences as: phrase pair (2) phrase pairs (2)
In A Topic Similarity Model for Hierarchical Phrase-based Translation
  1. We divide the rules into three types: phrase rules, which contain only terminals and are the same as the phrase pairs in a phrase-based system; monotone rules, which contain non-terminals and produce monotone translations; and reordering rules, which also contain non-terminals but change the order of translations.
    Page 8, “Experiments”
  2. … (2010) introduce a topic model for filtering topic-mismatched phrase pairs.
    Page 8, “Related Work”
  3. Similarly, each phrase pair is also assigned one specific topic.
    Page 8, “Related Work”
  4. A phrase pair will be discarded if its topic mismatches the document topic (a toy sketch follows this list).
    Page 8, “Related Work”
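
Items 2-4 describe the related-work filtering approach: each phrase pair carries one topic, and pairs whose topic mismatches the document topic are discarded. A toy sketch of that rule follows; the names are hypothetical, and the cited work's actual procedure may differ in detail.

    def filter_phrase_pairs(phrase_table, doc_topic):
        # phrase_table: iterable of (src, tgt, pair_topic) triples;
        # doc_topic: the single topic assigned to the input document.
        # Keep only pairs whose topic matches the document topic.
        return [(src, tgt) for src, tgt, z in phrase_table if z == doc_topic]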

translation probability

Appears in 4 sentences as: translation probabilities (1) translation probability (3)
In A Topic Similarity Model for Hierarchical Phrase-based Translation
  1. Such models first estimate word translation probabilities conditioned on topics, and then adapt lexical weights of phrases …
    Page 1, “Introduction”
  2. The process of rule-topic distribution estimation is analogous to the traditional estimation of rule translation probability (Chiang, 2007).
    Page 4, “Estimation”
  3. In the topic-specific lexicon translation model, given a source document, the model first calculates the topic-specific translation probability by normalizing the entire lexicon translation table, and then adapts the lexical weights of rules accordingly.
    Page 5, “Decoding”
  4. The lexicon translation probability is adapted by: … (a sketch of the standard adaptation follows this list).
    Page 6, “Experiments”
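
Item 4's adaptation formula is truncated in this index. A common form of topic-specific lexicon adaptation (Zhao and Xing, 2006) is the topic-weighted mixture p(e|f, d) = sum_k p(e|f, z=k) * P(z=k|d); the sketch below assumes that form, which is our reconstruction rather than a quote from the paper.

    def adapt_lexicon_probability(p_e_given_f_z, doc_topics):
        # p_e_given_f_z[k] = p(e|f, z=k): word translation probability
        # conditioned on topic k (from the normalized lexicon table).
        # doc_topics[k] = P(z=k|d): the document-topic distribution.
        return sum(p * w for p, w in zip(p_e_given_f_z, doc_topics))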

translation quality

Appears in 4 sentences as: translation quality (4)
In A Topic Similarity Model for Hierarchical Phrase-based Translation
  1. To exploit topic information for statistical machine translation (SMT), researchers have proposed various topic-specific lexicon translation models (Zhao and Xing, 2006; Zhao and Xing, 2007; Tam et al., 2007) to improve translation quality.
    Page 1, “Introduction”
  2. We further show that both the source-side and target-side topic distributions improve translation quality and their improvements are complementary to each other.
    Page 2, “Introduction”
  3. Is our topic similarity model able to improve translation quality in terms of BLEU?
    Page 5, “Experiments”
  4. This verifies that the topic similarity model can improve the translation quality significantly.
    Page 7, “Experiments”

co-occurrence

Appears in 3 sentences as: co-occurrence (3)
In A Topic Similarity Model for Hierarchical Phrase-based Translation
  1. In the first step, we estimate the correspondence probability from the co-occurrence of the source-side and target-side topic assignments in the word-aligned corpus.
    Page 4, “Estimation”
  2. Thus, the co-occurrence of a source-side topic with index k_f and a target-side …
    Page 4, “Estimation”
  3. We then compute the probability P(z = k_f | z = k_e) by normalizing the co-occurrence counts (see the projection sketch after this list).
    Page 5, “Estimation”
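
The three items above outline the projection step: count co-occurring source-side and target-side topic assignments over aligned word pairs, normalize to get P(z = k_f | z = k_e), then map a target-side distribution into source-topic space (the one-to-many projection of the Introduction). A minimal sketch under those assumptions; the matrix orientation and names are ours.

    def correspondence_matrix(aligned_topic_pairs, num_f_topics, num_e_topics):
        # aligned_topic_pairs: (k_f, k_e) topic assignments of the two
        # sides of each aligned word pair in the bilingual corpus.
        # Returns M with M[k_e][k_f] = P(z = k_f | z = k_e).
        counts = [[0.0] * num_f_topics for _ in range(num_e_topics)]
        for k_f, k_e in aligned_topic_pairs:
            counts[k_e][k_f] += 1.0
        return [[c / max(sum(row), 1.0) for c in row] for row in counts]

    def project_to_source_space(target_dist, matrix):
        # P(z_f|r) = sum_e P(z_f|z_e) * P(z_e|r): projects a target-side
        # rule-topic distribution into the source-side topic space.
        num_f_topics = len(matrix[0])
        return [sum(target_dist[e] * matrix[e][f] for e in range(len(matrix)))
                for f in range(num_f_topics)]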

machine translation

Appears in 3 sentences as: machine translation (3)
In A Topic Similarity Model for Hierarchical Phrase-based Translation
  1. Previous work using topic models for statistical machine translation (SMT) explores topic information at the word level.
    Page 1, “Abstract”
  2. To exploit topic information for statistical machine translation (SMT), researchers have proposed various topic-specific lexicon translation models (Zhao and Xing, 2006; Zhao and Xing, 2007; Tam et al., 2007) to improve translation quality.
    Page 1, “Introduction”
  3. In addition to the topic-specific lexicon translation method mentioned in the previous sections, researchers have also explored topic models for machine translation in other ways.
    Page 8, “Related Work”

SMT system

Appears in 3 sentences as: SMT system (2) SMT systems (1)
In A Topic Similarity Model for Hierarchical Phrase-based Translation
  1. However, state-of-the-art SMT systems translate sentences by using sequences of synchronous rules or phrases, instead of translating word by word.
    Page 1, “Introduction”
  2. The Hellinger function is used to calculate the distance between distributions and is popular in topic modeling (Blei and Lafferty, 2007). By topic similarity, we aim to encourage or penalize the application of a rule to a given document according to their topic distributions, which then helps the SMT system make better translation decisions.
    Page 3, “Topic Similarity Model”
  3. By incorporating the topic sensitivity model with the topic similarity model, we enable our SMT system to balance the selection of these two types of rules (an entropy sketch follows this list).
    Page 3, “Topic Similarity Model”
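
Item 3 pairs the similarity model with a topic sensitivity model that separates topic-sensitive rules from topic-insensitive ones. One natural measure of sensitivity is the entropy of the rule-topic distribution; treat this exact definition as our assumption, since the index does not quote the paper's formula.

    import math

    def topic_sensitivity(rule_dist):
        # Entropy of P(z|r): a peaked (low-entropy) distribution marks a
        # topic-sensitive rule, a flat (high-entropy) one a generic rule.
        return -sum(p * math.log(p) for p in rule_dist if p > 0.0)

    print(topic_sensitivity([0.7, 0.1, 0.1, 0.1]))      # peaked: low entropy
    print(topic_sensitivity([0.25, 0.25, 0.25, 0.25]))  # flat: max entropy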
