Incremental Topic-Based Translation Model Adaptation for Conversational Spoken Language Translation
Hewavitharana, Sanjika and Mehay, Dennis and Ananthakrishnan, Sankaranarayanan and Natarajan, Prem

Article Structure

Abstract

We describe a translation model adaptation approach for conversational spoken language translation (CSLT), which encourages the use of contextually appropriate translation options from relevant training conversations.

Introduction

Conversational spoken language translation (CSLT) systems facilitate communication between subjects who do not speak the same language.

Relation to Prior Work

Domain adaptation to improve SMT performance has attracted considerable attention in recent years (Foster and Kuhn, 2007; Finch and Sumita, 2008; Matsoukas et al., 2009).

Corpus Data and Baseline SMT

We use the DARPA TransTac English-Iraqi parallel two-way spoken dialogue collection to train both translation and LDA topic models.

Incremental Topic-Based Adaptation

Our approach is based on the premise that biasing the translation model to favor phrase pairs originating in training conversations that are contextually similar to the current conversation will lead to better translation quality.

Experimental Setup and Results

The baseline English-to-Iraqi phrase-based SMT system was built as described in Section 3.

Discussion and Future Directions

We have presented a novel, incremental topic-based translation model adaptation approach that obeys the causality constraint imposed by spoken conversations.

Topics

topic distribution

Appears in 19 sentences as: topic distribution (14) topic distributions (5)
In Incremental Topic-Based Translation Model Adaptation for Conversational Spoken Language Translation
  1. A significant novelty of our adaptation technique is its incremental nature; we continuously update the topic distribution on the evolving test conversation as new utterances become available.
    Page 1, “Abstract”
  2. At runtime, this model is used to infer a topic distribution over the evolving test conversation up to and including the current utterance.
    Page 1, “Introduction”
  3. Translation phrase pairs that originate in training conversations whose topic distribution is similar to that of the current conversation are given preference through a single similarity feature, which augments the standard phrase-based SMT log-linear model.
    Page 1, “Introduction”
  4. The topic distribution for the test conversation is updated incrementally for each new utterance as the available history grows.
    Page 1, “Introduction”
  5. (2012), who both describe adaptation techniques where monolingual LDA topic models are used to obtain a topic distribution over the training data, followed by dynamic adaptation of the phrase table based on the inferred topic of the test document.
    Page 2, “Relation to Prior Work”
  6. Our proposed approach infers topic distributions incrementally as the conversation progresses.
    Page 2, “Relation to Prior Work”
  7. Second, we do not directly augment the translation table with the inferred topic distribution.
    Page 2, “Relation to Prior Work”
  8. The training, development and test sets were partitioned at the conversation level, so that we could model a topic distribution for entire conversations, both during training and during tuning and testing.
    Page 2, “Corpus Data and Baseline SMT”
  9. The topic distribution is incrementally updated as the conversation history grows, and we recompute the topic similarity between the current conversation and the training conversations for each new source utterance.
    Page 2, “Incremental Topic-Based Adaptation”
  10. We use latent Dirichlet allocation, or LDA (Blei et al., 2003), to obtain a topic distribution over conversations.
    Page 3, “Incremental Topic-Based Adaptation”
  11. For each conversation d_i in the training collection (1,600 conversations), LDA infers a topic distribution θ_di = p(z_k | d_i) for all latent topics z_k, k ∈ {1, ..., K}, where K is the number of topics.
    Page 3, “Incremental Topic-Based Adaptation”
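
Entries 1-11 together describe the full pipeline: train LDA over whole training conversations, then re-infer the topic distribution on the growing test history after every utterance. Below is a minimal sketch of that loop; the paper trains its models with Mallet, so scikit-learn's LatentDirichletAllocation is only a stand-in here, and the toy conversations and variable names are illustrative.

```python
"""Minimal sketch of the incremental inference loop, assuming
scikit-learn's LDA as a stand-in for Mallet (which the paper uses);
the toy conversations and names below are illustrative only."""
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Each training conversation is one LDA "document".
train_conversations = [
    "where does it hurt show me the wound it is bleeding",
    "we need to search the vehicle please open the trunk",
    "how many people live in this house who is the owner",
]
K = 2  # number of latent topics (the paper explores several values)

vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_conversations)
lda = LatentDirichletAllocation(n_components=K, random_state=0).fit(X_train)

# theta_di = p(z_k | d_i) for every training conversation d_i.
theta_train = lda.transform(X_train)

# At test time, obey causality: after each new source utterance,
# re-infer theta_d* on the history d* up to and including it.
history = []
for utterance in ["my leg hurts", "it is bleeding badly"]:
    history.append(utterance)
    d_star = " ".join(history)
    theta_star = lda.transform(vectorizer.transform([d_star]))[0]
    print(theta_star)  # topic distribution after this utterance
```

Re-running inference on the concatenated history, rather than on the latest utterance alone, is what lets the distribution stabilize as contextual evidence accumulates.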

LDA

Appears in 15 sentences as: LDA (17)
In Incremental Topic-Based Translation Model Adaptation for Conversational Spoken Language Translation
  1. Our approach employs a monolingual LDA topic model to derive a similarity measure between the test conversation and the set of training conversations, which is used to bias translation choices towards the current context.
    Page 1, “Abstract”
  2. We begin by building a monolingual latent Dirichlet allocation (LDA) topic model on the training conversations (each conversation corresponds to a “document” in the LDA paradigm).
    Page 1, “Introduction”
  3. (2012), who both describe adaptation techniques where monolingual LDA topic models are used to obtain a topic distribution over the training data, followed by dynamic adaptation of the phrase table based on the inferred topic of the test document.
    Page 2, “Relation to Prior Work”
  4. While our proposed approach also employs monolingual LDA topic models, it deviates from the above methods in the following important ways.
    Page 2, “Relation to Prior Work”
  5. We use the DARPA TransTac English-Iraqi parallel two-way spoken dialogue collection to train both translation and LDA topic models.
    Page 2, “Corpus Data and Baseline SMT”
  6. We use the English side of these conversations for training LDA topic models.
    Page 2, “Corpus Data and Baseline SMT”
  7. 4.1 Topic modeling with LDA
    Page 3, “Incremental Topic-Based Adaptation”
  8. We use latent Dirichlet allocation, or LDA (Blei et al., 2003), to obtain a topic distribution over conversations.
    Page 3, “Incremental Topic-Based Adaptation”
  9. For each conversation d_i in the training collection (1,600 conversations), LDA infers a topic distribution θ_di = p(z_k | d_i) for all latent topics z_k, k ∈ {1, ..., K}, where K is the number of topics.
    Page 3, “Incremental Topic-Based Adaptation”
  10. Thus, each development/test utterance is associated with a different conversation history d*, for which we infer a topic distribution θ_d* = p(z_k | d*) using the trained LDA model.
    Page 3, “Incremental Topic-Based Adaptation”
  11. (Key: incrN = incremental LDA with N topics; convN = non-incremental, whole-conversation LDA with N topics.)
    Page 3, “Incremental Topic-Based Adaptation”

phrase pairs

Appears in 15 sentences as: phrase pair (6) Phrase pairs (2) phrase pairs (8)
In Incremental Topic-Based Translation Model Adaptation for Conversational Spoken Language Translation
  1. As the conversation progresses, however, the gradual accumulation of contextual information can be used to infer the topic(s) of discussion, and to deploy contextually appropriate translation phrase pairs.
    Page 1, “Introduction”
  2. Translation phrase pairs that originate in training conversations whose topic distribution is similar to that of the current conversation are given preference through a single similarity feature, which augments the standard phrase-based SMT log-linear model.
    Page 1, “Introduction”
  3. Rather, we compute a similarity between the current conversation history and each of the training conversations, and use this measure to dynamically score the relevance of candidate translation phrase pairs during decoding.
    Page 2, “Relation to Prior Work”
  4. We used this corpus to extract translation phrase pairs from bidirectional IBM Model 4 word alignment (Och and Ney, 2003) based on the heuristic approach of (Koehn et al., 2003).
    Page 2, “Corpus Data and Baseline SMT”
  5. Our phrase-based decoder is similar to Moses (Koehn et al., 2007) and uses the phrase pairs and target LM to perform beam search stack decoding based on a standard log-linear model, the parameters of which were tuned with MERT (Och, 2003) on a held-out development set (3,534 sentence pairs, 45K words) using BLEU as the tuning metric.
    Page 2, “Corpus Data and Baseline SMT”
  6. All other sentence pairs are assigned to a “background conversation”, which signals the absence of the topic similarity feature for phrase pairs derived from these instances.
    Page 2, “Corpus Data and Baseline SMT”
  7. Our approach is based on the premise that biasing the translation model to favor phrase pairs originating in training conversations that are contextually similar to the current conversation will lead to better translation quality.
    Page 2, “Incremental Topic-Based Adaptation”
  8. Additionally, the SMT phrase table tracks, for each phrase pair, the set of parent training conversations (including the “background conversation”) from which that phrase pair originated.
    Page 3, “Incremental Topic-Based Adaptation”
  9. Using this information, the decoder evaluates, for each candidate phrase pair
    Page 3, “Incremental Topic-Based Adaptation”
  10. where Par(X → Y) is the set of training conversations from which the candidate phrase pair originated.
    Page 3, “Incremental Topic-Based Adaptation”
  11. Phrase pairs from the “background conversation” only are assigned a similarity score F_X→Y = 0.00.
    Page 3, “Incremental Topic-Based Adaptation”
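
Taken together, entries 8-11 specify the decoder-side feature: look up the parent conversations of each candidate pair and turn their similarity scores into a single value F_X→Y. A small sketch under two stated assumptions, since the excerpt omits the exact formula: the per-conversation similarity scores are precomputed (see the similarity score entries below), and the scores of multiple parents are combined with max; all names are illustrative.

```python
"""Sketch of the per-phrase-pair feature F_X->Y. Assumptions (not
confirmed by the excerpt): similarity scores for the current history
against every training conversation are precomputed, and scores of
multiple parent conversations are combined with max."""

BACKGROUND = "background"  # the catch-all "background conversation"

def phrase_pair_feature(parents, sim_by_conv):
    """parents: Par(X -> Y), the training conversations (possibly plus
    the background conversation) from which the pair was extracted."""
    real_parents = [p for p in parents if p != BACKGROUND]
    if not real_parents:
        return 0.0  # background-only pairs: F_X->Y = 0.00
    return max(sim_by_conv[p] for p in real_parents)  # assumed aggregation

# Usage with an illustrative similarity vector indexed by conversation.
sims = {"conv_0007": 0.82, "conv_0391": 0.35}
print(phrase_pair_feature({"conv_0007", BACKGROUND}, sims))  # 0.82
print(phrase_pair_feature({BACKGROUND}, sims))               # 0.0
```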

topic models

Appears in 10 sentences as: topic model (2) Topic modeling (1) topic modeling (1) topic models (6)
In Incremental Topic-Based Translation Model Adaptation for Conversational Spoken Language Translation
  1. Our approach employs a monolingual LDA topic model to derive a similarity measure between the test conversation and the set of training conversations, which is used to bias translation choices towards the current context.
    Page 1, “Abstract”
  2. We begin by building a monolingual latent Dirichlet allocation (LDA) topic model on the training conversations (each conversation corresponds to a “document” in the LDA paradigm).
    Page 1, “Introduction”
  3. To avoid the need for hard decisions about domain membership, some have used topic modeling to improve SMT performance, e.g., using latent semantic analysis (Tam et al., 2007) or ‘biTAM’ (Zhao and Xing, 2006).
    Page 2, “Relation to Prior Work”
  4. (2012), who both describe adaptation techniques where monolingual LDA topic models are used to obtain a topic distribution over the training data, followed by dynamic adaptation of the phrase table based on the inferred topic of the test document.
    Page 2, “Relation to Prior Work”
  5. While our proposed approach also employs monolingual LDA topic models, it deviates from the above methods in the following important ways.
    Page 2, “Relation to Prior Work”
  6. We use the DARPA TransTac English-Iraqi parallel two-way spoken dialogue collection to train both translation and LDA topic models.
    Page 2, “Corpus Data and Baseline SMT”
  7. We use the English side of these conversations for training LDA topic models.
    Page 2, “Corpus Data and Baseline SMT”
  8. 4.1 Topic modeling with LDA
    Page 3, “Incremental Topic-Based Adaptation”
  9. The full conversation history is available for training the topic models and estimating topic distributions in the training set.
    Page 3, “Incremental Topic-Based Adaptation”
  10. We use Mallet (McCallum, 2002) for training topic models and inferring topic distributions.
    Page 3, “Incremental Topic-Based Adaptation”

NIST

Appears in 6 sentences as: NIST (4), NIST↑ (2)
In Incremental Topic-Based Translation Model Adaptation for Conversational Spoken Language Translation
  1. On an English-to-Iraqi CSLT task, the proposed approach gives significant improvements over a baseline system as measured by BLEU, TER, and NIST.
    Page 1, “Abstract”
  2. With this approach, we demonstrate significant improvements over a baseline phrase-based SMT system as measured by BLEU, TER and NIST scores on an English-to-Iraqi CSLT task.
    Page 1, “Introduction”
  3. REFERENCE TRANSCRIPTIONS: SYSTEM | BLEU↑ | TER↓ | NIST↑ (Table 1 column header)
    Page 3, “Incremental Topic-Based Adaptation”
  4. SYSTEM | BLEU↑ | TER↓ | NIST↑ (Table 1 column header)
    Page 3, “Incremental Topic-Based Adaptation”
  5. Table 1 summarizes test set performance in BLEU (Papineni et al., 2001), NIST (Doddington, 2002) and TER (Snover et al., 2006).
    Page 4, “Experimental Setup and Results”
  6. In the ASR setting, which simulates a real-world deployment scenario, this system achieves improvements of 0.39 (BLEU), -0.6 (TER) and 0.08 (NIST).
    Page 4, “Experimental Setup and Results”

BLEU

Appears in 5 sentences as: BLEU (5)
In Incremental Topic-Based Translation Model Adaptation for Conversational Spoken Language Translation
  1. On an English-to-Iraqi CSLT task, the proposed approach gives significant improvements over a baseline system as measured by BLEU, TER, and NIST.
    Page 1, “Abstract”
  2. With this approach, we demonstrate significant improvements over a baseline phrase-based SMT system as measured by BLEU, TER and NIST scores on an English-to-Iraqi CSLT task.
    Page 1, “Introduction”
  3. Our phrase-based decoder is similar to Moses (Koehn et al., 2007) and uses the phrase pairs and target LM to perform beam search stack decoding based on a standard log-linear model, the parameters of which were tuned with MERT (Och, 2003) on a held-out development set (3,534 sentence pairs, 45K words) using BLEU as the tuning metric.
    Page 2, “Corpus Data and Baseline SMT”
  4. Table 1 summarizes test set performance in BLEU (Papineni et al., 2001), NIST (Doddington, 2002) and TER (Snover et al., 2006).
    Page 4, “Experimental Setup and Results”
  5. In the ASR setting, which simulates a real-world deployment scenario, this system achieves improvements of 0.39 (BLEU), -0.6 (TER) and 0.08 (NIST).
    Page 4, “Experimental Setup and Results”

phrase-based

Appears in 5 sentences as: phrase-based (5)
In Incremental Topic-Based Translation Model Adaptation for Conversational Spoken Language Translation
  1. In this paper, we describe a novel topic-based adaptation technique for phrase-based statistical machine translation (SMT) of spoken conversations.
    Page 1, “Introduction”
  2. Translation phrase pairs that originate in training conversations whose topic distribution is similar to that of the current conversation are given preference through a single similarity feature, which augments the standard phrase-based SMT log-linear model.
    Page 1, “Introduction”
  3. With this approach, we demonstrate significant improvements over a baseline phrase-based SMT system as measured by BLEU, TER and NIST scores on an English-to-Iraqi CSLT task.
    Page 1, “Introduction”
  4. Our phrase-based decoder is similar to Moses (Koehn et al., 2007) and uses the phrase pairs and target LM to perform beam search stack decoding based on a standard log-linear model, the parameters of which were tuned with MERT (Och, 2003) on a held-out development set (3,534 sentence pairs, 45K words) using BLEU as the tuning metric.
    Page 2, “Corpus Data and Baseline SMT”
  5. The baseline English-to-Iraqi phrase-based SMT system was built as described in Section 3.
    Page 4, “Experimental Setup and Results”

sentence pairs

Appears in 5 sentences as: sentence pairs (5)
In Incremental Topic-Based Translation Model Adaptation for Conversational Spoken Language Translation
  1. The SMT parallel training corpus contains approximately 773K sentence pairs (7.3M English words).
    Page 2, “Corpus Data and Baseline SMT”
  2. Our phrase-based decoder is similar to Moses (Koehn et al., 2007) and uses the phrase pairs and target LM to perform beam search stack decoding based on a standard log-linear model, the parameters of which were tuned with MERT (Och, 2003) on a held-out development set (3,534 sentence pairs, 45K words) using BLEU as the tuning metric.
    Page 2, “Corpus Data and Baseline SMT”
  3. Finally, we evaluated translation performance on a separate, unseen test set (3,138 sentence pairs, 38K words).
    Page 2, “Corpus Data and Baseline SMT”
  4. Of the 773K training sentence pairs, about 100K (corresponding to 1,600 conversations) are marked with conversation boundaries.
    Page 2, “Corpus Data and Baseline SMT”
  5. All other sentence pairs are assigned to a “background conversation”, which signals the absence of the topic similarity feature for phrase pairs derived from these instances.
    Page 2, “Corpus Data and Baseline SMT”

similarity score

Appears in 4 sentences as: similarity score (4) similarity scores (1)
In Incremental Topic-Based Translation Model Adaptation for Conversational Spoken Language Translation
  1. We define the similarity score as sim(θ_di, θ_d*) = 1 − JSD(θ_di || θ_d*). Thus, we obtain a vector of similarity scores indexed by the training conversations.
    Page 3, “Incremental Topic-Based Adaptation”
  2. X → Y added to the search graph, its topic similarity score as follows:
    Page 3, “Incremental Topic-Based Adaptation”
  3. Phrase pairs from the “background conversation” only are assigned a similarity score F_X→Y = 0.00.
    Page 3, “Incremental Topic-Based Adaptation”
  4. Phrase pairs which only occur in the background conversation are not directly penalized, but contribute nothing to the topic similarity score.
    Page 3, “Incremental Topic-Based Adaptation”
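
Entry 1's similarity is easy to reproduce given two topic distributions. A minimal sketch using SciPy; note that scipy.spatial.distance.jensenshannon returns the Jensen-Shannon distance (the square root of the divergence), so it is squared here, and base=2 keeps both the divergence and the similarity in [0, 1].

```python
"""Sketch of sim(theta_di, theta_d*) = 1 - JSD(theta_di || theta_d*).
SciPy's jensenshannon returns the JS *distance* (square root of the
divergence), so it is squared here; base=2 bounds JSD in [0, 1]."""
import numpy as np
from scipy.spatial.distance import jensenshannon

def topic_similarity(theta_i, theta_star):
    jsd = jensenshannon(theta_i, theta_star, base=2) ** 2
    return 1.0 - jsd

theta_i = np.array([0.7, 0.2, 0.1])     # training conversation d_i
theta_star = np.array([0.6, 0.3, 0.1])  # current conversation history d*
print(topic_similarity(theta_i, theta_star))  # near 1.0: similar topics
```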

TER

Appears in 4 sentences as: TER (4)
In Incremental Topic-Based Translation Model Adaptation for Conversational Spoken Language Translation
  1. On an English-to-Iraqi CSLT task, the proposed approach gives significant improvements over a baseline system as measured by BLEU, TER, and NIST.
    Page 1, “Abstract”
  2. With this approach, we demonstrate significant improvements over a baseline phrase-based SMT system as measured by BLEU, TER and NIST scores on an English-to-Iraqi CSLT task.
    Page 1, “Introduction”
  3. Table 1 summarizes test set performance in BLEU (Papineni et al., 2001), NIST (Doddington, 2002) and TER (Snover et al., 2006).
    Page 4, “Experimental Setup and Results”
  4. In the ASR setting, which simulates a real-world deployment scenario, this system achieves improvements of 0.39 (BLEU), -0.6 (TER) and 0.08 (NIST).
    Page 4, “Experimental Setup and Results”

translation model

Appears in 4 sentences as: translation model (4)
In Incremental Topic-Based Translation Model Adaptation for Conversational Spoken Language Translation
  1. We describe a translation model adaptation approach for conversational spoken language translation (CSLT), which encourages the use of contextually appropriate translation options from relevant training conversations.
    Page 1, “Abstract”
  2. Our approach is based on the premise that biasing the translation model to favor phrase pairs originating in training conversations that are contextually similar to the current conversation will lead to better translation quality.
    Page 2, “Incremental Topic-Based Adaptation”
  3. We add this feature to the log-linear translation model with its own weight, which is tuned with MERT.
    Page 3, “Incremental Topic-Based Adaptation”
  4. We have presented a novel, incremental topic-based translation model adaptation approach that obeys the causality constraint imposed by spoken conversations.
    Page 4, “Discussion and Future Directions”

log-linear

Appears in 3 sentences as: log-linear (3)
In Incremental Topic-Based Translation Model Adaptation for Conversational Spoken Language Translation
  1. Translation phrase pairs that originate in training conversations whose topic distribution is similar to that of the current conversation are given preference through a single similarity feature, which augments the standard phrase-based SMT log-linear model.
    Page 1, “Introduction”
  2. Our phrase-based decoder is similar to Moses (Koehn et al., 2007) and uses the phrase pairs and target LM to perform beam search stack decoding based on a standard log-linear model, the parameters of which were tuned with MERT (Och, 2003) on a held-out development set (3,534 sentence pairs, 45K words) using BLEU as the tuning metric.
    Page 2, “Corpus Data and Baseline SMT”
  3. We add this feature to the log-linear translation model with its own weight, which is tuned with MERT.
    Page 3, “Incremental Topic-Based Adaptation”
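
Entry 3 amounts to one extra term in the standard log-linear score. A schematic sketch; the feature set and weights below are illustrative stand-ins, not the paper's MERT-tuned values.

```python
"""Schematic log-linear score with the topic-similarity feature added
as one more weighted component; feature names and weights are
illustrative, not the paper's MERT-tuned values."""

def loglinear_score(features, weights):
    # score(e | f) = sum_i lambda_i * h_i(e, f)
    return sum(weights[name] * h for name, h in features.items())

features = {"tm_log_prob": -1.2, "lm_log_prob": -3.4, "topic_sim": 0.82}
weights = {"tm_log_prob": 0.9, "lm_log_prob": 1.1, "topic_sim": 0.4}
print(loglinear_score(features, weights))
```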
