Mixing Multiple Translation Models in Statistical Machine Translation
Majid Razmara, George Foster, Baskaran Sankaran and Anoop Sarkar

Article Structure

Abstract

Statistical machine translation is often faced with the problem of combining training data from many diverse sources into a single translation model which then has to translate sentences in a new domain.

Introduction

Statistical machine translation (SMT) systems require large parallel corpora in order to be able to obtain a reasonable translation quality.

Baselines

The natural baseline for model adaptation is to concatenate the IN and OUT data into a single parallel corpus and train a model on it.

Ensemble Decoding

Ensemble decoding is a way to combine the expertise of different models in a single model.

Experiments & Results 4.1 Experimental Setup

We carried out translation experiments using the European Medicines Agency (EMEA) corpus (Tiedemann, 2009) as IN, and the Europarl (EP) corpus as OUT, for French to English translation.

Related Work 5.1 Domain Adaptation

Early approaches to domain adaptation involved information retrieval techniques, where sentence pairs related to the target domain were retrieved from the training corpus (Eck et al., 2004; Hildebrand et al., 2005).

Conclusion & Future Work

In this paper, we presented a new approach for domain adaptation using ensemble decoding.

Topics

translation model

Appears in 16 sentences as: translation model (11) translation models (5)
In Mixing Multiple Translation Models in Statistical Machine Translation
  1. Statistical machine translation is often faced with the problem of combining training data from many diverse sources into a single translation model which then has to translate sentences in a new domain.
    Page 1, “Abstract”
  2. Common techniques for model adaptation adapt two main components of contemporary state-of-the-art SMT systems: the language model and the translation model.
    Page 1, “Introduction”
  3. …translation model adaptation, because various measures such as perplexity of adapted language models can be easily computed on data in the target domain.
    Page 1, “Introduction”
  4. It is also easier to obtain monolingual data in the target domain, compared to bilingual data which is required for translation model adaptation.
    Page 1, “Introduction”
  5. In this paper, we focused on adapting only the translation model by fixing a language model for all the experiments.
    Page 1, “Introduction”
  6. We expect that domain adaptation for machine translation can be improved further by combining orthogonal techniques for translation model adaptation with language model adaptation.
    Page 1, “Introduction”
  7. In this paper, a new approach for adapting the translation model is proposed.
    Page 1, “Introduction”
  8. We use a novel system combination approach called ensemble decoding in order to combine two or more translation models with the goal of constructing a system that outperforms all the component models.
    Page 1, “Introduction”
  9. We have modified Kriya (Sankaran et al., 2012), an in-house implementation of a hierarchical phrase-based translation system (Chiang, 2005), to implement ensemble decoding using multiple translation models.
    Page 1, “Introduction”
  10. Log-linear translation model (TM) mixtures are of the form:
    Page 2, “Baselines”
  11. Given a number of translation models which are already trained and tuned, the ensemble decoder uses hypotheses constructed from all of the models in order to translate a sentence.
    Page 2, “Ensemble Decoding”
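
Entry 10 above ends where the paper's equation begins, and the equation itself is not reproduced in this index. For reference, the standard log-linear TM mixture of Foster and Kuhn (2007) has the form below, in which each component probability enters the top-level model as a feature with weight $\lambda_m$; this is the textbook form, not a verbatim copy of the paper's equation.

```latex
p(\bar{e} \mid \bar{f}) \;\propto\; \prod_{m} p_m(\bar{e} \mid \bar{f})^{\lambda_m},
\qquad\text{equivalently}\qquad
\log p(\bar{e} \mid \bar{f}) \;=\; \sum_{m} \lambda_m \log p_m(\bar{e} \mid \bar{f}) + \text{const.}
```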

domain adaptation

Appears in 11 sentences as: Domain adaptation (1) domain adaptation (10)
In Mixing Multiple Translation Models in Statistical Machine Translation
  1. In this paper, we evaluate performance on a domain adaptation setting where we translate sentences from the medical domain.
    Page 1, “Abstract”
  2. Our experimental results show that ensemble decoding outperforms various strong baselines including mixture models, the current state-of-the-art for domain adaptation in machine translation.
    Page 1, “Abstract”
  3. Domain adaptation techniques aim at finding ways to adjust an out-of-domain (OUT) model to represent a target domain (in-domain or IN).
    Page 1, “Introduction”
  4. We expect that domain adaptation for machine translation can be improved further by combining orthogonal techniques for translation model adaptation with language model adaptation.
    Page 1, “Introduction”
  5. The main applications of ensemble models are domain adaptation, domain mixing and system combination.
    Page 1, “Introduction”
  6. We compare the results of ensemble decoding with a number of baselines for domain adaptation.
    Page 1, “Introduction”
  7. Each of these mixture operations has a specific property that makes it suitable for particular domain adaptation or system combination scenarios.
    Page 4, “Ensemble Decoding”
  8. For instance, LOPs (logarithmic opinion pools) may not be optimal for domain adaptation in the setting where there are two or more models trained on heterogeneous corpora.
    Page 4, “Ensemble Decoding”
  9. Early approaches to domain adaptation involved information retrieval techniques, where sentence pairs related to the target domain were retrieved from the training corpus (Eck et al., 2004; Hildebrand et al., 2005).
    Page 6, “Related Work 5.1 Domain Adaptation”
  10. Other domain adaptation methods involve techniques that distinguish between general and domain-specific examples (Daumé and Marcu, 2006).
    Page 6, “Related Work 5.1 Domain Adaptation”
  11. In this paper, we presented a new approach for domain adaptation using ensemble decoding.
    Page 8, “Conclusion & Future Work”

phrase-based

Appears in 11 sentences as: phrase-based (14)
In Mixing Multiple Translation Models in Statistical Machine Translation
  1. We have modified Kriya (Sankaran et al., 2012), an in-house implementation of a hierarchical phrase-based translation system (Chiang, 2005), to implement ensemble decoding using multiple translation models.
    Page 1, “Introduction”
  2. The current implementation is able to combine hierarchical phrase-based systems (Chiang, 2005) as well as phrase-based translation systems (Koehn et al., 2003).
    Page 2, “Ensemble Decoding”
  3. …phrase-based, hierarchical phrase-based, and/or syntax-based systems.
    Page 2, “Ensemble Decoding”
  4. For the mixture baselines, we used a standard one-pass phrase-based system (Koehn et al., 2003), Portage (Sadat et al., 2005), with the following 7 features: relative-frequency and lexical translation model (TM) probabilities in both directions; word-displacement distortion model; language model (LM) and word count.
    Page 5, “Experiments & Results 4.1 Experimental Setup”
  5. For ensemble decoding, we modified an in-house implementation of a hierarchical phrase-based system, Kriya (Sankaran et al., 2012), which uses the same features mentioned in (Chiang, 2005): forward and backward relative-frequency and lexical TM probabilities; LM; word, phrase and glue-rules penalty.
    Page 5, “Experiments & Results 4.1 Experimental Setup”
  6. The first group are the baseline results on the phrase-based system discussed in Section 2 and the second group are those of our hierarchical MT system.
    Page 5, “Experiments & Results 4.1 Experimental Setup”
  7. Since the Hiero baseline results were substantially better than those of the phrase-based model, we also implemented the best-performing baseline, linear mixture, in our Hiero-style MT system, and in fact it achieves the highest BLEU score among all the baselines, as shown in Table 2.
    Page 5, “Experiments & Results 4.1 Experimental Setup”
  8. Table 2: The results of various baselines implemented in a phrase-based (PBS) and a Hiero SMT on EMEA.
    Page 5, “Experiments & Results 4.1 Experimental Setup”
  9. Among these approaches are sentence-based, phrase-based and word-based output combination methods.
    Page 7, “Related Work 5.1 Domain Adaptation”
  10. Firstly, unlike the multi-table support of Moses, which only supports phrase-based translation table combination, our approach supports ensembles of both hierarchical and phrase-based systems.
    Page 7, “Related Work 5.1 Domain Adaptation”
  11. We will also add the capability of supporting syntax-based ensemble decoding and investigate how a phrase-based system can benefit from syntax information present in a syntax-aware MT system.
    Page 8, “Conclusion & Future Work”
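
To make the feature sets quoted in entries 4 and 5 concrete, here is a minimal Python sketch of how a one-pass log-linear decoder scores a phrase pair from such features. The feature names and values are illustrative placeholders, not Portage's or Kriya's actual identifiers.

```python
import math

def loglinear_score(features, weights):
    """Dot product of feature values and tuned feature weights."""
    return sum(weights[name] * value for name, value in features.items())

# Hypothetical values for one phrase pair, mirroring the 7 Portage
# features quoted in entry 4 (log-probabilities where applicable).
features = {
    "tm_rf_fwd": math.log(0.42),   # relative-frequency p(e|f)
    "tm_rf_bwd": math.log(0.31),   # relative-frequency p(f|e)
    "tm_lex_fwd": math.log(0.18),  # lexical p(e|f)
    "tm_lex_bwd": math.log(0.12),  # lexical p(f|e)
    "distortion": -2.0,            # word-displacement distortion
    "lm": math.log(0.07),          # language model
    "word_count": 3.0,             # word count
}
weights = {name: 1.0 for name in features}  # untuned placeholder weights

print(loglinear_score(features, weights))
```

In a real system the weights would be tuned on a dev set with MERT, as the quoted baselines do.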

language model

Appears in 10 sentences as: language model (8) language models (2)
In Mixing Multiple Translation Models in Statistical Machine Translation
  1. Common techniques for model adaptation adapt two main components of contemporary state-of-the-art SMT systems: the language model and the translation model.
    Page 1, “Introduction”
  2. However, language model adaptation is a more straightforward problem compared to …
    Page 1, “Introduction”
  3. …translation model adaptation, because various measures such as perplexity of adapted language models can be easily computed on data in the target domain.
    Page 1, “Introduction”
  4. As a result, language model adaptation has been well studied in various work (Clarkson and Robinson, 1997; Seymore and Rosenfeld, 1997; Bacchiani and Roark, 2003; Eck et al., 2004) both for speech recognition and for machine translation.
    Page 1, “Introduction”
  5. In this paper, we focused on adapting only the translation model by fixing a language model for all the experiments.
    Page 1, “Introduction”
  6. We expect that domain adaptation for machine translation can be improved further by combining orthogonal techniques for translation model adaptation with language model adaptation.
    Page 1, “Introduction”
  7. For the mixture baselines, we used a standard one-pass phrase-based system (Koehn et al., 2003), Portage (Sadat et al., 2005), with the following 7 features: relative-frequency and lexical translation model (TM) probabilities in both directions; word-displacement distortion model; language model (LM) and word count.
    Page 5, “Experiments & Results 4.1 Experimental Setup”
  8. Fixing the language model allows us to compare various translation model combination techniques.
    Page 5, “Experiments & Results 4.1 Experimental Setup”
  9. They use language model perplexities from IN to select relevant sentences from OUT.
    Page 6, “Related Work 5.1 Domain Adaptation”
  10. Future work includes extending this approach to use multiple translation models with multiple language models in ensemble decoding.
    Page 8, “Conclusion & Future Work”
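
Entry 9 describes perplexity-based data selection. The sketch below illustrates the idea with a stub unigram scorer standing in for a real IN-domain language model (an actual system would use an n-gram LM trained on IN); the corpus, vocabulary, and scores are invented for illustration.

```python
import math

def in_domain_logprob(word):
    """Placeholder unigram IN-domain LM; a stand-in for a real n-gram model."""
    model = {"patient": -2.0, "dose": -2.3, "the": -1.2}
    return model.get(word, -8.0)  # crude penalty for unseen words

def perplexity(sentence):
    words = sentence.split()
    logprob = sum(in_domain_logprob(w) for w in words)
    return math.exp(-logprob / len(words))

out_corpus = ["the patient took the dose", "the parliament adjourned"]
# Keep the OUT sentences that look most IN-domain (lowest perplexity).
selected = sorted(out_corpus, key=perplexity)[:1]
print(selected)
```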

BLEU

Appears in 10 sentences as: BLEU (12)
In Mixing Multiple Translation Models in Statistical Machine Translation
  1. where $m$ ranges over IN and OUT, $p_m(\bar{e} \mid \bar{f})$ is an estimate from a component phrase table, and each $\lambda_m$ is a weight in the top-level log-linear model, set so as to maximize dev-set BLEU using minimum error rate training (Och, 2003).
    Page 2, “Baselines”
  2. In Section 4.2, we compare the BLEU scores of different mixture operations on a French-English experimental setup.
    Page 4, “Ensemble Decoding”
  3. However, experiments showed that replacing the scores with the normalized scores drastically hurts the BLEU score.
    Page 4, “Ensemble Decoding”
  4. However, we did not try it, as the BLEU scores we got using the normalization heuristic were not promising, and it would impose a cost in decoding as well.
    Page 4, “Ensemble Decoding”
  5. Since the Hiero baseline results were substantially better than those of the phrase-based model, we also implemented the best-performing baseline, linear mixture, in our Hiero-style MT system, and in fact it achieves the highest BLEU score among all the baselines, as shown in Table 2.
    Page 5, “Experiments & Results 4.1 Experimental Setup”
  6. This baseline was run three times and the reported score is the average of the BLEU scores, with a standard deviation of 0.34.
    Page 5, “Experiments & Results 4.1 Experimental Setup”
  7. We also reported the BLEU scores when we applied the span-wise normalization heuristic.
    Page 5, “Experiments & Results 4.1 Experimental Setup”
  8. In particular, Switching:Max could gain up to 2.2 BLEU points over the concatenation baseline and 0.39 BLEU points over the best-performing baseline (i.e., the linear mixture).
    Page 5, “Experiments & Results 4.1 Experimental Setup”
  9. …lowest score among the mixture operations; however, after tuning, it learns to bias the weights towards one of the models and hence improves by 1.31 BLEU points.
    Page 6, “Experiments & Results 4.1 Experimental Setup”
  10. We showed that this approach can gain up to 2.2 BLEU points over its concatenation baseline and 0.39 BLEU points over a powerful mixture model.
    Page 8, “Conclusion & Future Work”
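
For reference, the BLEU score reported throughout these entries is the standard corpus-level metric of Papineni et al. (2002), with uniform n-gram weights up to order 4 and a brevity penalty, where $c$ is the candidate length and $r$ the reference length:

```latex
\mathrm{BLEU} = \mathrm{BP} \cdot \exp\!\Big( \sum_{n=1}^{4} \tfrac{1}{4} \log p_n \Big),
\qquad
\mathrm{BP} = \min\!\big(1,\; e^{\,1 - r/c}\big).
```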

log-linear

Appears in 9 sentences as: Log-Linear (1) Log-linear (1) log-linear (7)
In Mixing Multiple Translation Models in Statistical Machine Translation
  1. In addition to the basic approach of concatenation of in-domain and out-of-domain data, we also trained a log-linear mixture model (Foster and Kuhn, 2007) …
    Page 1, “Introduction”
  2. 2.1 Log-Linear Mixture
    Page 2, “Baselines”
  3. Log-linear translation model (TM) mixtures are of the form:
    Page 2, “Baselines”
  4. where $m$ ranges over IN and OUT, $p_m(\bar{e} \mid \bar{f})$ is an estimate from a component phrase table, and each $\lambda_m$ is a weight in the top-level log-linear model, set so as to maximize dev-set BLEU using minimum error rate training (Och, 2003).
    Page 2, “Baselines”
  5. In the typical log-linear model SMT, the posterior …
    Page 3, “Ensemble Decoding”
  6. …log-linear mixture).
    Page 3, “Ensemble Decoding”
  7. Since in log-linear models the model scores are not normalized to form probability distributions, the scores that different models assign to each phrase pair may not be on the same scale.
    Page 4, “Ensemble Decoding”
  8. It was filtered to retain the top 20 translations for each source phrase using the TM part of the current log-linear model.
    Page 5, “Experiments & Results 4.1 Experimental Setup”
  9. Two well-known examples of such methods are linear mixtures and log-linear mixtures (Koehn and Schroeder, 2007; Civera and Juan, 2007; Foster and Kuhn, 2007), which were used as baselines and discussed in Section 2.
    Page 6, “Related Work 5.1 Domain Adaptation”
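
Entry 7 notes that unnormalized log-linear scores from different models need not be on the same scale. Below is a minimal sketch of one plausible reading of the span-wise normalization heuristic referred to in the BLEU entries above: within each span, turn a model's rule scores into a probability distribution before mixing. This is a reconstruction under stated assumptions, not the paper's exact procedure, and the quoted results report that it hurt BLEU.

```python
import math

def normalize_span(scores):
    """Softmax over one model's scores for all candidate rules in a span."""
    m = max(scores)                            # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

# Hypothetical scores two models assign to the same three rules in a span:
# the rankings agree, but the raw scales differ wildly.
model_a = [1.2, 0.3, -0.5]
model_b = [12.0, 3.0, -5.0]
print(normalize_span(model_a))
print(normalize_span(model_b))
```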

machine translation

Appears in 8 sentences as: Machine Translation (1) machine translation (7)
In Mixing Multiple Translation Models in Statistical Machine Translation
  1. Statistical machine translation is often faced with the problem of combining training data from many diverse sources into a single translation model which then has to translate sentences in a new domain.
    Page 1, “Abstract”
  2. Our experimental results show that ensemble decoding outperforms various strong baselines including mixture models, the current state-of-the-art for domain adaptation in machine translation.
    Page 1, “Abstract”
  3. Statistical machine translation (SMT) systems require large parallel corpora in order to be able to obtain a reasonable translation quality.
    Page 1, “Introduction”
  4. …Models in Statistical Machine Translation
    Page 1, “Introduction”
  5. As a result, language model adaptation has been well studied in various work (Clarkson and Robinson, 1997; Seymore and Rosenfeld, 1997; Bacchiani and Roark, 2003; Eck et al., 2004) both for speech recognition and for machine translation.
    Page 1, “Introduction”
  6. We expect that domain adaptation for machine translation can be improved further by combining orthogonal techniques for translation model adaptation with language model adaptation.
    Page 1, “Introduction”
  7. …(2010) propose a similar method for machine translation that uses features to capture degrees of generality.
    Page 6, “Related Work 5.1 Domain Adaptation”
  8. Unlike previous work on instance weighting in machine translation, they use phrase-level instances instead of sentences.
    Page 6, “Related Work 5.1 Domain Adaptation”

phrase table

Appears in 7 sentences as: phrase table (4) phrase tables (3)
In Mixing Multiple Translation Models in Statistical Machine Translation
  1. where $m$ ranges over IN and OUT, $p_m(\bar{e} \mid \bar{f})$ is an estimate from a component phrase table, and each $\lambda_m$ is a weight in the top-level log-linear model, set so as to maximize dev-set BLEU using minimum error rate training (Och, 2003).
    Page 2, “Baselines”
  2. Whenever a phrase pair does not appear in a component phrase table, we set the corresponding $p_m(\bar{e} \mid \bar{f})$ to a small epsilon value.
    Page 2, “Baselines”
  3. Whenever a phrase pair does not appear in a component phrase table, we set the corresponding $p_m(\bar{e} \mid \bar{f})$ to 0; pairs in $\tilde{p}(\bar{e}, \bar{f})$ that do not appear in at least one component table are discarded.
    Page 2, “Baselines”
  4. The cells of the CKY chart are populated with appropriate rules from all the phrase tables of different components.
    Page 2, “Ensemble Decoding”
  5. The corpus was word-aligned using both HMM and IBM2 models, and the phrase table was the union of phrases extracted from these separate alignments, with a length limit of 7.
    Page 5, “Experiments & Results 4.1 Experimental Setup”
  6. In intersection, for each span, only those hypotheses that are present in all phrase tables are used.
    Page 7, “Related Work 5.1 Domain Adaptation”
  7. Union, on the other hand, uses hypotheses from all the phrase tables.
    Page 7, “Related Work 5.1 Domain Adaptation”
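
Entries 2 and 3 give two treatments for phrase pairs missing from a component table: a small epsilon (log-linear mixture) or zero, with discarding (linear mixture). Below is a minimal sketch under one reading of those rules; the tables, weights, and epsilon are illustrative, not from the paper.

```python
EPS = 1e-10

in_table = {("douleur", "pain"): 0.7}
out_table = {("douleur", "pain"): 0.2, ("douleur", "sorrow"): 0.5}
components = [(in_table, 0.8), (out_table, 0.2)]  # (table, lambda_m)

def mixed_prob(pair, missing=0.0):
    """Weighted combination of component probabilities for one phrase pair."""
    return sum(lam * table.get(pair, missing) for table, lam in components)

for pair in set(in_table) | set(out_table):
    print(pair, mixed_prob(pair))       # linear variant: missing -> 0
    print(pair, mixed_prob(pair, EPS))  # log-linear variant: epsilon
```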

BLEU scores

Appears in 6 sentences as: BLEU score (2) BLEU scores (4)
In Mixing Multiple Translation Models in Statistical Machine Translation
  1. In Section 4.2, we compare the BLEU scores of different mixture operations on a French-English experimental setup.
    Page 4, “Ensemble Decoding”
  2. However, experiments showed that replacing the scores with the normalized scores drastically hurts the BLEU score.
    Page 4, “Ensemble Decoding”
  3. However, we did not try it, as the BLEU scores we got using the normalization heuristic were not promising, and it would impose a cost in decoding as well.
    Page 4, “Ensemble Decoding”
  4. Since the Hiero baseline results were substantially better than those of the phrase-based model, we also implemented the best-performing baseline, linear mixture, in our Hiero-style MT system, and in fact it achieves the highest BLEU score among all the baselines, as shown in Table 2.
    Page 5, “Experiments & Results 4.1 Experimental Setup”
  5. This baseline was run three times and the reported score is the average of the BLEU scores, with a standard deviation of 0.34.
    Page 5, “Experiments & Results 4.1 Experimental Setup”
  6. We also reported the BLEU scores when we applied the span-wise normalization heuristic.
    Page 5, “Experiments & Results 4.1 Experimental Setup”

model scores

Appears in 4 sentences as: model scores (4)
In Mixing Multiple Translation Models in Statistical Machine Translation
  1. where $\oplus$ denotes the mixture operation between two or more model scores.
    Page 3, “Ensemble Decoding”
  2. • Weighted Max (wmax): where the ensemble score is the weighted max of all model scores.
    Page 3, “Ensemble Decoding”
  3. Since in log-linear models the model scores are not normalized to form probability distributions, the scores that different models assign to each phrase pair may not be on the same scale.
    Page 4, “Ensemble Decoding”
  4. An interesting observation based on the results in Table 3 is that uniform weights do reasonably well, given that the component weights are not optimized and therefore the model scores may not be on the same scale (refer to the discussion in §3.2).
    Page 6, “Experiments & Results 4.1 Experimental Setup”
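
A minimal Python sketch in the spirit of entries 1 and 2, where each model $m$ contributes a weighted log-linear score $w_m \cdot \phi_m$ and a model weight $\lambda_m$. The wmax operation follows the quoted definition; wsum and the per-rule switching variant are reconstructions of operations the paper names, not copies of its exact formulas.

```python
import math

def wsum(scores, lams):
    """Weighted sum of exponentiated model scores, returned in log space."""
    return math.log(sum(l * math.exp(s) for s, l in zip(scores, lams)))

def wmax(scores, lams):
    """Weighted max of exponentiated model scores, returned in log space."""
    return math.log(max(l * math.exp(s) for s, l in zip(scores, lams)))

def switching_max(scores, lams):
    """Pick the highest weighted-scoring model and trust its score alone."""
    best = max(range(len(scores)), key=lambda m: lams[m] * math.exp(scores[m]))
    return scores[best]

scores = [-1.2, -0.4]  # hypothetical w_m . phi_m scores for IN and OUT models
lams = [0.7, 0.3]      # hypothetical model weights
print(wsum(scores, lams), wmax(scores, lams), switching_max(scores, lams))
```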

translation systems

Appears in 4 sentences as: translation system (1) translation systems (3)
In Mixing Multiple Translation Models in Statistical Machine Translation
  1. We propose a novel approach, ensemble decoding, which combines a number of translation systems dynamically at the decoding step.
    Page 1, “Abstract”
  2. We have modified Kriya (Sankaran et al., 2012), an in-house implementation of a hierarchical phrase-based translation system (Chiang, 2005), to implement ensemble decoding using multiple translation models.
    Page 1, “Introduction”
  3. The current implementation is able to combine hierarchical phrase-based systems (Chiang, 2005) as well as phrase-based translation systems (Koehn et al., 2003).
    Page 2, “Ensemble Decoding”
  4. However, the method can be easily extended to support combining a number of heterogeneous translation systems, e.g. …
    Page 2, “Ensemble Decoding”

SMT system

Appears in 4 sentences as: SMT system (3) SMT systems (1)
In Mixing Multiple Translation Models in Statistical Machine Translation
  1. Common techniques for model adaptation adapt two main components of contemporary state-of-the-art SMT systems: the language model and the translation model.
    Page 1, “Introduction”
  2. As in the Hiero SMT system (Chiang, 2005), the cells which span up to a certain length (i.e. …
    Page 2, “Ensemble Decoding”
  3. In a similar approach, Koehn and Schroeder (2007) use a feature of the factored translation model framework in the Moses SMT system to use multiple alternative decoding paths.
    Page 7, “Related Work 5.1 Domain Adaptation”
  4. The Moses SMT system implements (Koehn and Schroeder, 2007) …
    Page 7, “Related Work 5.1 Domain Adaptation”

MT system

Appears in 4 sentences as: MT system (3) MT systems (1)
In Mixing Multiple Translation Models in Statistical Machine Translation
  1. The first group are the baseline results on the phrase-based system discussed in Section 2 and the second group are those of our hierarchical MT system.
    Page 5, “Experiments & Results 4.1 Experimental Setup”
  2. Since the Hiero baseline results were substantially better than those of the phrase-based model, we also implemented the best-performing baseline, linear mixture, in our Hiero-style MT system, and in fact it achieves the highest BLEU score among all the baselines, as shown in Table 2.
    Page 5, “Experiments & Results 4.1 Experimental Setup”
  3. In this approach a number of MT systems are combined at decoding time in order to form an ensemble model.
    Page 8, “Conclusion & Future Work”
  4. We will also add the capability of supporting syntax-based ensemble decoding and investigate how a phrase-based system can benefit from syntax information present in a syntax-aware MT system.
    Page 8, “Conclusion & Future Work”

log-linear model

Appears in 4 sentences as: log-linear model (3) log-linear models (1)
In Mixing Multiple Translation Models in Statistical Machine Translation
  1. where $m$ ranges over IN and OUT, $p_m(\bar{e} \mid \bar{f})$ is an estimate from a component phrase table, and each $\lambda_m$ is a weight in the top-level log-linear model, set so as to maximize dev-set BLEU using minimum error rate training (Och, 2003).
    Page 2, “Baselines”
  2. In the typical log-linear model SMT, the posterior …
    Page 3, “Ensemble Decoding”
  3. Since in log-linear models the model scores are not normalized to form probability distributions, the scores that different models assign to each phrase pair may not be on the same scale.
    Page 4, “Ensemble Decoding”
  4. It was filtered to retain the top 20 translations for each source phrase using the TM part of the current log-linear model.
    Page 5, “Experiments & Results 4.1 Experimental Setup”

n-gram

Appears in 3 sentences as: n-gram (3)
In Mixing Multiple Translation Models in Statistical Machine Translation
  1. In other words, it requires all component models to fully decode each sentence, compute n-gram expectations from each component model and calculate posterior probabilities over translation derivations.
    Page 8, “Related Work 5.1 Domain Adaptation”
  2. Finally, the main techniques used in this work, such as Minimum Bayes Risk decoding, n-gram features, and MERT tuning, are orthogonal to our approach.
    Page 8, “Related Work 5.1 Domain Adaptation”
  3. In addition, we can extend our approach by applying some of the techniques used in other system combination approaches, such as consensus decoding, n-gram features, and forest-based MERT tuning, among other possible extensions.
    Page 8, “Conclusion & Future Work”

phrase pair

Appears in 3 sentences as: phrase pair (3)
In Mixing Multiple Translation Models in Statistical Machine Translation
  1. Whenever a phrase pair does not appear in a component phrase table, we set the corresponding $p_m(\bar{e} \mid \bar{f})$ to a small epsilon value.
    Page 2, “Baselines”
  2. Whenever a phrase pair does not appear in a component phrase table, we set the corresponding $p_m(\bar{e} \mid \bar{f})$ to 0; pairs in $\tilde{p}(\bar{e}, \bar{f})$ that do not appear in at least one component table are discarded.
    Page 2, “Baselines”
  3. …probability for each phrase pair $(\bar{e}, \bar{f})$ is given by:
    Page 3, “Ensemble Decoding”
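
Entry 3 is truncated just before its equation. For context, the log-linear posterior it introduces has the standard form below, where the $\phi_i$ are feature functions over a phrase pair and the $w_i$ their tuned weights; this is the generic textbook form rather than the paper's exact rendering.

```latex
p(\bar{e} \mid \bar{f}) =
  \frac{\exp\big( \sum_i w_i\, \phi_i(\bar{e}, \bar{f}) \big)}
       {\sum_{\bar{e}'} \exp\big( \sum_i w_i\, \phi_i(\bar{e}', \bar{f}) \big)}.
```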

LM

Appears in 3 sentences as: LM (3)
In Mixing Multiple Translation Models in Statistical Machine Translation
  1. For the mixture baselines, we used a standard one-pass phrase-based system (Koehn et al., 2003), Portage (Sadat et al., 2005), with the following 7 features: relative-frequency and lexical translation model (TM) probabilities in both directions; word-displacement distortion model; language model (LM) and word count.
    Page 5, “Experiments & Results 4.1 Experimental Setup”
  2. For ensemble decoding, we modified an in-house implementation of a hierarchical phrase-based system, Kriya (Sankaran et al., 2012), which uses the same features mentioned in (Chiang, 2005): forward and backward relative-frequency and lexical TM probabilities; LM; word, phrase and glue-rules penalty.
    Page 5, “Experiments & Results 4.1 Experimental Setup”
  3. We suspect this is because a single LM is shared between both models.
    Page 6, “Experiments & Results 4.1 Experimental Setup”

Statistical machine translation

Appears in 3 sentences as: Statistical Machine Translation (1) Statistical machine translation (2)
In Mixing Multiple Translation Models in Statistical Machine Translation
  1. Statistical machine translation is often faced with the problem of combining training data from many diverse sources into a single translation model which then has to translate sentences in a new domain.
    Page 1, “Abstract”
  2. Statistical machine translation (SMT) systems require large parallel corpora in order to be able to obtain a reasonable translation quality.
    Page 1, “Introduction”
  3. …Models in Statistical Machine Translation
    Page 1, “Introduction”

in-domain

Appears in 3 sentences as: in-domain (3)
In Mixing Multiple Translation Models in Statistical Machine Translation
  1. Domain adaptation techniques aim at finding ways to adjust an out-of-domain (OUT) model to represent a target domain (in-domain or IN).
    Page 1, “Introduction”
  2. In addition to the basic approach of concatenation of in-domain and out-of-domain data, we also trained a log-linear mixture model (Foster and Kuhn, 2007) …
    Page 1, “Introduction”
  3. Other methods include using self-training techniques to exploit monolingual in-domain data (Ueffing et al., 2007; …)
    Page 6, “Related Work 5.1 Domain Adaptation”

BLEU points

Appears in 3 sentences as: BLEU points (5)
In Mixing Multiple Translation Models in Statistical Machine Translation
  1. In particular, Switching:Max could gain up to 2.2 BLEU points over the concatenation baseline and 0.39 BLEU points over the best-performing baseline (i.e., the linear mixture).
    Page 5, “Experiments & Results 4.1 Experimental Setup”
  2. …lowest score among the mixture operations; however, after tuning, it learns to bias the weights towards one of the models and hence improves by 1.31 BLEU points.
    Page 6, “Experiments & Results 4.1 Experimental Setup”
  3. We showed that this approach can gain up to 2.2 BLEU points over its concatenation baseline and 0.39 BLEU points over a powerful mixture model.
    Page 8, “Conclusion & Future Work”
