Shallow Local Multi-Bottom-up Tree Transducers in Statistical Machine Translation
Braune, Fabienne and Seemann, Nina and Quernheim, Daniel and Maletti, Andreas

Article Structure

Abstract

We present a new translation model integrating the shallow local multi bottom-up tree transducer.

Introduction

Besides phrase-based machine translation systems (Koehn et al., 2003), syntax-based systems have become widely used because of their ability to handle nonlocal reordering.

Theoretical Model

In this section, we present the theoretical generative model used in our approach to syntax-based machine translation.

Translation Model

Given a source language sentence e, our translation model aims to find the best corresponding target language translation ĝ; i.e.,
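In standard statistical MT notation this objective is typically written as follows; this is a hedged sketch using conventional symbols (the paper's exact notation and the footnote attached to ĝ are not reproduced here):

```latex
\hat{g} \;=\; \operatorname*{arg\,max}_{g}\; p(g \mid e)
```

where e is the English source sentence and ĝ the highest-scoring German translation under the model.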

Decoding

We implemented our model in the syntax-based component of the Moses open-source toolkit by Koehn et al.

Experiments

5.1 Setup

Conclusion and Future Work

We demonstrated that our ℓMBOT-based machine translation system beats a standard tree-to-tree system (Moses tree-to-tree) on the WMT 2009 translation task English → German.

Topics

LM

Appears in 17 sentences as: LM (20)
  1. The language model ( LM ) scoring is directly integrated into the cube pruning algorithm.
    Page 6, “Decoding”
  2. Thus, LM estimates are available for all considered hypotheses.
    Page 6, “Decoding”
  3. Because the output trees can remain discontiguous after hypothesis creation, LM scoring has to be done individually over all output trees.
    Page 6, “Decoding”
  4. Algorithm 2 describes our LM scoring in detail.
    Page 6, “Decoding”
  5. Estimate LM score for h // see Algorithm 2; Estimate remaining feature scores
    Page 6, “Decoding”
  6. Once all the strings are built, we score them using our 4-gram LM.
    Page 6, “Decoding”
  7. The overall LM score for the pre-translation is obtained by multiplying the scores of the individual output sequences (see the sketch after this list).
    Page 6, “Decoding”
  8. Whenever such a concatenation happens, our LM scoring will automatically compute n-gram LM scores based on the concatenation, which in particular means that the LM scores get more accurate for larger spans.
    Page 6, “Decoding”
  9. Finally, the final rule is allowed only one component, which guarantees that the LM indeed scores the complete output sentence.
    Page 6, “Decoding”
  10. Figure 7 illustrates our LM scoring for a pre-translation involving a rule with two (discontiguous) target sequences (the construction of the pre-translation is illustrated in Figure 6).
    Page 6, “Decoding”
  11. When processing the rule rooted at S, an LM estimate is computed by expanding all nonterminal leaves.
    Page 6, “Decoding”
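To make the scoring scheme quoted above concrete, here is a minimal, hypothetical sketch (names such as ToyBigramLM and score_pre_translation are invented; this is not the Moses/ℓMBOT implementation, and a toy bigram LM stands in for the 4-gram LM): each output sequence of a possibly discontiguous pre-translation is scored separately, the per-sequence scores are multiplied (added in log space), and n-grams spanning a junction are only scored once a later rule concatenates the sequences, so estimates sharpen as spans grow.

```python
import math
from collections import defaultdict

class ToyBigramLM:
    """Tiny add-one-smoothed bigram LM, standing in for the real 4-gram LM."""
    def __init__(self, sentences):
        self.bigrams = defaultdict(int)
        self.unigrams = defaultdict(int)
        for sent in sentences:
            tokens = ["<s>"] + sent + ["</s>"]
            for a, b in zip(tokens, tokens[1:]):
                self.bigrams[(a, b)] += 1
                self.unigrams[a] += 1
        self.vocab_size = len(self.unigrams) + 1

    def logprob(self, tokens, full_sentence=False):
        # Partial sequences are scored without sentence boundaries; only the
        # final single-component hypothesis is scored as a complete sentence.
        seq = ["<s>"] + tokens + ["</s>"] if full_sentence else tokens
        lp = 0.0
        for a, b in zip(seq, seq[1:]):
            lp += math.log((self.bigrams[(a, b)] + 1) /
                           (self.unigrams[a] + self.vocab_size))
        return lp

def score_pre_translation(lm, sequences, final=False):
    """Multiply (add in log space) the LM scores of all output sequences."""
    return sum(lm.logprob(seq, full_sentence=final and len(sequences) == 1)
               for seq in sequences)

lm = ToyBigramLM([["official", "forecasts", "are", "based", "on", "data"]])
# A hypothesis with two discontiguous target sequences:
partial = score_pre_translation(lm, [["official", "forecasts"], ["are", "based"]])
# After a later rule concatenates them, the junction bigram is scored as well:
final = score_pre_translation(lm, [["official", "forecasts", "are", "based"]], final=True)
print(partial, final)
```

The full-sentence boundaries are only applied once a single component remains, mirroring the statement that the final rule allows just one component.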

language model

Appears in 5 sentences as: language model (5)
  1. (1) The forward translation weight using the rule weights as described in Section 2;
     (2) the indirect translation weight using the rule weights as described in Section 2;
     (3) the lexical translation weight source → target;
     (4) the lexical translation weight target → source;
     (5) the target-side language model;
     (6) the number of words in the target sentences;
     (7) the number of rules used in the pre-translation;
     (8) the number of target-side sequences; here k times the number of sequences used in the pre-translations that constructed it (gap penalty).
     The rule weights required for (1) are relative frequencies normalized over all rules with the same left-hand side. (A sketch of such a log-linear combination follows this list.)
    Page 4, “Translation Model”
  2. The computation of the language model estimates for (6) is adapted to score partial translations consisting of discontiguous units.
    Page 4, “Translation Model”
  3. The language model (LM) scoring is directly integrated into the cube pruning algorithm.
    Page 6, “Decoding”
  4. Naturally, we also had to adjust hypothesis expansion and, most importantly, language model scoring inside the cube pruning algorithm.
    Page 6, “Decoding”
  5. Our German 4-gram language model was trained on the German sentences in the training data augmented by the Stuttgart SdeWaC corpus (Web-as-Corpus Consortium, 2008), whose generation is detailed in (Baroni et al., 2009).
    Page 7, “Experiments”
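For context, the eight components listed above are the feature functions of a log-linear model. The following is a minimal, hypothetical sketch of how such feature scores combine into a single model score; all names, weights, and values are invented for illustration and are not taken from the paper (the actual weights would be tuned, e.g. with MERT).

```python
import math

# Hypothetical feature scores (log-domain where appropriate) for one pre-translation.
features = {
    "forward_translation": math.log(0.4),   # (1)
    "indirect_translation": math.log(0.3),  # (2)
    "lex_src_to_tgt": math.log(0.5),        # (3)
    "lex_tgt_to_src": math.log(0.6),        # (4)
    "target_lm": -12.7,                     # (5) log LM score
    "word_count": 9,                        # (6)
    "rule_count": 4,                        # (7)
    "gap_penalty": 6,                       # (8) based on the number of target-side sequences
}

# Tuned weights (values here are made up).
weights = {
    "forward_translation": 0.10,
    "indirect_translation": 0.05,
    "lex_src_to_tgt": 0.08,
    "lex_tgt_to_src": 0.08,
    "target_lm": 0.50,
    "word_count": -0.10,
    "rule_count": -0.05,
    "gap_penalty": -0.20,
}

# Log-linear model score: the decoder searches for the derivation maximizing this sum.
score = sum(weights[name] * value for name, value in features.items())
print(score)
```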

machine translation

Appears in 5 sentences as: machine translation (5)
  1. Besides phrase-based machine translation systems (Koehn et al., 2003), syntax-based systems have become widely used because of their ability to handle nonlocal reordering.
    Page 1, “Introduction”
  2. In this contribution, we report on our novel statistical machine translation system that uses an ℓMBOT-based translation model.
    Page 1, “Introduction”
  3. In this section, we present the theoretical generative model used in our approach to syntax-based machine translation.
    Page 2, “Theoretical Model”
  4. We demonstrated that our ℓMBOT-based machine translation system beats a standard tree-to-tree system (Moses tree-to-tree) on the WMT 2009 translation task English → German.
    Page 9, “Conclusion and Future Work”
  5. To achieve this we implemented the formal model as described in Section 2 inside the Moses machine translation toolkit.
    Page 9, “Conclusion and Future Work”

translation model

Appears in 5 sentences as: translation model (4) translation models (1)
  1. We present a new translation model integrating the shallow local multi bottom-up tree transducer.
    Page 1, “Abstract”
  2. In this contribution, we report on our novel statistical machine translation system that uses an ℓMBOT-based translation model.
    Page 1, “Introduction”
  3. The theoretical foundations of ℓMBOT and their integration into our translation model are presented in Sections 2 and 3.
    Page 1, “Introduction”
  4. Given a source language sentence e, our translation model aims to find the best corresponding target language translation ĝ; i.e.,
    Page 4, “Translation Model”
  5. Both translation models were trained with approximately 1.5 million bilingual sentences after length-ratio filtering.
    Page 7, “Experiments”

BLEU

Appears in 4 sentences as: BLEU (5)
  1. The translation quality is automatically measured using BLEU scores, and we confirm the findings by providing linguistic evidence (see Section 5).
    Page 2, “Introduction”
  2. System     BLEU
     Baseline   12.60
     ℓMBOT*     13.06
    Page 7, “Experiments”
  3. We measured the overall translation quality with the help of 4-gram BLEU (Papineni et al., 2002), which was computed on tokenized and lower-cased data for both systems (the standard formula is recalled after this list).
    Page 7, “Experiments”
  4. We obtain a BLEU score of 13.06, which is a gain of 0.46 BLEU points over the baseline.
    Page 7, “Experiments”
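For reference, the standard 4-gram BLEU of Papineni et al. (2002) combines modified n-gram precisions p_n with a brevity penalty; this is the textbook definition, not a detail specific to the paper:

```latex
\mathrm{BLEU} = \mathrm{BP} \cdot \exp\!\left(\sum_{n=1}^{4} \tfrac{1}{4}\log p_n\right),
\qquad
\mathrm{BP} =
\begin{cases}
1 & \text{if } c > r,\\
e^{\,1-r/c} & \text{if } c \le r,
\end{cases}
```

where c is the total length of the system output and r the effective reference length.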

translation quality

Appears in 4 sentences as: translation quality (4)
  1. (2006) use syntactic annotations on the source language side and show significant improvements in translation quality.
    Page 1, “Introduction”
  2. (2009), and Chiang (2010), the integration of syntactic information on both sides tends to decrease translation quality because the systems become too restrictive.
    Page 1, “Introduction”
  3. The translation quality is automatically measured using BLEU scores, and we confirm the findings by providing linguistic evidence (see Section 5).
    Page 2, “Introduction”
  4. We measured the overall translation quality with the help of 4-gram BLEU (Papineni et al., 2002), which was computed on tokenized and lower-cased data for both systems.
    Page 7, “Experiments”

See all papers in Proc. ACL 2013 that mention translation quality.

See all papers in Proc. ACL that mention translation quality.

Back to top.

translation system

Appears in 4 sentences as: translation system (3) translation systems (1)
  1. Besides phrase-based machine translation systems (Koehn et al., 2003), syntax-based systems have become widely used because of their ability to handle nonlocal reordering.
    Page 1, “Introduction”
  2. In this contribution, we report on our novel statistical machine translation system that uses an ℓMBOT-based translation model.
    Page 1, “Introduction”
  3. Our contrastive system is the ℓMBOT-based translation system presented here.
    Page 7, “Experiments”
  4. We demonstrated that our ℓMBOT-based machine translation system beats a standard tree-to-tree system (Moses tree-to-tree) on the WMT 2009 translation task English → German.
    Page 9, “Conclusion and Future Work”

translation task

Appears in 4 sentences as: translation task (4)
  1. We perform a large-scale empirical evaluation of our obtained system, which demonstrates that we significantly beat a realistic tree-to-tree baseline on the WMT 2009 English → German translation task.
    Page 1, “Abstract”
  2. We evaluate our new system on the WMT 2009 shared translation task English → German.
    Page 2, “Introduction”
  3. The compared systems are evaluated on the English-to-German news translation task of WMT 2009 (Callison-Burch et al., 2009).
    Page 7, “Experiments”
  4. We demonstrated that our ℓMBOT-based machine translation system beats a standard tree-to-tree system (Moses tree-to-tree) on the WMT 2009 translation task English → German.
    Page 9, “Conclusion and Future Work”

parse tree

Appears in 3 sentences as: parse tree (2) parse trees (1)
  1. Since we utilize syntactic parse trees, let us introduce trees first.
    Page 2, “Theoretical Model”
  2. Actually, t must embed in the parse tree of e; see Section 4.
    Page 4, “Translation Model”
  3. Theoretically, this allows the decoder to ignore unary parser nonterminals, which could also disappear when we make our rules shallow; e.g., the parse tree on the left in the pre-translation of Figure 5 can be matched by a rule with left-hand side NP(Official, forecasts) (see the sketch after this list).
    Page 6, “Decoding”
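The following is a small, hypothetical sketch (not code from the paper or from Moses) of the idea quoted above: a parse tree is represented as nested tuples, a shallow left-hand side keeps only a root label and a flat child sequence, and matching skips unary chains such as a preterminal dominating a single word. The tree, the JJ/NNS labels, and the helper names are invented for illustration; the actual parse of Figure 5 is not reproduced.

```python
def skip_unary(node):
    """Descend through unary chains, e.g. ("JJ", ["Official"]) -> "Official"."""
    while isinstance(node, tuple) and len(node[1]) == 1:
        node = node[1][0]
    return node

def matches(shallow_lhs, tree):
    """Match a shallow left-hand side (root label, terminal children) against a tree."""
    root, pattern_children = shallow_lhs
    label, children = tree
    if label != root or len(children) != len(pattern_children):
        return False
    return all(skip_unary(child) == pat
               for pat, child in zip(pattern_children, children))

# An English parse fragment with unary preterminals above the words:
tree = ("NP", [("JJ", ["Official"]), ("NNS", ["forecasts"])])
print(matches(("NP", ["Official", "forecasts"]), tree))  # True
```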

sentence pairs

Appears in 3 sentences as: sentence pair (1) sentence pairs (2)
  1. In this manner we obtain sentence pairs like the one shown in Figure 3.
    Page 3, “Theoretical Model”
  2. To these sentence pairs we apply the rule extraction method of Maletti (2011) (the general shape of the extracted rules is sketched after this list).
    Page 3, “Theoretical Model”
  3. The rules extracted from the sentence pair of Figure 3 are shown in Figure 4.
    Page 3, “Theoretical Model”
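As a rough structural illustration of what such extracted rules look like, here is a hypothetical sketch: one source-side tree fragment is paired with a sequence of target-side tree fragments (possibly more than one, i.e. discontiguous), plus links between their nonterminal leaves. The MBOTRule name, the labels, and the concrete rule below are invented; the actual rules of Figure 4 are not reproduced.

```python
from dataclasses import dataclass, field

Tree = tuple  # (label, [children]); leaves are plain strings

@dataclass
class MBOTRule:
    source_lhs: Tree                           # source-side fragment
    target_rhs: list                           # one or more target-side fragments
    links: list = field(default_factory=list)  # (src leaf idx, (tgt seq idx, tgt leaf idx))
    weight: float = 1.0                        # relative frequency after normalization

# Invented example: an English VP maps to two German fragments (finite verb + rest).
rule = MBOTRule(
    source_lhs=("VP", ["VBZ", "NP"]),
    target_rhs=[("VAFIN", ["VBZ"]), ("VP", ["NP", "VVPP"])],
    links=[(0, (0, 0)), (1, (1, 0))],
)
print(rule)
```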

statistically significant

Appears in 3 sentences as: statistically significant (3)
  1. The starred results are statistically significant improvements over the Baseline (at confidence p < 0.05).
    Page 7, “Experiments”
  2. This improvement is statistically significant at confidence p < 0.05, which we computed using the pairwise bootstrap resampling technique of Koehn (2004) (see the sketch after this list).
    Page 7, “Experiments”
  3. However, this improvement (0.34) is not statistically significant.
    Page 7, “Experiments”
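A minimal sketch of paired bootstrap resampling in the spirit of Koehn (2004): test sentences are resampled with replacement, both systems are re-scored on each resample, and the fraction of resamples on which one system wins yields an empirical p-value. The paired_bootstrap and unigram_precision helpers and the toy data are invented stand-ins (the real test uses corpus-level 4-gram BLEU), not the authors' implementation.

```python
import random

def paired_bootstrap(refs, sys_a, sys_b, metric, samples=1000, seed=0):
    """Resample sentences with replacement and count how often system A
    scores at least as well as system B on the resampled test set."""
    rng = random.Random(seed)
    n, wins = len(refs), 0
    for _ in range(samples):
        idx = [rng.randrange(n) for _ in range(n)]
        score_a = metric([refs[i] for i in idx], [sys_a[i] for i in idx])
        score_b = metric([refs[i] for i in idx], [sys_b[i] for i in idx])
        if score_a >= score_b:
            wins += 1
    # Empirical p-value for "A is not better than B".
    return 1.0 - wins / samples

# Toy corpus-level metric (unigram precision) standing in for 4-gram BLEU.
def unigram_precision(refs, hyps):
    correct = total = 0
    for ref, hyp in zip(refs, hyps):
        ref_tokens = set(ref.split())
        hyp_tokens = hyp.split()
        correct += sum(tok in ref_tokens for tok in hyp_tokens)
        total += len(hyp_tokens)
    return correct / max(total, 1)

refs  = ["official forecasts are based on data", "the system beats the baseline"]
sys_a = ["official forecasts are based on data", "the system beats a baseline"]
sys_b = ["forecasts official based are data on", "system the baseline beats"]
print(paired_bootstrap(refs, sys_a, sys_b, unigram_precision))
```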
