Fast and Scalable Decoding with Language Model Look-Ahead for Phrase-based Statistical Machine Translation
Wuebker, Joern and Ney, Hermann and Zens, Richard

Article Structure

Abstract

In this work we present two extensions to the well-known dynamic programming beam search in phrase-based statistical machine translation (SMT), aiming at increased efficiency of decoding by minimizing the number of language model computations and hypothesis expansions.

Introduction

Research efforts to increase search efficiency for phrase-based MT (Koehn et al., 2003) have explored several directions, ranging from generalizing the stack decoding algorithm (Ortiz et al., 2006) to additional early pruning techniques (Delaney et al., 2006; Moore and Quirk, 2007) and more efficient language model (LM) querying (Heafield, 2011).

Search Algorithm Extensions

We apply the decoding algorithm described by Zens and Ney (2008).

Experimental Evaluation

3.1 Setup

Conclusions

This work introduces two extensions to the well-known beam search algorithm for phrase-based machine translation.

Topics

LM

Appears in 42 sentences as: LM (66)
In Fast and Scalable Decoding with Language Model Look-Ahead for Phrase-based Statistical Machine Translation
  1. Research efforts to increase search efficiency for phrase-based MT (Koehn et al., 2003) have explored several directions, ranging from generalizing the stack decoding algorithm (Ortiz et al., 2006) to additional early pruning techniques (Delaney et al., 2006; Moore and Quirk, 2007) and more efficient language model (LM) querying (Heafield, 2011).
    Page 1, “Introduction”
  2. We show that taking a heuristic LM score estimate for pre-sorting the phrase translation candidates has a positive effect on both translation quality and speed.
    Page 1, “Introduction”
  3. Further, we introduce two novel LM lookahead methods.
    Page 1, “Introduction”
  4. The idea of LM lookahead is to incorporate the LM probabilities into the pruning process of the beam search as early as possible.
    Page 1, “Introduction”
  5. First-word LM lookahead exploits the search structure to use the LM costs of the first word of a new phrase as a lower bound for the full LM costs of the phrase.
    Page 1, “Introduction”
  6. Phrase-only LM lookahead makes use of a precomputed estimate of the full LM costs for each phrase.
    Page 1, “Introduction”
  7. We detail the implementation of these methods and analyze their effect with respect to the number of LM computations and hypothesis expansions as well as on translation speed and quality.
    Page 1, “Introduction”
  8. In addition to sorting according to the purely phrase-internal scores, which is common practice, we compute an estimate q_LME(ẽ) for the LM score of each target phrase ẽ. q_LME(ẽ) is the weighted LM score we receive by assuming ẽ to be a complete sentence without using sentence start and end markers.
    Page 2, “Search Algorithm Extensions”
  9. LM score computations are among the most expensive in decoding.
    Page 2, “Search Algorithm Extensions”
  10. (2006) report significant improvements in runtime by removing unnecessary LM lookups via early pruning.
    Page 2, “Search Algorithm Extensions”
  11. Here we describe an LM lookahead technique, which is aimed at further reducing the number of LM computations.
    Page 2, “Search Algorithm Extensions”
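
The pre-sorting and first-word lookahead ideas quoted above can be sketched roughly as follows. This is a toy illustration, not the paper's implementation: the unigram table, function names, and scores are hypothetical, and the actual decoder uses a 4-gram LM with precomputed, weighted log-linear scores.

```python
import math

# Toy unigram "LM" (hypothetical stand-in; the paper uses a 4-gram LM).
UNIGRAM = {"the": 0.05, "house": 0.01, "home": 0.008, "building": 0.002}

def lm_estimate(words, lm_weight=1.0):
    """Weighted log-LM score of a word sequence without sentence start/end
    markers, i.e. a q_LME-style pre-sorting estimate."""
    return lm_weight * sum(math.log(UNIGRAM.get(w, 1e-6)) for w in words)

def presort(candidates):
    """Sort target candidates by phrase-internal score plus the heuristic
    LM estimate, rather than by the phrase-internal score alone."""
    return sorted(candidates,
                  key=lambda c: c["phrase_score"] + lm_estimate(c["words"]),
                  reverse=True)

def first_word_lookahead_prune(candidates, beam_threshold, base_score=0.0):
    """First-word LM lookahead: since log-probabilities are negative, the LM
    score of a phrase's first word alone is an optimistic bound on the full
    phrase LM score, so a candidate that already falls outside the beam with
    only that cost applied can be pruned before any further LM queries."""
    return [c for c in candidates
            if base_score + c["phrase_score"] + lm_estimate(c["words"][:1])
               >= beam_threshold]
```

In a real decoder the estimate would be computed once per phrase-table entry before search, not recomputed at every expansion.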

BLEU

Appears in 15 sentences as: BLEU (16)
  1. At a speed of roughly 70 words per second, Moses reaches 17.2% BLEU, whereas our approach yields 20.0% with identical models.
    Page 1, “Abstract”
  2. We also run comparisons with the Moses decoder (Koehn et al., 2007), which yields the same performance in BLEU, but is outperformed significantly in terms of scalability for faster translation.
    Page 1, “Introduction”
  3. system   | BLEU [%] | #HYP | #LM  | w/s
     N0 = ∞
     baseline | 20.1     | 3.0K | 322K | 2.2
     +presort | 20.1     | 2.5K | 183K | 3.6
     N0 = 100
    Page 3, “Experimental Evaluation”
  4. We evaluate with BLEU (Papineni et al., 2002) and TER (Snover et al., 2006).
    Page 3, “Experimental Evaluation”
  6. Figure 1: Translation performance in BLEU [%] on the newstest2009 set vs. speed on a logarithmic scale.
    Page 3, “Experimental Evaluation”
  7. When observation pruning is applied, we additionally observe a small increase of 0.2% in BLEU.
    Page 3, “Experimental Evaluation”
  8. The heuristic phrase-only LM lookahead method introduces additional search errors, resulting in a BLEU drop of 0.3%, but yields another 85% reduction in LM computations and increases throughput by a factor of 2.2.
    Page 3, “Experimental Evaluation”
  9. Translation performance in BLEU is plotted against speed in Figure 1.
    Page 4, “Experimental Evaluation”
  10. Without the proposed extensions, Moses slightly outperforms our decoder in terms of BLEU .
    Page 4, “Experimental Evaluation”
  11. With LM score pre-sorting, the best BLEU value is similar to Moses while further accelerating translation, yielding identical performance at 16 words/sec as Moses at 1.8 words/sec.
    Page 4, “Experimental Evaluation”
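
Since several of the excerpts above quote BLEU deltas, a minimal reminder of how the metric is computed may help. The sketch below is an unsmoothed, single-reference, sentence-level variant for illustration only, not the corpus-level implementation used in the paper's evaluation:

```python
import math
from collections import Counter

def bleu(candidate, reference, max_n=4):
    """Minimal sentence-level BLEU: geometric mean of modified n-gram
    precisions (n = 1..max_n) times a brevity penalty, no smoothing."""
    precisions = []
    for n in range(1, max_n + 1):
        cand = Counter(tuple(candidate[i:i+n]) for i in range(len(candidate)-n+1))
        ref = Counter(tuple(reference[i:i+n]) for i in range(len(reference)-n+1))
        overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
        total = max(sum(cand.values()), 1)
        if overlap == 0:          # unsmoothed: any empty n-gram match zeroes BLEU
            return 0.0
        precisions.append(overlap / total)
    # Brevity penalty: punish candidates shorter than the reference.
    bp = 1.0 if len(candidate) > len(reference) else \
        math.exp(1 - len(reference) / max(len(candidate), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

Real evaluations aggregate n-gram counts over the whole test set before taking precisions, which is why sentence-level scores like this one are noisier.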

beam search

Appears in 6 sentences as: beam search (6)
  1. In this work we present two extensions to the well-known dynamic programming beam search in phrase-based statistical machine translation (SMT), aiming at increased efficiency of decoding by minimizing the number of language model computations and hypothesis expansions.
    Page 1, “Abstract”
  2. The idea of LM lookahead is to incorporate the LM probabilities into the pruning process of the beam search as early as possible.
    Page 1, “Introduction”
  3. A beam search strategy is used to find the best hypothesis.
    Page 1, “Search Algorithm Extensions”
  4. In addition to the source sentence f_1^J, the beam search algorithm takes a matrix E(·,·) as input, where for each contiguous phrase f̃ = f_j … f_j′.
    Page 2, “Search Algorithm Extensions”
  5. In this section we evaluate the proposed extensions to the original beam search algorithm in terms of scalability and their usefulness for different application constraints.
    Page 3, “Experimental Evaluation”
  6. This work introduces two extensions to the well-known beam search algorithm for phrase-based machine translation.
    Page 4, “Conclusions”
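
The generic shape of the beam search these excerpts refer to can be sketched as follows. This is a deliberately simplified illustration with hypothetical names (no coverage sets, hypothesis recombination, or reordering), not the decoder of Zens and Ney (2008):

```python
import heapq

def beam_search(expand, initial, beam_size=4, steps=3):
    """Generic beam search: at each step expand every hypothesis and keep
    only the beam_size highest-scoring expansions (histogram pruning)."""
    beam = [initial]  # each hypothesis is a (score, state) pair
    for _ in range(steps):
        expansions = [h for hyp in beam for h in expand(hyp)]
        beam = heapq.nlargest(beam_size, expansions, key=lambda h: h[0])
    return max(beam, key=lambda h: h[0])
```

The extensions discussed in the paper act inside the expansion loop, cutting down how many expansions are generated and how many of them require a fresh LM query.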

language model

Appears in 6 sentences as: Language Model (2) language model (4)
  1. In this work we present two extensions to the well-known dynamic programming beam search in phrase-based statistical machine translation (SMT), aiming at increased efficiency of decoding by minimizing the number of language model computations and hypothesis expansions.
    Page 1, “Abstract”
  2. Our results show that language model based pre-sorting yields a small improvement in translation quality and a speedup by a factor of 2.
    Page 1, “Abstract”
  3. Research efforts to increase search efficiency for phrase-based MT (Koehn et al., 2003) have explored several directions, ranging from generalizing the stack decoding algorithm (Ortiz et al., 2006) to additional early pruning techniques (Delaney et al., 2006; Moore and Quirk, 2007) and more efficient language model (LM) querying (Heafield, 2011).
    Page 1, “Introduction”
  5. 2.2 Language Model LookAhead
    Page 2, “Search Algorithm Extensions”
  6. The English language model is a 4-gram LM created with the SRILM toolkit (Stolcke, 2002) on all bilingual and parts of the provided monolingual data.
    Page 3, “Experimental Evaluation”

translation quality

Appears in 6 sentences as: translation quality (6)
  1. Our results show that language model based pre-sorting yields a small improvement in translation quality and a speedup by a factor of 2.
    Page 1, “Abstract”
  2. We compare our approach with Moses and observe the same performance, but a substantially better tradeoff between translation quality and speed.
    Page 1, “Abstract”
  3. We show that taking a heuristic LM score estimate for pre-sorting the phrase translation candidates has a positive effect on both translation quality and speed.
    Page 1, “Introduction”
  4. A better pre-selection can be expected to improve translation quality.
    Page 2, “Search Algorithm Extensions”
  5. It yields nearly the same top performance with an even better tradeoff between translation quality and speed.
    Page 4, “Experimental Evaluation”
  6. We compare our decoder to Moses, reaching a similar highest BLEU score, but clearly outperforming it in terms of scalability with respect to the tradeoff ratio between translation quality and speed.
    Page 4, “Conclusions”

phrase table

Appears in 3 sentences as: phrase table (2) phrase tables (1)
  1. We use identical phrase tables and scaling factors for Moses and our decoder.
    Page 3, “Experimental Evaluation”
  2. The phrase table is pruned to a maximum of 400 target candidates per source phrase before decoding.
    Page 3, “Experimental Evaluation”
  3. The phrase table and LM are loaded into memory before translating and loading time is eliminated for speed measurements.
    Page 3, “Experimental Evaluation”
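
Pruning the phrase table to a fixed number of target candidates per source phrase, as in excerpt 2 above, amounts to a simple per-source top-k cut. A sketch over hypothetical data (the real table stores multiple model scores per pair, not a single float):

```python
def prune_phrase_table(table, max_candidates=400):
    """Keep only the highest-scoring target candidates for each source
    phrase. `table` maps a source phrase to (target_phrase, score) pairs."""
    return {
        src: sorted(cands, key=lambda tc: tc[1], reverse=True)[:max_candidates]
        for src, cands in table.items()
    }
```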

phrase-based

Appears in 3 sentences as: phrase-based (3)
  1. In this work we present two extensions to the well-known dynamic programming beam search in phrase-based statistical machine translation (SMT), aiming at increased efficiency of decoding by minimizing the number of language model computations and hypothesis expansions.
    Page 1, “Abstract”
  2. Research efforts to increase search efficiency for phrase-based MT (Koehn et al., 2003) have explored several directions, ranging from generalizing the stack decoding algorithm (Ortiz et al., 2006) to additional early pruning techniques (Delaney et al., 2006; Moore and Quirk, 2007) and more efficient language model (LM) querying (Heafield, 2011).
    Page 1, “Introduction”
  3. This work introduces two extensions to the well-known beam search algorithm for phrase-based machine translation.
    Page 4, “Conclusions”
