Efficient Multi-Pass Decoding for Synchronous Context Free Grammars
Zhang, Hao and Gildea, Daniel

Article Structure

Abstract

We take a multi-pass approach to machine translation decoding when using synchronous context-free grammars as the translation model and n-gram language models: the first pass uses a bigram language model, and the resulting parse forest is used in the second pass to guide search with a trigram language model.

Introduction

Statistical machine translation systems based on synchronous grammars have recently shown great promise, but one stumbling block to their widespread adoption is that the decoding, or search, problem during translation is more computationally demanding than in phrase-based systems.

Language Model Integrated Decoding for SCFG

We begin by introducing Synchronous Context Free Grammars and their decoding algorithms when an n-gram language model is integrated into the grammatical search space.
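
To make the state refinement concrete, here is a minimal Python sketch of the idea behind LM-integrated dynamic programming items (my own illustration, not the paper's implementation): besides the nonterminal and source span, each item records boundary target words, so the n-gram cost of concatenating two children can be charged at the seam. The Item and BigramLM classes and all scores are hypothetical; a trigram-integrated decoder would keep two boundary words on each side instead of one.

import math
from collections import namedtuple

# Hypothetical LM-integrated item for a binary SCFG rule: besides the
# nonterminal and source span [i, j], it records the boundary target words,
# because the n-gram cost of concatenating two partial translations depends
# only on the words at the seam.
Item = namedtuple("Item", "nonterminal i j left_word right_word score")

class BigramLM:
    """Toy bigram model holding log-probabilities, for illustration only."""
    def __init__(self, logprobs):
        self.logprobs = logprobs          # dict: (w1, w2) -> log p(w2 | w1)
    def score(self, w1, w2):
        return self.logprobs.get((w1, w2), math.log(1e-4))

def combine(rule_score, left, right, lm):
    """Combine two child items in target order, adding the bigram cost at the
    seam where the right child's first word follows the left child's last word."""
    seam = lm.score(left.right_word, right.left_word)
    return Item("X", left.i, right.j, left.left_word, right.right_word,
                rule_score + left.score + right.score + seam)

# Usage with made-up scores: combining "the" (span 0-1) with "house" (span 1-2).
lm = BigramLM({("the", "house"): math.log(0.2)})
a = Item("A", 0, 1, "the", "the", -1.0)
b = Item("B", 1, 2, "house", "house", -1.5)
print(combine(-0.5, a, b, lm))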

Multi-pass LM-Integrated Decoding

In this section, we describe a multi-pass progressive decoding technique that gradually augments the LM-integrated states from lower orders to higher orders.
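
The sketch below illustrates one way a second pass can use the first: trigram-integrated states are prioritized on an agenda by their own Viterbi inside score plus the Viterbi outside score of the coarser bigram state they project down to. This is a simplified illustration under assumed data structures (tuple-keyed dictionaries, log-domain scores), not the authors' decoder.

import heapq

def trigram_agenda(trigram_items, bigram_outside):
    """Prioritize trigram-integrated states for exploration.

    trigram_items:  list of (state, inside) pairs; a state here is
                    (nonterminal, i, j, left_words, right_words) with two
                    boundary words per side, and inside is its Viterbi
                    inside log-score.
    bigram_outside: dict mapping the coarser bigram state
                    (nonterminal, i, j, (leftmost word,), (rightmost word,))
                    to its Viterbi outside log-score from the first pass.

    Priority is inside + estimated outside; the heap pops the best first.
    """
    heap = []
    for state, inside in trigram_items:
        nt, i, j, left, right = state
        coarse = (nt, i, j, left[:1], right[-1:])
        priority = inside + bigram_outside.get(coarse, float("-inf"))
        heapq.heappush(heap, (-priority, state, inside))
    return heap

# Usage with made-up scores: the trigram state below projects down to the
# bigram state ("X", 0, 2, ("the",), ("house",)).
outside = {("X", 0, 2, ("the",), ("house",)): -2.0}
items = [(("X", 0, 2, ("the", "big"), ("old", "house")), -3.5)]
print(heapq.heappop(trigram_agenda(items, outside)))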

Decoding to Maximize BLEU

The ultimate goal of efficient decoding is to find the translation that has the highest evaluation score in the least time possible.
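
As a rough illustration of this final pass (a simplification in the spirit of Goodman's (1996) expected-recall parsing, not the paper's exact algorithm), the sketch below assumes each hyperedge of the trigram forest already carries a posterior-like expected count q, for example computed from approximate inside/outside scores, and selects the derivation maximizing the total expected count with a Viterbi-style dynamic program.

def max_expected_constituents(forest, root, q):
    """Select the derivation with the highest total expected constituent count.

    forest: dict mapping node -> list of hyperedges; a hyperedge is a tuple of
            child nodes (the empty tuple for leaves).
    q:      dict mapping (node, hyperedge) -> expected count (posterior-like
            score) of that constituent in the forest.
    Returns (best total expected count, chosen hyperedge per node).
    """
    best, choice = {}, {}

    def visit(node):
        if node in best:
            return best[node]
        score, pick = float("-inf"), None
        for edge in forest[node]:
            s = q[(node, edge)] + sum(visit(child) for child in edge)
            if s > score:
                score, pick = s, edge
        best[node], choice[node] = score, pick
        return score

    visit(root)
    return best[root], choice

# Toy forest: "S" can be built either from ("A", "B") or from ("C",);
# the expected counts are made up for illustration.
forest = {"S": [("A", "B"), ("C",)], "A": [()], "B": [()], "C": [()]}
q = {("S", ("A", "B")): 0.6, ("S", ("C",)): 0.7,
     ("A", ()): 0.9, ("B", ()): 0.8, ("C", ()): 0.5}
print(max_expected_constituents(forest, "S", q))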

Experiments

We did our decoding experiments on the LDC 2002 MT evaluation data set for translation of Chinese newswire sentences into English.

Conclusion

We present a multi-pass method to speed up n-gram integrated decoding for SCFG.

Topics

BLEU

Appears in 19 sentences as: BLEU (20)
In Efficient Multi-Pass Decoding for Synchronous Context Free Grammars
  1. An additional fast decoding pass maximizing the expected count of correct translation hypotheses increases the BLEU score significantly.
    Page 1, “Abstract”
  2. With this heuristic, we achieve the same BLEU scores and model cost as a trigram decoder with essentially the same speed as a bigram decoder.
    Page 1, “Introduction”
  3. Maximizing the expected count of synchronous constituents approximately maximizes BLEU.
    Page 1, “Introduction”
  4. We find a significant increase in BLEU in the experiments, with minimal additional time.
    Page 1, “Introduction”
  5. BLEU is based on n-gram precision, and since each synchronous constituent in the tree adds a new 4-gram to the translation at the point where its children are concatenated, the additional pass approximately maximizes BLEU.
    Page 5, “Decoding to Maximize BLEU”
  6. We evaluate the translation results by comparing them against the reference translations using the BLEU metric.
    Page 5, “Experiments”
  7. Table 1 excerpt (Hyperedges / BLEU): Bigram Pass: 167K / 21.77; Trigram Pass UNI: — / —; BO: +629.7K = 796.7K / 23.56; BO+BB: +2.7K = 169.
    Page 6, “Experiments”
  8. Table 1: Speed and BLEU scores for two-pass decoding.
    Page 6, “Experiments”
  9. However, model scores do not directly translate into BLEU scores.
    Page 7, “Experiments”
  10. 5.4 Maximizing BLEU
    Page 7, “Experiments”
  11. In order to maximize BLEU score using the algorithm described in Section 4, we need a sizable trigram forest as a starting point.
    Page 7, “Experiments”

bigram

Appears in 15 sentences as: Bigram (1) bigram (16)
In Efficient Multi-Pass Decoding for Synchronous Context Free Grammars
  1. We take a multi-pass approach to machine translation decoding when using synchronous context-free grammars as the translation model and n-gram language models: the first pass uses a bigram language model, and the resulting parse forest is used in the second pass to guide search with a trigram language model.
    Page 1, “Abstract”
  2. The trigram pass closes most of the performance gap between a bigram decoder and a much slower trigram decoder, but takes time that is insignificant in comparison to the bigram pass.
    Page 1, “Abstract”
  3. First, we present a two-pass decoding algorithm, in which the first pass explores states resulting from an integrated bigram language model, and the second pass expands these states into trigram-based states.
    Page 1, “Introduction”
  4. With this heuristic, we achieve the same BLEU scores and model cost as a trigram decoder with essentially the same speed as a bigram decoder.
    Page 1, “Introduction”
  5. More specifically, a bigram decoding pass is executed forward and backward to figure out the probability of each state.
    Page 3, “Multi-pass LM-Integrated Decoding”
  6. We take the same view as in speech recognition that a trigram integrated model is a finer-grained model than a bigram model, and in general we can do an (n-1)-gram decoding pass as a predictive pass for the following n-gram pass.
    Page 3, “Multi-pass LM-Integrated Decoding”
  7. If we can afford a bigram decoding pass, the outside cost from a bigram model is conceivably a
    Page 3, “Multi-pass LM-Integrated Decoding”
  8. very good estimate of the outside cost using a trigram model since a bigram language model and a trigram language model must be strongly correlated.
    Page 3, “Multi-pass LM-Integrated Decoding”
  9. The benefit of including the best-border estimate is to refine the outside estimate with respect to the inner words that refine the bigram states into trigram states.
    Page 4, “Multi-pass LM-Integrated Decoding”
  10. The outside-pass Algorithm 1 for bigram decoding can be generalized to the trigram case.
    Page 5, “Decoding to Maximize BLEU”
  11. Table 1 excerpt (Hyperedges / BLEU): Bigram Pass: 167K / 21.77; Trigram Pass UNI: — / —; BO: +629.7K = 796.7K / 23.56; BO+BB: +2.7K = 169.
    Page 6, “Experiments”

language model

Appears in 12 sentences as: language model (14) language models (1)
In Efficient Multi-Pass Decoding for Synchronous Context Free Grammars
  1. We take a multi-pass approach to machine translation decoding when using synchronous context-free grammars as the translation model and n-gram language models: the first pass uses a bigram language model, and the resulting parse forest is used in the second pass to guide search with a trigram language model.
    Page 1, “Abstract”
  2. This complexity arises from the interaction of the tree-based translation model with an n-gram language model .
    Page 1, “Introduction”
  3. First, we present a two-pass decoding algorithm, in which the first pass explores states resulting from an integrated bigram language model, and the second pass expands these states into trigram-based states.
    Page 1, “Introduction”
  4. The general bigram-to-trigram technique is common in speech recognition (Murveit et al., 1993), where lattices from a bigram-based decoder are re-scored with a trigram language model.
    Page 1, “Introduction”
  5. Second, we explore heuristics for agenda-based search, and present a heuristic for our second pass that combines precomputed language model information with information derived from the first pass.
    Page 1, “Introduction”
  6. We begin by introducing Synchronous Context Free Grammars and their decoding algorithms when an n-gram language model is integrated into the grammatical search space.
    Page 1, “Language Model Integrated Decoding for SCFG”
  7. Without an n-gram language model, decoding using SCFG is not much different from CFG parsing.
    Page 2, “Language Model Integrated Decoding for SCFG”
  8. However, when we want to integrate an n-gram language model into the search, our goal is searching for the derivation whose total sum of weights of productions and n-gram log probabilities is maximized.
    Page 2, “Language Model Integrated Decoding for SCFG”
  9. very good estimate of the outside cost using a trigram model since a bigram language model and a trigram language model must be strongly correlated.
    Page 3, “Multi-pass LM-Integrated Decoding”
  10. We propagate the outside cost of the parent to its children by combining it with the inside cost of the other children and the interaction cost, i.e., the language model cost between the focused child and the other children.
    Page 3, “Multi-pass LM-Integrated Decoding”
  11. (2007) also take a two-pass decoding approach, with the first pass leaving the language model boundary words out of the dynamic programming state, such that only one hypothesis is retained for each span and grammar symbol.
    Page 4, “Multi-pass LM-Integrated Decoding”

n-gram

Appears in 12 sentences as: n-gram (13)
In Efficient Multi-Pass Decoding for Synchronous Context Free Grammars
  1. We take a multi-pass approach to machine translation decoding when using synchronous context-free grammars as the translation model and n-gram language models: the first pass uses a bigram language model, and the resulting parse forest is used in the second pass to guide search with a trigram language model.
    Page 1, “Abstract”
  2. This complexity arises from the interaction of the tree-based translation model with an n-gram language model.
    Page 1, “Introduction”
  3. We begin by introducing Synchronous Context Free Grammars and their decoding algorithms when an n-gram language model is integrated into the grammatical search space.
    Page 1, “Language Model Integrated Decoding for SCFG”
  4. Without an n-gram language model, decoding using SCFG is not much different from CFG parsing.
    Page 2, “Language Model Integrated Decoding for SCFG”
  5. However, when we want to integrate an n-gram language model into the search, our goal is searching for the derivation whose total sum of weights of productions and n-gram log probabilities is maximized.
    Page 2, “Language Model Integrated Decoding for SCFG”
  6. We take the same view as in speech recognition that a trigram integrated model is a finer-grained model than a bigram model, and in general we can do an (n-1)-gram decoding pass as a predictive pass for the following n-gram pass.
    Page 3, “Multi-pass LM-Integrated Decoding”
  7. We can make the approximation even closer by taking local higher-order outside n-gram information for a state X[i, j, u_{1..n-1}, v_{1..n-1}] into account.
    Page 4, “Multi-pass LM-Integrated Decoding”
  8. where β is the Viterbi inside cost and α is the Viterbi outside cost, to globally prioritize the n-gram integrated states on the agenda for exploration.
    Page 4, “Multi-pass LM-Integrated Decoding”
  9. The complexity of n-gram integrated decoding for SCFG has been tackled using other methods.
    Page 4, “Multi-pass LM-Integrated Decoding”
  10. (2005) factorizes the dynamic programming steps and lowers the asymptotic complexity of the n-gram integrated decoding, but has not been implemented in large-scale systems where massive pruning is present.
    Page 4, “Multi-pass LM-Integrated Decoding”
  11. BLEU is based on n-gram precision, and since each synchronous constituent in the tree adds a new 4-gram to the translation at the point where its children are concatenated, the additional pass approximately maximizes BLEU.
    Page 5, “Decoding to Maximize BLEU”

BLEU score

Appears in 8 sentences as: BLEU score (5) BLEU scores (3)
In Efficient Multi-Pass Decoding for Synchronous Context Free Grammars
  1. An additional fast decoding pass maximizing the expected count of correct translation hypotheses increases the BLEU score significantly.
    Page 1, “Abstract”
  2. With this heuristic, we achieve the same BLEU scores and model cost as a trigram decoder with essentially the same speed as a bigram decoder.
    Page 1, “Introduction”
  3. Table 1: Speed and BLEU scores for two-pass decoding.
    Page 6, “Experiments”
  4. However, model scores do not directly translate into BLEU scores.
    Page 7, “Experiments”
  5. In order to maximize BLEU score using the algorithm described in Section 4, we need a sizable trigram forest as a starting point.
    Page 7, “Experiments”
  6. However, the mismatch between model score and BLEU score persists.
    Page 7, “Experiments”
  7. We achieve the record-high BLEU score of 24.34 using on average 21 seconds per sentence, compared to the next-highest score of 23.92 achieved by cyk using on average 78 seconds per sentence.
    Page 7, “Experiments”
  8. This technique, together with the progressive search at previous stages, gives a decoder that produces the highest BLEU score we have obtained on the data in a very reasonable amount of time.
    Page 7, “Conclusion”

dynamic programming

Appears in 7 sentences as: dynamic programming (7)
In Efficient Multi-Pass Decoding for Synchronous Context Free Grammars
  1. From a dynamic programming point of view, the DP states are X[i, j], where X ranges over all possible nonterminals and i and j range from 0 to the input string length. Each state stores the best translations obtainable.
    Page 2, “Language Model Integrated Decoding for SCFG”
  2. Thus, to preserve the dynamic programming property, we need to refine the states by adding the boundary words into the parameterization.
    Page 2, “Language Model Integrated Decoding for SCFG”
  3. (2005) factorizes the dynamic programming steps and lowers the asymptotic complexity of the n-gram integrated decoding, but has not been implemented in large-scale systems where massive pruning is present.
    Page 4, “Multi-pass LM-Integrated Decoding”
  4. (2007) also take a two-pass decoding approach, with the first pass leaving the language model boundary words out of the dynamic programming state, such that only one hypothesis is retained for each span and grammar symbol.
    Page 4, “Multi-pass LM-Integrated Decoding”
  5. Our algorithm is another dynamic programming decoding pass on the trigram forest, and is similar to the parsing algorithm for maximizing expected labelled recall presented by Goodman (1996).
    Page 5, “Decoding to Maximize BLEU”
  6. The expression can be factorized and computed using dynamic programming on the forest.
    Page 5, “Decoding to Maximize BLEU”
  7. We can also use the dynamic programming hook trick.
    Page 7, “Experiments”

Viterbi

Appears in 6 sentences as: Viterbi (8)
In Efficient Multi-Pass Decoding for Synchronous Context Free Grammars
  1. Since we want to approximate the Viterbi
    Page 3, “Multi-pass LM-Integrated Decoding”
  2. where β is the Viterbi inside cost and α is the Viterbi outside cost, to globally prioritize the n-gram integrated states on the agenda for exploration.
    Page 4, “Multi-pass LM-Integrated Decoding”
  3. We approximate β and α using the Viterbi probabilities.
    Page 5, “Decoding to Maximize BLEU”
  4. Since decoding from bottom up in the trigram pass already gives us the inside Viterbi scores, we only have to visit the nodes in the reverse order once we reach the root to compute the Viterbi outside scores.
    Page 5, “Decoding to Maximize BLEU”
  5. Simply by exploring more (200 times the log beam) after-goal items, we can optimize the Viterbi synchronous parse significantly, as shown in Figure 3 (left) in terms of model score versus search time.
    Page 7, “Experiments”
  6. We use an inside/outside parsing algorithm to get the Viterbi outside cost of bigram integrated states, which is used as an outside estimate for trigram integrated states.
    Page 7, “Conclusion”
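
To illustrate the kind of outside pass the excerpts above describe, here is a simplified sketch over a generic hypergraph (my own representation, not the paper's Algorithm 1): given Viterbi inside scores from the bottom-up pass, nodes are visited top-down, and each child receives the best combination of its parent's outside score, the hyperedge score, and its siblings' inside scores.

def viterbi_outside(edges, inside, root):
    """Compute Viterbi outside log-scores on a hypergraph.

    edges:  list of (parent, children, edge_score) hyperedges, listed in
            bottom-up topological order (every edge that builds a node
            comes before any edge that uses that node as a child).
    inside: dict node -> Viterbi inside log-score from the bottom-up pass.
    """
    outside = {node: float("-inf") for node in inside}
    outside[root] = 0.0
    # Reversing the bottom-up order gives a top-down visit, so a parent's
    # outside score is final before it is propagated to its children.
    for parent, children, score in reversed(edges):
        total_inside = sum(inside[c] for c in children)
        for child in children:
            cand = outside[parent] + score + total_inside - inside[child]
            if cand > outside[child]:
                outside[child] = cand
    return outside

# Toy forest: S -> A B with edge score -0.2; inside scores are made up.
inside = {"S": -2.0, "A": -0.7, "B": -1.1}
edges = [("S", ("A", "B"), -0.2)]
print(viterbi_outside(edges, inside, "S"))
# outside["A"] = 0.0 + (-0.2) + inside["B"] = -1.3; outside["B"] = -0.9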

model score

Appears in 4 sentences as: model score (3) model scores (1)
In Efficient Multi-Pass Decoding for Synchronous Context Free Grammars
  1. This figure indicates the advantage of the two-pass decoding strategy in producing translations with a high model score in less time.
    Page 7, “Experiments”
  2. However, model scores do not directly translate into BLEU scores.
    Page 7, “Experiments”
  3. Simply by exploring more (200 times the log beam) after-goal items, we can optimize the Viterbi synchronous parse significantly, as shown in Figure 3 (left) in terms of model score versus search time.
    Page 7, “Experiments”
  4. However, the mismatch between model score and BLEU score persists.
    Page 7, “Experiments”

parsing algorithm

Appears in 4 sentences as: parsing algorithm (4)
In Efficient Multi-Pass Decoding for Synchronous Context Free Grammars
  1. Charniak and Johnson (2005) use a PCFG to do a pass of inside-outside parsing to reduce the state space of a subsequent lexicalized n-best parsing algorithm to produce parses that are further re-ranked by a MaxEnt model.
    Page 3, “Multi-pass LM-Integrated Decoding”
  2. Klein and Manning (2003) describe an A* parsing framework for monolingual parsing and admissible outside estimates that are computed using an inside/outside parsing algorithm on PCFGs simplified from the original PCFG.
    Page 3, “Multi-pass LM-Integrated Decoding”
  3. Our algorithm is another dynamic programming decoding pass on the trigram forest, and is similar to the parsing algorithm for maximizing expected labelled recall presented by Goodman (1996).
    Page 5, “Decoding to Maximize BLEU”
  4. We use an inside/outside parsing algorithm to get the Viterbi outside cost of bigram integrated states, which is used as an outside estimate for trigram integrated states.
    Page 7, “Conclusion”

machine translation

Appears in 3 sentences as: machine translation (3)
In Efficient Multi-Pass Decoding for Synchronous Context Free Grammars
  1. We take a multi-pass approach to machine translation decoding when using synchronous context-free grammars as the translation model and n-gram language models: the first pass uses a bigram language model, and the resulting parse forest is used in the second pass to guide search with a trigram language model.
    Page 1, “Abstract”
  2. Statistical machine translation systems based on synchronous grammars have recently shown great promise, but one stumbling block to their widespread adoption is that the decoding, or search, problem during translation is more computationally demanding than in phrase-based systems.
    Page 1, “Introduction”
  3. We examine the question of whether, given the reordering inherent in the machine translation problem, lower order n-grams will provide as valuable a search heuristic as they do for speech recognition.
    Page 1, “Introduction”

n-grams

Appears in 3 sentences as: n-grams (3)
In Efficient Multi-Pass Decoding for Synchronous Context Free Grammars
  1. Use of longer n-grams improves translation results, but exacerbates this interaction.
    Page 1, “Introduction”
  2. We examine the question of whether, given the reordering inherent in the machine translation problem, lower order n-grams will provide as valuable a search heuristic as they do for speech recognition.
    Page 1, “Introduction”
  3. where ε(i, j) is the set of candidate target language words outside the span (i, j), and BB is the product of the upper bounds for the two on-the-border n-grams.
    Page 4, “Multi-pass LM-Integrated Decoding”
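
The last excerpt refers to the best-border (BB) refinement; the sketch below is a rough simplification of that idea with a placeholder trigram scorer, not the paper's exact formula: the bound adds, in the log domain, the best achievable scores of the two n-grams that straddle the left and right borders of the span, maximizing over candidate target words that could occur outside the span.

import math

def best_border(left_words, right_words, outside_words, trigram_score):
    """Upper bound, in the log domain, on the two border trigrams of a state.

    left_words / right_words:  the two boundary target words on each side of
                               the state's partial translation.
    outside_words:             candidate target words that could be generated
                               outside the state's source span.
    trigram_score(w1, w2, w3): log-probability of a trigram (assumed given).
    """
    left_bound = max(trigram_score(w, left_words[0], left_words[1])
                     for w in outside_words)
    right_bound = max(trigram_score(right_words[0], right_words[1], w)
                      for w in outside_words)
    return left_bound + right_bound

# Usage with a toy scorer standing in for a real trigram language model.
def toy_lm(w1, w2, w3):
    return math.log(0.5) if (w1, w2, w3) == ("red", "the", "big") else math.log(0.01)

print(best_border(("the", "big"), ("old", "house"), {"red", "very"}, toy_lm))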

translation model

Appears in 3 sentences as: translation model (3)
In Efficient Multi-Pass Decoding for Synchronous Context Free Grammars
  1. We take a multi-pass approach to machine translation decoding when using synchronous context-free grammars as the translation model and n-gram language models: the first pass uses a bigram language model, and the resulting parse forest is used in the second pass to guide search with a trigram language model.
    Page 1, “Abstract”
  2. This complexity arises from the interaction of the tree-based translation model with an n-gram language model.
    Page 1, “Introduction”
  3. The word-to-word translation probabilities are from the translation model of IBM Model 4 trained on a 160-million-word English-Chinese parallel corpus using GIZA++.
    Page 6, “Experiments”
