Exact Decoding of Syntactic Translation Models through Lagrangian Relaxation
Rush, Alexander M. and Collins, Michael

Article Structure

Abstract

We describe an exact decoding algorithm for syntax-based statistical translation.

Introduction

Recent work has seen widespread use of synchronous probabilistic grammars in statistical machine translation (SMT).

Related Work

A variety of approximate decoding algorithms have been explored for syntax-based translation systems, including cube-pruning (Chiang, 2007; Huang and Chiang, 2007), left-to-right decoding with beam search (Watanabe et al., 2006; Huang and Mi, 2010), and coarse-to-fine methods (Petrov et al., 2008).

Background: Hypergraphs

Translation with many syntax-based systems (e.g., (Chiang, 2005; Marcu et al., 2006; Shen et al., 2008; Huang and Mi, 2010)) can be implemented as a two-step process.

A Simple Lagrangian Relaxation Algorithm

We now give a Lagrangian relaxation algorithm for integration of a hypergraph with a bigram language model, in cases where the hypergraph satisfies the following simplifying assumption:

The Full Algorithm

We now describe our full algorithm, which does not require the strict ordering constraint.

Tightening the Relaxation

The algorithm that we have described minimizes the dual function L(λ, γ, u, v).
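
The dual L(λ, γ, u, v) is minimized by subgradient optimization: at each iteration the relaxed sub-problems are solved, and each Lagrange multiplier is moved in the direction of the corresponding constraint violation. A minimal, generic sketch of one such update in Python (the multiplier names, sign convention, and step-size schedule here are illustrative assumptions, not the paper's exact ones):

```python
def subgradient_step(multipliers, violations, step_size):
    """One generic subgradient update for minimizing a Lagrangian dual.

    `multipliers` and `violations` map constraint keys to numbers; a zero
    violation means the relaxed solution already satisfies that constraint,
    so its multiplier is left unchanged.
    """
    keys = set(multipliers) | set(violations)
    return {k: multipliers.get(k, 0.0) - step_size * violations.get(k, 0.0)
            for k in keys}

# toy update: constraint "c1" is violated by +1, constraint "c2" is satisfied
print(subgradient_step({"c1": 0.0, "c2": 0.3}, {"c1": 1.0}, step_size=0.5))
# e.g. {'c1': -0.5, 'c2': 0.3}
```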

Experiments

We report experiments on translation from Chinese to English, using the tree-to-string model described in (Huang and Mi, 2010).

Conclusion

We have described a Lagrangian relaxation algorithm for exact decoding of syntactic translation models, and shown that it is significantly more efficient than other exact algorithms for decoding tree-to-string models.

Topics

language model

Appears in 19 sentences as: language model (18), language model: (1), language models (2)
In Exact Decoding of Syntactic Translation Models through Lagrangian Relaxation
  1. The language model is then used to rescore the translations represented by the hypergraph. Decoding with these models is challenging, largely because of the cost of integrating an n-gram language model into the search process.
    Page 1, “Introduction”
  2. E.g., with a trigram language model they run in O(|E|w^6) time, where |E| is the number of edges in the hypergraph, and w is the number of distinct lexical items in the hypergraph.
    Page 1, “Introduction”
  3. This step does not require language model integration, and hence is highly efficient.
    Page 1, “Introduction”
  4. Informally, the first decoding algorithm incorporates the weights and hard constraints on translations from the synchronous grammar, while the second decoding algorithm is used to integrate language model scores.
    Page 1, “Introduction”
  5. The second step is to integrate an n-gram language model with this hypergraph.
    Page 2, “Background: Hypergraphs”
  6. The labels for leaves will be words, and will be important in defining strings and language model scores for those strings.
    Page 3, “Background: Hypergraphs”
  7. The focus of this paper will be to solve problems involving the integration of a k’th order language model with a hypergraph.
    Page 3, “Background: Hypergraphs”
  8. The θ(v_{i−k+1}, …, v_i) parameters score n-grams of length k. These parameters are typically defined by a language model, for example with k = 3 we would have θ(v_{i−2}, v_{i−1}, v_i) = log p(l(v_i) | l(v_{i−2}), l(v_{i−1})). The problem is then to find y* = arg max_{y ∈ Y} f(y) under this definition. (A small scoring sketch follows this list.)
    Page 3, “Background: Hypergraphs”
  9. Throughout this paper we make the following assumption when using a bigram language model:
    Page 3, “Background: Hypergraphs”
  10. For any derivation y, with leaves s(y) = v_1, v_2, …, v_n, it is the case that: (1) v_1 = 2 and v_n = 3; (2) the leaves 2 and 3 cannot appear at any other position in the strings s(y) for y ∈ Y; (3) l(2) = <s> where <s> is the start symbol in the language model; (4) l(3) = </s> where </s> is the end symbol.
    Page 3, “Background: Hypergraphs”
  11. This assumption allows us to incorporate language model terms that depend on the start and end symbols.
    Page 3, “Background: Hypergraphs”
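
Sentences 8–11 above describe how the θ parameters score n-grams over the leaves of a derivation, with dedicated leaves labeled <s> and </s> under the start/end assumption. A minimal scoring sketch in Python, assuming the ordered leaf labels have already been extracted and that a hypothetical `log_prob` table maps n-gram tuples to log-probabilities:

```python
import math

def lm_score(labels, log_prob):
    """Score a leaf label sequence under a bigram model (k = 2).

    `labels` is l(v_1), ..., l(v_n) with l(v_1) = "<s>" and l(v_n) = "</s>",
    per the start/end assumption.  `log_prob` is assumed to map
    (w_{i-1}, w_i) pairs to log p(w_i | w_{i-1}); with k = 3 the loop would
    instead sum theta(v_{i-2}, v_{i-1}, v_i) terms.
    """
    total = 0.0
    for i in range(1, len(labels)):
        # theta(v_{i-1}, v_i) = log p(l(v_i) | l(v_{i-1}))
        total += log_prob.get((labels[i - 1], labels[i]), math.log(1e-10))
    return total

# toy table; a real language model would back off for unseen n-grams
lp = {("<s>", "the"): -0.4, ("the", "man"): -1.2, ("man", "</s>"): -2.0}
print(lm_score(["<s>", "the", "man", "</s>"], lp))  # -3.6
```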

dynamic programming

Appears in 14 sentences as: Dynamic programming (1), dynamic programming (14)
In Exact Decoding of Syntactic Translation Models through Lagrangian Relaxation
  1. The approach uses Lagrangian relaxation to decompose the decoding problem into tractable sub-problems, thereby avoiding exhaustive dynamic programming .
    Page 1, “Abstract”
  2. Exact dynamic programming algorithms for the problem are well known (Bar-Hillel et al., 1964), but are too expensive to be used in practice. Previous work on decoding for syntax-based SMT has therefore been focused primarily on approximate search methods.
    Page 1, “Introduction”
  3. Dynamic programming over the weighted hypergraph.
    Page 1, “Introduction”
  4. We do this by gradually introducing constraints to step 1 (dynamic programming over the hypergraph), while still maintaining efficiency.
    Page 1, “Introduction”
  5. (2) find the highest scoring derivation using dynamic programming
    Page 3, “A Simple Lagrangian Relaxation Algorithm”
  6. Steps 1 and 2 can be performed efficiently; in particular, we avoid the classical dynamic programming intersection, instead relying on dynamic programming over the original, simple hypergraph.
    Page 4, “A Simple Lagrangian Relaxation Algorithm”
  7. Using dynamic programming, find values for the y_v and y_e variables that form a valid derivation, and that maximize
    Page 5, “A Simple Lagrangian Relaxation Algorithm”
  8. In the second step we use conventional dynamic programming over the hypergraph to find an optimal derivation that incorporates weights from the first step.
    Page 5, “A Simple Lagrangian Relaxation Algorithm”
  9. The second step involves simple dynamic programming over the hypergraph (V, E) (it is simple to integrate the terms from the first step into this algorithm; see the sketch after this list).
    Page 7, “The Full Algorithm”
  10. The main steps of the algorithm are: 1) construction of the graph (S, T); 2) at each iteration, dynamic programming over the hypergraph (V, E); 3) at each iteration, all-pairs shortest path algorithms over the graph (S, T).
    Page 7, “The Full Algorithm”
  11. These steps, hypergraph dynamic programming and all-pairs shortest path, are widely known algorithms that are simple to implement.
    Page 7, “The Full Algorithm”
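
The dynamic programming referred to in these sentences is a standard max-derivation (Viterbi) pass over the hypergraph: nodes are processed bottom-up, and each hyperedge combines the scores of its tail nodes with the edge weight. A minimal sketch, under the assumption that edges are supplied as (head, tails, weight) triples in bottom-up order and that leaf weights already fold in any Lagrange multiplier terms:

```python
def max_derivation(edges, leaf_weights, root):
    """Viterbi-style dynamic programming over a hypergraph.

    `edges` is a list of (head, tails, weight) hyperedges, assumed to be
    ordered so that every tail is scored before any edge that uses it.
    `leaf_weights` maps leaves to their weights (e.g., theta plus any
    Lagrange multiplier terms).  Returns the best derivation score at
    `root` and back-pointers recording the best edge chosen at each node.
    """
    score = dict(leaf_weights)   # node -> best score of any sub-derivation
    back = {}                    # node -> (tails, weight) of the best edge
    for head, tails, weight in edges:
        cand = weight + sum(score[t] for t in tails)
        if head not in score or cand > score[head]:
            score[head] = cand
            back[head] = (tails, weight)
    return score[root], back

# toy hypergraph: leaves 2, 3, 4; internal node "A"; root "r"
edges = [("A", (4,), 1.0), ("r", (2, "A", 3), 0.5)]
print(max_derivation(edges, {2: 0.0, 3: 0.0, 4: -0.3}, "r"))  # best score 1.2
```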

bigram

Appears in 9 sentences as: Bigram (1), bigram (8), bigrams (3)
In Exact Decoding of Syntactic Translation Models through Lagrangian Relaxation
  1. Throughout this paper we make the following assumption when using a bigram language model:
    Page 3, “Background: Hypergraphs”
  2. Assumption 3.1 (Bigram start/end assumption.)
    Page 3, “Background: Hypergraphs”
  3. We now give a Lagrangian relaxation algorithm for integration of a hypergraph with a bigram language model, in cases where the hypergraph satisfies the following simplifying assumption:
    Page 3, “A Simple Lagrangian Relaxation Algorithm”
  4. over the original (non-intersected) hypergraph, with leaf nodes having weights θ_v plus the associated Lagrange multiplier terms. (3) If the output derivation from step 2 has the same set of bigrams as those from step 1, then we have an exact solution to the problem. (An agreement-check sketch follows this list.)
    Page 4, “A Simple Lagrangian Relaxation Algorithm”
  5. C1 states that each leaf in a derivation has exactly one incoming bigram, and that each leaf not in the derivation has 0 incoming bigrams; C2 states that each leaf in a derivation has exactly one outgoing bigram, and that each leaf not in the derivation has 0 outgoing bigrams.
    Page 4, “A Simple Lagrangian Relaxation Algorithm”
  6. Recall that according to the bigram start/end assumption, the leaves 2/3 are reserved for the start/end of the sequence s(y), and hence do not have an incoming/outgoing bigram.
    Page 4, “A Simple Lagrangian Relaxation Algorithm”
  7. In the first step we compute the highest scoring incoming bigram for each leaf v.
    Page 5, “A Simple Lagrangian Relaxation Algorithm”
  8. The constraints are all satisfied in this case, but the bigram variables are invalid (e.g., they contain a cycle).
    Page 5, “A Simple Lagrangian Relaxation Algorithm”
  9. The set P of trigram paths plays an analogous role to the set B of bigrams in our previous algorithm.
    Page 6, “The Full Algorithm”
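
Sentences 4–8 above outline one iteration of the simple algorithm: step 1 independently picks the best incoming bigram for each leaf, step 2 runs dynamic programming over the hypergraph, and the algorithm stops when the two agree. A minimal sketch of that agreement test (the subgradient updates to the C1/C2 multipliers on disagreement are omitted), assuming the step-2 derivation is represented by its ordered leaves:

```python
def bigrams_of(leaves):
    """Bigram set (v_{i-1}, v_i) of an ordered leaf sequence."""
    return {(leaves[i - 1], leaves[i]) for i in range(1, len(leaves))}

def is_exact(step1_bigrams, derivation_leaves):
    """True if the step-2 derivation uses exactly the bigrams chosen in step 1.

    `step1_bigrams` is assumed to be the set of incoming bigrams picked
    independently per leaf in step 1; equality certifies that the relaxed
    solution is an exact solution to the original problem.
    """
    return step1_bigrams == bigrams_of(derivation_leaves)

# toy check: step 1 chose bigrams (2, 5) and (5, 3); step 2 derived leaves 2, 5, 3
print(is_exact({(2, 5), (5, 3)}, [2, 5, 3]))  # True -> exact solution found
```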

linear programming

Appears in 5 sentences as: linear programming (6)
In Exact Decoding of Syntactic Translation Models through Lagrangian Relaxation
  1. The dual corresponds to a particular linear programming (LP) relaxation of the original decoding problem.
    Page 1, “Introduction”
  2. There are close connections between Lagrangian relaxation and linear programming relaxations.
    Page 5, “A Simple Lagrangian Relaxation Algorithm”
  3. LR = Lagrangian relaxation; DP = exhaustive dynamic programming; ILP = integer linear programming; LP = linear programming (LP does not recover an exact solution).
    Page 8, “Experiments”
  4. Figure 5 gives information on decoding time for our method and two other exact decoding methods: integer linear programming (using constraints D0—D6), and exhaustive dynamic programming using the construction of (Bar-Hillel et al., 1964).
    Page 8, “Experiments”
  5. Figure 5 also gives a speed comparison of our method to a linear programming (LP) solver that solves the LP relaxation defined by constraints D0—D6.
    Page 9, “Experiments”

model scores

Appears in 5 sentences as: model score (1), model scores (4)
In Exact Decoding of Syntactic Translation Models through Lagrangian Relaxation
  1. Informally, the first decoding algorithm incorporates the weights and hard constraints on translations from the synchronous grammar, while the second decoding algorithm is used to integrate language model scores.
    Page 1, “Introduction”
  2. We compare our method to cube pruning (Chiang, 2007), and find that our method gives improved model scores on a significant number of examples.
    Page 2, “Introduction”
  3. The labels for leaves will be words, and will be important in defining strings and language model scores for those strings.
    Page 3, “Background: Hypergraphs”
  4. In the simple algorithm, the first step was to predict the previous leaf for each leaf v, under a score that combined a language model score with a Lagrange multiplier score (i.e., compute arg max_{v1} β(v1, v), where β(v1, v) = θ(v1, v) plus a Lagrange multiplier term). In this section we describe an algorithm that for each leaf v again predicts the previous leaf, but in addition predicts the full path back to that leaf.
    Page 5, “The Full Algorithm”
  5. For each v ∈ V_L, define α_v = max_{p: v_3(p) = v} β(p), where β(p) = h(v_1(p), v_2(p), v_3(p)) − λ_1(v_1(p)) − λ_2(v_2(p)) minus sums of Lagrange multiplier terms over the path p. Here h is a function that computes language model scores, and the other terms involve Lagrange multipliers. (A sketch of this maximization follows below.)
    Page 9, “Conclusion”
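
Sentence 5 above defines α_v as the best score β(p) over trigram paths p ending at leaf v, combining a language model score h(·) with subtracted Lagrange multiplier terms. A minimal sketch of that per-leaf maximization, assuming candidate paths are enumerated elsewhere and that `h` and `penalty` are supplied as functions (both hypothetical stand-ins here):

```python
def best_incoming_paths(paths, h, penalty):
    """Compute alpha_v = max over trigram paths p with v_3(p) = v of beta(p).

    `paths` is assumed to be an iterable of (v1, v2, v3) trigram paths,
    `h(p)` the language model score of the path, and `penalty(p)` the sum
    of Lagrange multiplier terms subtracted from it.
    """
    alpha, best = {}, {}
    for p in paths:
        v = p[2]                      # the leaf the path ends in
        beta = h(p) - penalty(p)
        if v not in alpha or beta > alpha[v]:
            alpha[v], best[v] = beta, p
    return alpha, best

# toy example: two candidate paths ending at leaf 7, constant penalty 0.2
scores = {(2, 5, 7): -1.0, (4, 5, 7): -2.5}
alpha, best = best_incoming_paths(scores, scores.get, lambda p: 0.2)
print(alpha[7], best[7])  # -1.2 (2, 5, 7)
```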

highest scoring

Appears in 4 sentences as: highest scoring (4)
In Exact Decoding of Syntactic Translation Models through Lagrangian Relaxation
  1. (2) find the highest scoring derivation using dynamic programming
    Page 3, “A Simple Lagrangian Relaxation Algorithm”
  2. In the first step we compute the highest scoring incoming bigram for each leaf v.
    Page 5, “A Simple Lagrangian Relaxation Algorithm”
  3. (i.e., for each v, compute the highest scoring trigram path ending in v.)
    Page 7, “The Full Algorithm”
  4. The first step involves finding the highest scoring incoming trigram path for each leaf v.
    Page 7, “The Full Algorithm”

n-gram

Appears in 3 sentences as: n-gram (3)
In Exact Decoding of Syntactic Translation Models through Lagrangian Relaxation
  1. The decoding problem for a broad range of these systems (e.g., (Chiang, 2005; Marcu et al., 2006; Shen et al., 2008)) corresponds to the intersection of a (weighted) hypergraph with an n-gram language model. The hypergraph represents a large set of possible translations, and is created by applying a synchronous grammar to the source language string.
    Page 1, “Introduction”
  2. The language model is then used to rescore the translations represented by the hypergraph. Decoding with these models is challenging, largely because of the cost of integrating an n-gram language model into the search process.
    Page 1, “Introduction”
  3. The second step is to integrate an n-gram language model with this hypergraph.
    Page 2, “Background: Hypergraphs”

translation models

Appears in 3 sentences as: translation model (1), translation models (2)
In Exact Decoding of Syntactic Translation Models through Lagrangian Relaxation
  1. We use an identical model, and identical development and test data, to that used by Huang and Mi. The translation model is trained on 1.5M sentence pairs of Chinese-English data; a trigram language model is used.
    Page 8, “Experiments”
  2. We have described a Lagrangian relaxation algorithm for exact decoding of syntactic translation models, and shown that it is significantly more efficient than other exact algorithms for decoding tree-to-string models.
    Page 9, “Conclusion”
  3. Our experiments have focused on tree-to-string models, but the method should also apply to Hiero-style syntactic translation models (Chiang, 2007).
    Page 9, “Conclusion”
