A Constrained Viterbi Relaxation for Bidirectional Word Alignment

Bidirectional models of word alignment are an appealing alternative to post-hoc combinations of directional word aligners.

Word alignment is a critical first step for building statistical machine translation systems.

The focus of this work is on the word alignment decoding problem.

The directional bias of the e→f and f→e alignment models may cause them to produce differing alignments.

Enforcing full agreement can be too strict an alignment criterion.

In general, it can be shown that Lagrangian relaxation is only guaranteed to solve a linear programming relaxation of the underlying combinatorial problem.

Reintroducing constraints can lead to an exponential blowup in the search space of the Viterbi algorithm.

The most common techniques for bidirectional alignment are post-hoc combinations.

Our experimental results compare the accuracy and optimality of our decoding algorithm to directional alignment models and previous work on this bidirectional model.

We have introduced a novel Lagrangian relaxation algorithm for a bidirectional alignment model that uses incremental constraint addition and coarse-to-fine pruning to find exact solutions.

Appears in 15 sentences as: word aligners (2) Word Alignment (1) Word alignment (1) word alignment (12) word alignments (1)

In *A Constrained Viterbi Relaxation for Bidirectional Word Alignment*

- Bidirectional models of word alignment are an appealing alternative to post-hoc combinations of directional word aligners.Page 1, “Abstract”
- Word alignment is a critical first step for building statistical machine translation systems.Page 1, “Introduction”
- In order to ensure accurate word alignments, most systems employ a post-hoc symmetrization step to combine directional word aligners, such as IBM Model 4 (Brown et al., 1993) or hidden Markov model (HMM) based aligners (Vogel et al., 1996).Page 1, “Introduction”
- We begin in Section 2 by formally describing the directional word alignment problem.Page 1, “Introduction”
- The focus of this work is on the word alignment decoding problem.Page 2, “Background”
- Before turning to the model of interest, we first introduce directional word alignment.Page 2, “Background”
- 2.1 Word AlignmentPage 2, “Background”
- In the e→f word alignment problem, each word in e is aligned to a word in f or to the null word ε.Page 2, “Background”
- We refer to a single word alignment as a link.Page 2, “Background”
- Cromieres and Kurohashi (2009) use belief propagation on a factor graph to train and decode a one-to-one word alignment problem.Page 7, “Related Work”
- (2008) use posterior regularization to constrain the posterior probability of the word alignment problem to be symmetric and bijective.Page 7, “Related Work”
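
The e→f formulation above can be made concrete with a tiny sketch (our own illustration, not the paper's code): an alignment is a vector giving, for each e-position, the f-position it links to, with a reserved index for the null word ε.

```python
# Toy illustration (not from the paper): an e->f alignment maps each
# position i in e to a position j in f, or to the null word (index 0 here).
def links(alignment):
    """Return the set of non-null links (i, j) for an e->f alignment.

    `alignment[i]` is the f-index aligned to e-position i+1, with 0 as null.
    Word positions are 1-based to leave index 0 free for the null word.
    """
    return {(i, j) for i, j in enumerate(alignment, start=1) if j != 0}

# e has 3 words; word 1 links to f-word 2, word 2 is null-aligned, word 3 to f-word 1.
print(sorted(links([2, 0, 1])))  # [(1, 2), (3, 1)]
```

Each such link (i, j) is what the excerpts above call "a single word alignment".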


Appears in 14 sentences as: Viterbi (14)

In *A Constrained Viterbi Relaxation for Bidirectional Word Alignment*

- The relaxation can be solved with a modified version of the Viterbi algorithm.Page 1, “Abstract”
- Our decoder uses a simple variant of the Viterbi algorithm for solving a relaxed version of this model.Page 1, “Introduction”
- It is based on a novel relaxation for the model of DeNero and Macherey (2011), solvable with a variant of the Viterbi algorithm.Page 1, “Introduction”
- Note that for both of these models we can solve the optimization problem exactly using the standard Viterbi algorithm for HMM decoding.Page 3, “Background”
- We then use the standard Viterbi update for computing the α variables, adding in the score of the y(i, j) necessary to satisfy the constraints.Page 4, “Bidirectional Alignment”
- The key algorithmic idea is to extend the Viterbi algorithm in order to consider possible adjacent links in the reverse direction.Page 4, “Adjacent Agreement”
- Despite the new constraints, we can still compute L(λ) in O(IJ(I + J)) time using a variant of the Viterbi algorithm.Page 5, “Adjacent Agreement”
- Figure 5: Modified Viterbi algorithm for computing the adjacent agreement L(λ).Page 5, “Adjacent Agreement”
- In order to compute the arg max of this optimization problem, we need to satisfy the constraints within the Viterbi algorithm.Page 6, “Adding Back Constraints”
- We augment the Viterbi chart with a count vector d ∈ D, where d(i, j) is a count for the (i, j)’th constraint, i.e.Page 6, “Adding Back Constraints”
- Figure 6 shows this constrained Viterbi relaxation approach.Page 6, “Adding Back Constraints”
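
For context, the *standard* Viterbi recursion that these excerpts extend can be sketched as follows. This is a generic first-order decoder over hypothetical log-score tables `emis` and `trans`, not the paper's constrained variant with count vectors:

```python
# A minimal sketch of standard Viterbi decoding for a first-order HMM
# aligner (scores are additive log-scores; the tables are hypothetical).
def viterbi_align(emis, trans):
    """Best state sequence for a first-order HMM alignment.

    emis:  I x (J+1) emission scores (state j for e-position i; j=0 is null).
    trans: (J+1) x (J+1) transition scores from previous state jp to state j.
    Returns (best score, best alignment as a list of f-indices).
    """
    I, S = len(emis), len(emis[0])
    chart = [emis[0][:]]               # chart[i][j]: best score ending in state j
    back = []                          # backpointers for path recovery
    for i in range(1, I):
        row, brow = [], []
        for j in range(S):
            jp = max(range(S), key=lambda k: chart[i - 1][k] + trans[k][j])
            row.append(chart[i - 1][jp] + trans[jp][j] + emis[i][j])
            brow.append(jp)
        chart.append(row)
        back.append(brow)
    # Recover the argmax path by following backpointers from the best end state.
    j = max(range(S), key=lambda k: chart[-1][k])
    best, path = chart[-1][j], [j]
    for brow in reversed(back):
        j = brow[j]
        path.append(j)
    return best, path[::-1]
```

This runs in O(I·J²) for I e-positions and J+1 states; the paper's modified versions add penalty terms and constraint counts on top of the same chart structure.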


Appears in 11 sentences as: sentence pair (2) sentence pairs (9)

In *A Constrained Viterbi Relaxation for Bidirectional Word Alignment*

- The algorithm finds provably exact solutions on 86% of sentence pairs and shows improvements over directional models.Page 1, “Abstract”
- Empirically, it is able to find exact solutions on 86% of sentence pairs and is significantly faster than general-purpose solvers.Page 1, “Introduction”
- Following past work, the first 150 sentence pairs of the training section are used for evaluation.Page 7, “Experiments”
- With incremental constraints and pruning, we are able to solve over 86% of sentence pairs including many longer and more difficult pairs.Page 8, “Experiments”
- Table 3: The average number of constraints added for sentence pairs where Lagrangian relaxation is not able to find an exact solution.Page 9, “Experiments”
- Table 3 gives the average number of constraints added for sentence pairs for which Lagrangian relaxation alone does not produce a certificate.Page 9, “Experiments”
- Figure 7(a) shows the average over all sentence pairs of the best dual and best primal scores.Page 9, “Experiments”
- On average, the pruning reduces the search space of each sentence pair to 20% of the initial search space after 200 iterations.Page 9, “Experiments”
- optimal score, averaged over all sentence pairs.Page 9, “Conclusion”
- The plot shows an average over all sentence pairs.Page 9, “Conclusion”
- We can construct a sentence pair in which I = J = N and ε-alignments have infinite cost.Page 9, “Conclusion”


Appears in 5 sentences as: alignment model (3) alignment models (2)

In *A Constrained Viterbi Relaxation for Bidirectional Word Alignment*

- A first-order HMM alignment model (Vogel et al., 1996) is an HMM of length I + 1 where the hidden state at position i ∈ [I]₀ is the aligned index j ∈ [J]₀, and the transition score takes into account the previously aligned index j′ ∈ [J]₀. Formally, define the set of possible HMM alignments as X ⊂ {0,1}^(([I]₀×[J]₀) ∪ ([I]×[J]₀×[J]₀)) withPage 2, “Background”
- The directional bias of the e→f and f→e alignment models may cause them to produce differing alignments.Page 3, “Bidirectional Alignment”
- In this work, we instead consider a bidirectional alignment model that jointly considers both directional models.Page 3, “Bidirectional Alignment”
- Our experimental results compare the accuracy and optimality of our decoding algorithm to directional alignment models and previous work on this bidirectional model.Page 7, “Experiments”
- We have introduced a novel Lagrangian relaxation algorithm for a bidirectional alignment model that uses incremental constraint addition and coarse-to-fine pruning to find exact solutions.Page 9, “Conclusion”
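
The Lagrangian-relaxation idea behind joint bidirectional decoding can be sketched in miniature. This is heavily simplified: the two "directional models" here are independent per-link scores rather than the paper's HMMs, and the function names and step size are our own assumptions:

```python
# Toy dual decomposition for agreement: two directional models must agree on
# the link matrix. Each model is just a bag of per-link scores (se, sf),
# far simpler than the paper's HMM decoders.
def agree(se, sf, iters=50, step=0.5):
    """Solve max over x = y of se.x + sf.y by subgradient on the dual."""
    I, J = len(se), len(se[0])
    lam = [[0.0] * J for _ in range(I)]          # one multiplier per link
    for _ in range(iters):
        # Each subproblem decodes independently against penalized scores.
        x = [[1 if se[i][j] + lam[i][j] > 0 else 0 for j in range(J)] for i in range(I)]
        y = [[1 if sf[i][j] - lam[i][j] > 0 else 0 for j in range(J)] for i in range(I)]
        if x == y:                                # agreement: certificate of optimality
            return x
        for i in range(I):                        # subgradient step on the disagreement
            for j in range(J):
                lam[i][j] -= step * (x[i][j] - y[i][j])
    return None                                   # no certificate within the limit
```

When the loop returns without a certificate, the paper's approach is to reintroduce the violated agreement constraints into the (Viterbi) search itself rather than give up, which this toy version does not attempt.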


Appears in 3 sentences as: linear programming (3)

In *A Constrained Viterbi Relaxation for Bidirectional Word Alignment*

- In general, it can be shown that Lagrangian relaxation is only guaranteed to solve a linear programming relaxation of the underlying combinatorial problem.Page 5, “Adding Back Constraints”
- General linear programming approaches have also been applied to word alignment problems.Page 7, “Related Work”
- (2006) formulate the word alignment problem as a quadratic assignment problem and solve it using an integer linear programming solver.Page 7, “Related Work”


Appears in 3 sentences as: objective function (3)

In *A Constrained Viterbi Relaxation for Bidirectional Word Alignment*

- Given a sentence e of length |e| = I and a sentence f of length |f| = J, our goal is to find the best bidirectional alignment between the two sentences under a given objective function .Page 2, “Background”
- The HMM objective function f : X → ℝ can be written as a linear function of x.Page 2, “Background”
- Similarly define the objective functionPage 2, “Background”
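
The linearity noted above is easy to state concretely: with x a 0/1 indicator vector over emission links and transition triples, f(x) is a dot product with a score vector θ. A minimal sketch (the dict-based encoding is our own illustration):

```python
def hmm_objective(theta, x):
    """f(x) = sum_k theta[k] * x[k]: the HMM objective is linear in x.

    theta maps each indicator (a link (i, j) or a transition (i, j, j')) to
    its score; x maps the same keys to 0/1. Only active indicators contribute.
    """
    return sum(theta[k] * v for k, v in x.items())

theta = {(1, 2): 1.5, (2, 0): -0.5, (2, 0, 2): 0.25}
x = {(1, 2): 1, (2, 0): 1, (2, 0, 2): 0}
print(hmm_objective(theta, x))  # 1.0
```

Linearity in x is exactly what lets Lagrangian penalty terms λ·x be folded into the scores and absorbed by the Viterbi updates.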


Appears in 3 sentences as: phrase pair (3)

In *A Constrained Viterbi Relaxation for Bidirectional Word Alignment*

- alignment phrase pairPage 8, “Experiments”
- Table 2: Alignment accuracy and phrase pair extraction accuracy for directional and bidirectional models.Page 8, “Experiments”
- AER is alignment error rate and F1 is the phrase pair extraction F1 score.Page 8, “Experiments”
