Model-Based Aligner Combination Using Dual Decomposition
DeNero, John and Macherey, Klaus

Article Structure

Abstract

Unsupervised word alignment is most often modeled as a Markov process that generates a sentence f conditioned on its translation e.

Introduction

Word alignment is the task of identifying corresponding words in sentence pairs.

Model Definition

Our bidirectional model G = (V, D) is a globally normalized, undirected graphical model of the word alignment for a fixed sentence pair (e, f). Each vertex in the vertex set V corresponds to a model variable V_i, and each undirected edge in the edge set D corresponds to a pair of variables (V_i, V_j). Each vertex has an associated potential function ω_i(v_i) that assigns a real-valued potential to each possible value v_i of V_i. Likewise, each edge has an associated potential function μ_ij(v_i, v_j) that scores pairs of values.

Model Inference

In general, graphical models admit efficient, exact inference algorithms if they do not contain cycles.

Related Work

Alignment combination normally involves selecting some A from the output of two directional models.

Experimental Results

We evaluated our bidirectional model by comparing its output to the annotations of a hand-aligned corpus.

Conclusion

We have presented a graphical model that combines two classical HMM-based alignment models.

Topics

alignment models

Appears in 15 sentences as: Alignment Model (2), alignment model (4), alignment models (9)
In Model-Based Aligner Combination Using Dual Decomposition
  1. This result is achieved by embedding two directional HMM-based alignment models into a larger bidirectional graphical model.
    Page 1, “Introduction”
  2. Moreover, the bidirectional model enforces a one-to-one phrase alignment structure, similar to the output of phrase alignment models (Marcu and Wong, 2002; DeNero et al., 2008), unsupervised inversion transduction grammar (ITG) models (Blunsom et al., 2009), and supervised ITG models (Haghighi et al., 2009; DeNero and Klein, 2010).
    Page 1, “Introduction”
  3. Our model contains two directional hidden Markov alignment models, which we review in Section 2.1, along with additional structure that we introduce in Section 2.2.
    Page 2, “Model Definition”
  4. 2.1 HMM-Based Alignment Model
    Page 2, “Model Definition”
  5. This section describes the classic hidden Markov model (HMM) based alignment model (Vogel et al., 1996).
    Page 2, “Model Definition”
  6. 2.2 A Bidirectional Alignment Model
    Page 2, “Model Definition”
  7. We can combine two HMM-based directional alignment models by embedding them in a larger model
    Page 2, “Model Definition”
  8. Because directional alignments are preserved intact as components of our model, extensions or refinements to the underlying directional Markov alignment model could be integrated cleanly into our model as well, including lexicalized transition models (He, 2007), extended conditioning contexts (Brunning et al., 2009), and external information (Shindo et al., 2010).
    Page 4, “Model Definition”
  9. In addition, supervised word alignment models often use the output of directional unsupervised aligners as features or pruning signals.
    Page 6, “Related Work”
  10. This approach to jointly learning two directional alignment models yields state-of-the-art unsupervised performance.
    Page 7, “Related Work”
  11. We trained the model on a portion of FBIS data that has been used previously for alignment model evaluation (Ayan and Dorr, 2006; Haghighi et al., 2009; DeNero and Klein, 2010).
    Page 7, “Experimental Results”
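
Since many of these excerpts refer to the directional HMM-based alignment model of Vogel et al. (1996), a minimal sketch of how such a model scores an alignment vector may help. The translation table `t`, the jump table `d`, and the omission of null alignment are illustrative assumptions, not the paper's exact parameterization:

```python
import math

def hmm_alignment_logprob(e, f, a, t, d):
    """Log-score of alignment vector a under a directional HMM-based
    alignment model: each target word f[j] is emitted by the source
    word e[a[j]], and transitions depend only on the jump a[j] - a[j-1].
    t maps (source word, target word) pairs to emission probabilities,
    and d maps jump sizes to transition probabilities; both are assumed,
    simplified stand-ins for a trained model's tables."""
    logp = 0.0
    prev = 0  # assumed initial alignment position
    for j, f_word in enumerate(f):
        logp += math.log(d.get(a[j] - prev, 1e-12))       # transition
        logp += math.log(t.get((e[a[j]], f_word), 1e-12))  # emission
        prev = a[j]
    return logp
```

The paper embeds two such models, one generating f from e and one generating e from f.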

graphical model

Appears in 12 sentences as: graphical model (8), graphical models (4)
In Model-Based Aligner Combination Using Dual Decomposition
  1. This paper presents a graphical model that embeds two directional aligners into a single model.
    Page 1, “Abstract”
  2. This result is achieved by embedding two directional HMM-based alignment models into a larger bidirectional graphical model.
    Page 1, “Introduction”
  3. Our bidirectional model G = (V, D) is a globally normalized, undirected graphical model of the word alignment for a fixed sentence pair (e, f). Each vertex in the vertex set V corresponds to a model variable V_i, and each undirected edge in the edge set D corresponds to a pair of variables (V_i, V_j). Each vertex has an associated potential function ω_i(v_i) that assigns a real-valued potential to each possible value v_i of V_i. Likewise, each edge has an associated potential function μ_ij(v_i, v_j) that scores pairs of values.
    Page 2, “Model Definition”
  4. Figure 1: The structure of our graphical model for a simple sentence pair.
    Page 3, “Model Definition”
  5. In general, graphical models admit efficient, exact inference algorithms if they do not contain cycles.
    Page 4, “Model Inference”
  6. While the entire graphical model has loops, there are two overlapping subgraphs that are cycle-free.
    Page 4, “Model Inference”
  7. To describe a dual decomposition inference procedure for our model, we first restate the inference problem under our graphical model in terms of the two overlapping subgraphs that admit tractable inference.
    Page 4, “Model Inference”
  8. In fact, we can make a stronger claim: we can reuse the Viterbi inference algorithm for linear chain graphical models that applies to the embedded directional HMM models.
    Page 5, “Model Inference”
  9. Hence, we can use a highly optimized linear chain inference implementation rather than a solver for general tree-structured graphical models.
    Page 5, “Model Inference”
  10. Although differing in both model and inference, our work and theirs both find improvements from defining graphical models for alignment that do not admit exact polynomial-time inference algorithms.
    Page 7, “Related Work”
  11. We have presented a graphical model that combines two classical HMM-based alignment models.
    Page 9, “Conclusion”
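
As a concrete reading of the model definition quoted above, here is a sketch of how a globally normalized, undirected graphical model scores one assignment of values to its variables; the dictionary layout is an assumption made for illustration:

```python
def assignment_score(vertex_potentials, edge_potentials, assignment):
    """Unnormalized score of a full assignment: the product of a vertex
    potential omega_i(v_i) for every variable and an edge potential
    mu_ij(v_i, v_j) for every edge, matching the definition in the
    excerpts. vertex_potentials[i] maps values to reals, and
    edge_potentials[(i, j)] maps value pairs to reals."""
    score = 1.0
    for i, v in assignment.items():
        score *= vertex_potentials[i][v]
    for (i, j), mu in edge_potentials.items():
        score *= mu[(assignment[i], assignment[j])]
    return score
```

Global normalization means the probability of an assignment is this score divided by the sum of scores over all assignments; Viterbi-style inference only ever compares unnormalized scores, so the normalizer never needs to be computed.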

word alignment

Appears in 12 sentences as: word aligners (1), Word alignment (1), word alignment (9), words align (1)
In Model-Based Aligner Combination Using Dual Decomposition
  1. Unsupervised word alignment is most often modeled as a Markov process that generates a sentence f conditioned on its translation e.
    Page 1, “Abstract”
  2. Word alignment is the task of identifying corresponding words in sentence pairs.
    Page 1, “Introduction”
  3. The standard approach to word alignment employs directional Markov models that align the words of a sentence f to those of its translation e, such as IBM Model 4 (Brown et al., 1993) or the HMM-based alignment model (Vogel et al., 1996).
    Page 1, “Introduction”
  4. Our bidirectional model G = (V, D) is a globally normalized, undirected graphical model of the word alignment for a fixed sentence pair (e, f). Each vertex in the vertex set V corresponds to a model variable V_i, and each undirected edge in the edge set D corresponds to a pair of variables (V_i, V_j). Each vertex has an associated potential function ω_i(v_i) that assigns a real-valued potential to each possible value v_i of V_i. Likewise, each edge has an associated potential function μ_ij(v_i, v_j) that scores pairs of values.
    Page 2, “Model Definition”
  5. The highest probability word alignment vector under the model for a given sentence pair (e, f) can be computed exactly using the standard Viterbi algorithm for HMMs in O(|e|² · |f|) time.
    Page 2, “Model Definition”
  6. An alignment vector a can be converted trivially into a set of word alignment links A:
    Page 2, “Model Definition”
  7. However, all of the information about how words align is expressed by the vertex and edge potentials on a and b.
    Page 4, “Model Definition”
  8. In addition, supervised word alignment models often use the output of directional unsupervised aligners as features or pruning signals.
    Page 6, “Related Work”
  9. A parallel idea that closely relates to our bidirectional model is posterior regularization, which has also been applied to the word alignment problem (Graca et al., 2008).
    Page 7, “Related Work”
  10. Another similar line of work applies belief propagation to factor graphs that enforce a one-to-one word alignment (Cromieres and Kurohashi, 2009).
    Page 7, “Related Work”
  11. Extraction-based evaluations of alignment better coincide with the role of word aligners in machine translation systems (Ayan and Dorr, 2006).
    Page 8, “Experimental Results”
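
The sixth excerpt mentions converting an alignment vector into a set of links; a one-line sketch follows, where the convention that position 0 denotes null alignment is an assumption of this sketch:

```python
def links_from_vector(a):
    """Convert an alignment vector a (a[j] is the source position that
    generated target word j, with 0 assumed to mean null-aligned) into
    a set of word alignment links {(i, j)}."""
    return {(a_j, j) for j, a_j in enumerate(a) if a_j != 0}
```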

sentence pair

Appears in 8 sentences as: sentence pair (7), sentence pairs (1)
In Model-Based Aligner Combination Using Dual Decomposition
  1. Word alignment is the task of identifying corresponding words in sentence pairs.
    Page 1, “Introduction”
  2. Our bidirectional model G = (V, D) is a globally normalized, undirected graphical model of the word alignment for a fixed sentence pair (e, f). Each vertex in the vertex set V corresponds to a model variable V_i, and each undirected edge in the edge set D corresponds to a pair of variables (V_i, V_j). Each vertex has an associated potential function ω_i(v_i) that assigns a real-valued potential to each possible value v_i of V_i. Likewise, each edge has an associated potential function μ_ij(v_i, v_j) that scores pairs of values.
    Page 2, “Model Definition”
  3. The highest probability word alignment vector under the model for a given sentence pair (e, f) can be computed exactly using the standard Viterbi algorithm for HMMs in O(|e|² · |f|) time.
    Page 2, “Model Definition”
  4. Figure 1: The structure of our graphical model for a simple sentence pair.
    Page 3, “Model Definition”
  5. Because the word types and lengths of e and f are always fixed by the observed sentence pair, we can define our model only over a and b, where the edge potentials between any a_j, f_j, and e are compiled into a vertex potential function ω_j^(a)(a_j) on a_j, defined in terms of f and e, and likewise for any b_i.
    Page 3, “Model Definition”
  6. These edges allow the model to ensure that the three sets of variables, a, b, and c, together encode a coherent alignment analysis of the sentence pair.
    Page 3, “Model Definition”
  7. Moreover, the value of u is specific to a sentence pair.
    Page 6, “Model Inference”
  8. Memory requirements are virtually identical to the baseline: only u must be stored for each sentence pair as it is being processed, but can then be immediately discarded once alignments are inferred.
    Page 6, “Model Inference”
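
The fifth excerpt describes compiling edge potentials that touch the fixed, observed words into vertex potentials. A sketch of that compilation step, with `translation_potential` as a hypothetical stand-in for the model's actual scoring terms:

```python
def compile_vertex_potential(j, e, f, translation_potential):
    """Because e and f are fixed by the observed sentence pair, the edge
    potentials between a_j and the observed words reduce to a single
    table over the possible values of a_j, i.e. a vertex potential."""
    return {i: translation_potential(e[i], f[j]) for i in range(len(e))}
```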

phrase pairs

Appears in 6 sentences as: Phrase pair (1), phrase pair (1), phrase pairs (5)
In Model-Based Aligner Combination Using Dual Decomposition
  1. In this way, we can show that the bidirectional model improves alignment quality and enables the extraction of more correct phrase pairs.
    Page 7, “Experimental Results”
  2. Table 3: Phrase pair extraction accuracy for phrase pairs up to length 5.
    Page 8, “Experimental Results”
  3. Possible links are both included and excluded from phrase pairs during extraction, as in DeNero and Klein (2010).
    Page 8, “Experimental Results”
  4. Null aligned words are never included in phrase pairs for evaluation.
    Page 8, “Experimental Results”
  5. Correct phrase pair recall increases from 43.4% to 53.6% (a 23.5% relative increase) for the bidirectional model, relative to the best baseline.
    Page 8, “Experimental Results”
  6. The resulting predictions improve the precision and recall of both alignment links and extracted phrase pairs in Chinese-English experiments.
    Page 9, “Conclusion”
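
For context on what phrase pair extraction computes, here is a standard consistency-based extraction sketch in the style of Koehn et al. (2003); it is not the paper's exact procedure, which additionally includes and excludes possible links as in DeNero and Klein (2010):

```python
def extract_phrase_pairs(links, e_len, max_len=5):
    """Enumerate phrase pairs consistent with a set of alignment links
    {(i, j)}: every link touching either span must fall inside both
    spans. Spans are inclusive index pairs; the unaligned-word
    extension of full extraction is omitted for brevity."""
    pairs = []
    for i1 in range(e_len):
        for i2 in range(i1, min(i1 + max_len, e_len)):
            # f positions linked to the candidate e span
            js = [j for (i, j) in links if i1 <= i <= i2]
            if not js:
                continue
            j1, j2 = min(js), max(js)
            if j2 - j1 + 1 > max_len:
                continue
            # no link from the f span may point outside the e span
            if all(i1 <= i <= i2 for (i, j) in links if j1 <= j <= j2):
                pairs.append(((i1, i2), (j1, j2)))
    return pairs
```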

Viterbi

Appears in 6 sentences as: Viterbi (6)
In Model-Based Aligner Combination Using Dual Decomposition
  1. The highest probability word alignment vector under the model for a given sentence pair (e, f) can be computed exactly using the standard Viterbi algorithm for HMMs in O(|e|² · |f|) time.
    Page 2, “Model Definition”
  2. In fact, we can make a stronger claim: we can reuse the Viterbi inference algorithm for linear chain graphical models that applies to the embedded directional HMM models.
    Page 5, “Model Inference”
  3. Note that Equation 5 and f are sums of terms in log space, while Viterbi inference for linear chains assumes a product of terms in probability space, which introduces the exponentiation above.
    Page 5, “Model Inference”
  4. Our method is complementary to agreement-based learning, as it applies to Viterbi inference under the model rather than computing expectations.
    Page 7, “Related Work”
  5. First, we iterate over Viterbi predictions rather than posteriors.
    Page 7, “Related Work”
  6. Also, our model training is identical to the HMM-based baseline training, while they employ belief propagation for both training and Viterbi inference.
    Page 7, “Related Work”
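
The third excerpt notes that the objective is a sum of terms in log space while textbook Viterbi multiplies probabilities; the fix is simply to run max-sum Viterbi with additions. A self-contained log-space sketch for a linear chain, with the score-table layout as an assumption of this sketch:

```python
def viterbi_logspace(node_scores, edge_scores):
    """Max-sum Viterbi for a linear chain, using additive log-space
    scores rather than multiplied probabilities. node_scores[j] maps
    each state at position j to its score; edge_scores[j][u][v] scores
    moving from state u at position j-1 to state v at position j."""
    n = len(node_scores)
    best = [dict(node_scores[0])]  # best[j][v]: best score of a path ending in v
    back = [{}]                    # back[j][v]: predecessor state on that path
    for j in range(1, n):
        best.append({})
        back.append({})
        for v, score_v in node_scores[j].items():
            u = max(best[j - 1],
                    key=lambda p: best[j - 1][p] + edge_scores[j][p][v])
            best[j][v] = best[j - 1][u] + edge_scores[j][u][v] + score_v
            back[j][v] = u
    v = max(best[-1], key=best[-1].get)  # best final state
    path = [v]
    for j in range(n - 1, 0, -1):
        v = back[j][v]
        path.append(v)
    return list(reversed(path))
```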

machine translation

Appears in 5 sentences as: Machine translation (1), machine translation (4)
In Model-Based Aligner Combination Using Dual Decomposition
  1. Statistical machine translation systems combine the predictions of two directional models, typically using heuristic combination procedures like grow-diag-final.
    Page 1, “Abstract”
  2. Machine translation systems typically combine the predictions of two directional models, one which aligns f to e and the other e to f (Och et al., 1999).
    Page 1, “Introduction”
  3. Extraction-based evaluations of alignment better coincide with the role of word aligners in machine translation systems (Ayan and Dorr, 2006).
    Page 8, “Experimental Results”
  4. Finally, we evaluated our bidirectional model in a large-scale end-to-end phrase-based machine translation system from Chinese to English, based on the alignment template approach (Och and Ney, 2004).
    Page 8, “Experimental Results”
  5. We also look forward to discovering the best way to take advantage of these new alignments in downstream applications like machine translation, supervised word alignment, bilingual parsing (Burkett et al., 2010), part-of-speech tag induction (Naseem et al., 2009), or cross-lingual model projection (Smith and Eisner, 2009; Das and Petrov, 2011).
    Page 9, “Conclusion”

Chinese-English

Appears in 4 sentences as: Chinese-English (4)
In Model-Based Aligner Combination Using Dual Decomposition
  1. Our model-based approach to aligner combination yields improvements in alignment quality and phrase extraction quality in Chinese-English experiments, relative to typical heuristic combination methods applied to the predictions of independent directional models.
    Page 1, “Introduction”
  2. We evaluated alignment quality on a hand-aligned portion of the NIST 2002 Chinese-English test set (Ayan and Dorr, 2006).
    Page 7, “Experimental Results”
  3. AER results for Chinese-English are reported in Table 2.
    Page 8, “Experimental Results”
  4. The resulting predictions improve the precision and recall of both alignment links and extracted phrase pairs in Chinese-English experiments.
    Page 9, “Conclusion”

translation systems

Appears in 4 sentences as: translation system (1), translation systems (3)
In Model-Based Aligner Combination Using Dual Decomposition
  1. Statistical machine translation systems combine the predictions of two directional models, typically using heuristic combination procedures like grow-diag-final.
    Page 1, “Abstract”
  2. Machine translation systems typically combine the predictions of two directional models, one which aligns f to e and the other e to f (Och et al., 1999).
    Page 1, “Introduction”
  3. Extraction-based evaluations of alignment better coincide with the role of word aligners in machine translation systems (Ayan and Dorr, 2006).
    Page 8, “Experimental Results”
  4. Finally, we evaluated our bidirectional model in a large-scale end-to-end phrase-based machine translation system from Chinese to English, based on the alignment template approach (Och and Ney, 2004).
    Page 8, “Experimental Results”

error rate

Appears in 3 sentences as: error rate (3)
In Model-Based Aligner Combination Using Dual Decomposition
  1. Table 2: Alignment error rate results for the bidirectional model versus the baseline directional models.
    Page 8, “Experimental Results”
  2. First, we measure alignment error rate (AER), which compares the pro-
    Page 8, “Experimental Results”
  3. The translation model weights were tuned for both the baseline and bidirectional alignments using lattice-based minimum error rate training (Kumar et al., 2009).
    Page 8, “Experimental Results”
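
The second excerpt is cut off mid-definition; for reference, the standard alignment error rate of Och and Ney (2003), which AER evaluations of this kind follow, compares predicted links A against annotated sure links S and possible links P (with S ⊆ P):

```latex
\mathrm{AER}(A; S, P) \;=\; 1 - \frac{|A \cap S| + |A \cap P|}{|A| + |S|}
```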

iteratively

Appears in 3 sentences as: iteratively (3)
In Model-Based Aligner Combination Using Dual Decomposition
  1. In this approach, we iteratively apply the same efficient sequence algorithms for the underlying directional models, and thereby optimize a dual bound on the model objective.
    Page 1, “Introduction”
  2. In particular, we can iteratively apply exact inference to the subgraph problems, adjusting their potentials to reflect the constraints of the full problem.
    Page 4, “Model Inference”
  3. We can iteratively search for such a u via sub-gradient descent.
    Page 6, “Model Inference”
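
Putting the three excerpts together, the iteration alternates exact inference in the two tractable subgraphs with a subgradient step on the dual variables. A skeleton of that loop, where `solve_a`, `solve_b`, and the step-size schedule are hypothetical, and the sign convention follows the usual dual decomposition recipe rather than the paper's exact equations:

```python
def dual_decomposition(solve_a, solve_b, step_size, max_iters=250):
    """Iteratively apply the efficient sequence algorithms for the two
    directional subproblems, adjusting their potentials (via the dual
    variables u) to push their copies of the coherence variables c
    toward agreement."""
    u = {}  # one dual variable per potential alignment cell (i, j)
    a = b = None
    for t in range(max_iters):
        a, c_a = solve_a(u)  # exact inference in the subgraph over a and c
        b, c_b = solve_b(u)  # exact inference in the subgraph over b and c
        if c_a == c_b:       # agreement: the dual bound is tight
            return a, b
        for key in set(c_a) | set(c_b):
            diff = c_a.get(key, 0) - c_b.get(key, 0)
            if diff:
                u[key] = u.get(key, 0.0) - step_size(t) * diff
    return a, b  # no agreement within the budget; return the last iterates
```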

optimization problem

Appears in 3 sentences as: optimization problem (3)
In Model-Based Aligner Combination Using Dual Decomposition
  1. The Lagrangian relaxation of this optimization problem is L(a, b, c^(a), c^(b), u) =
    Page 4, “Model Inference”
  2. We can form a dual problem that is an upper bound on the original optimization problem by swapping the order of min and max.
    Page 4, “Model Inference”
  3. Hence, it is also a solution to our original optimization problem: Equation 3.
    Page 6, “Model Inference”
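
A generic rendering of that Lagrangian may clarify the excerpts. Assuming, as is standard in dual decomposition, that the coherence variables c are duplicated into copies c^(a) and c^(b) whose agreement is priced by multipliers u (the paper's exact indexing may differ):

```latex
L\bigl(a, b, c^{(a)}, c^{(b)}, u\bigr)
  = f\bigl(a, c^{(a)}\bigr) + g\bigl(b, c^{(b)}\bigr)
  + \sum_{i,j} u(i,j)\,\bigl(c^{(a)}_{i,j} - c^{(b)}_{i,j}\bigr)
```

Maximizing L over the alignment variables for a fixed u and then minimizing over u yields the dual problem; swapping the order of min and max in this way gives the upper bound mentioned in the second excerpt.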

precision and recall

Appears in 3 sentences as: precision and recall (3)
In Model-Based Aligner Combination Using Dual Decomposition
  1. The bidirectional model improves both precision and recall relative to all heuristic combination techniques, including grow-diag-final (Koehn et al., 2003).
    Page 8, “Experimental Results”
  2. As our model only provides small improvements in alignment precision and recall for the union combiner, the magnitude of the BLEU improvement is not surprising.
    Page 9, “Experimental Results”
  3. The resulting predictions improve the precision and recall of both alignment links and extracted phrase pairs in Chinese-English experiments.
    Page 9, “Conclusion”
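
The excerpts do not spell out the link-level definitions; the usual ones against sure links S and possible links P (Och and Ney, 2003) are assumed here:

```latex
\mathrm{Precision}(A) = \frac{|A \cap P|}{|A|}, \qquad
\mathrm{Recall}(A) = \frac{|A \cap S|}{|S|}
```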
