Exact Maximum Inference for the Fertility Hidden Markov Model
Quirk, Chris

Article Structure

Abstract

The notion of fertility in word alignment (the number of words emitted by a single state) is useful but difficult to model.

Introduction

Word-based translation models intended to model the translation process have found new uses identifying word correspondences in sentence pairs.

HMM alignment

Let us briefly review the HMM translation model as a starting point.

Fertility distribution parameters

Original IBM models used a categorical distribution of fertility, one such distribution for each English word.

Evaluation

We explore the impact of this improved MAP inference procedure on a task in German-English word alignment.

Conclusions and future work

We have introduced a dual decomposition approach to alignment inference that substantially reduces alignment error.

Topics

Viterbi

Appears in 7 sentences as: Viterbi (7)
In Exact Maximum Inference for the Fertility Hidden Markov Model
  1. Therefore, the standard forward-backward and Viterbi algorithms do not apply.
    Page 1, “Introduction”
  2. argmax_a ( f(a) + Σ_{i,j} u^{(k−1)}(i, j) y_a(i, j) ) using the standard Viterbi algorithm.
    Page 3, “HMM alignment”
  3. The algorithm described in Figure 2 finds this maximal assignment in O(IJ log J) time, generally faster than the O(I²J) time used by Viterbi .
    Page 3, “HMM alignment”
  4. Algorithm        AER (G→E)   AER (E→G)
     HMM              24.0        21.8
     FHMM Viterbi     19.7        19.6
     FHMM Dual-dec    18.0        17.4
    Page 4, “Fertility distribution parameters”
  5. The next line shows the fertility HMM with approximate posterior computation from Gibbs sampling but with final alignment selected by the Viterbi algorithm.
    Page 4, “Evaluation”
  6. The prior work compared Viterbi with a form of local search (sampling repeatedly and keeping the max), finding little difference between the two (Zhao and Gildea, 2010).
    Page 4, “Evaluation”
  7. Here, however, the difference between a dual decomposition and Viterbi is significant: their results were likely due to search error.
    Page 4, “Evaluation”


word alignment

Appears in 7 sentences as: word aligned (1) word alignment (3) word alignments (3)
In Exact Maximum Inference for the Fertility Hidden Markov Model
  1. The notion of fertility in word alignment (the number of words emitted by a single state) is useful but difficult to model.
    Page 1, “Abstract”
  2. These word alignments are a crucial training component in most machine translation systems.
    Page 1, “Introduction”
  3. Models 2 and 3 incorporate a positional model based on the absolute position of the word; Models 4 and 5 use a relative position model instead (an English word tends to align to a French word that is nearby the French word aligned to the previous English word).
    Page 1, “Introduction”
  4. . . . f = f1, . . . , fJ and word alignment vectors a = a1, . . . , aJ.
    Page 2, “HMM alignment”
  5. For the standard HMM, there is a dynamic programming algorithm to compute the posterior probability over word alignments Pr(a|e, f). These are the sufficient statistics gathered in the E step of EM.
    Page 2, “HMM alignment”
  6. We explore the impact of this improved MAP inference procedure on a task in German-English word alignment .
    Page 4, “Evaluation”
  7. For training data we use the news commentary data from the WMT 2012 translation task.1 120 of the training sentences were manually annotated with word alignments .
    Page 4, “Evaluation”


dynamic programming

Appears in 4 sentences as: dynamic programming (4)
In Exact Maximum Inference for the Fertility Hidden Markov Model
  1. For the standard HMM, there is a dynamic programming algorithm to compute the posterior probability over word alignments Pr(a|e, f). These are the sufficient statistics gathered in the E step of EM.
    Page 2, “HMM alignment”
  2. The structure of the fertility model violates the Markov assumptions used in this dynamic programming method.
    Page 2, “HMM alignment”
  3. Rather than maximizing each row totally independently, we keep track of the best configurations for each number of words generated in each row, and then pick the best combination that sums to J: another straightforward exercise in dynamic programming .
    Page 4, “HMM alignment”
  4. The first line is a baseline HMM using exact posterior computation and inference with the standard dynamic programming algorithms.
    Page 4, “Evaluation”
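Snippet 3 above describes a knapsack-style combination: each English word (row) i contributes some number of generated words c_i, and we pick the best combination with Σ c_i = J. A sketch of that dynamic program follows; the `row_scores` input is a hypothetical stand-in for the per-row, per-count maxima the text describes, and this is not the paper's implementation.

```python
def combine_rows(row_scores, J):
    """Pick one fertility c_i per row so the fertilities sum to J.

    row_scores[i][c]: best score when row i generates exactly c words
    (assumed precomputed by maximizing each row per count).
    Returns (best_total_score, [c_1, ..., c_I]).
    """
    NEG = float("-inf")
    best = [0.0] + [NEG] * J  # best[t]: best score so far with t words total
    choice = []               # choice[i][t]: fertility picked for row i at total t
    for scores in row_scores:
        new_best = [NEG] * (J + 1)
        ch = [None] * (J + 1)
        for t in range(J + 1):
            for c, s in enumerate(scores):
                if c <= t and best[t - c] + s > new_best[t]:
                    new_best[t] = best[t - c] + s
                    ch[t] = c
        best = new_best
        choice.append(ch)
    # Trace back which fertility each row contributed.
    t, fert = J, []
    for ch in reversed(choice):
        fert.append(ch[t])
        t -= ch[t]
    return best[J], list(reversed(fert))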


Gibbs sampling

Appears in 3 sentences as: Gibbs sampling (3)
In Exact Maximum Inference for the Fertility Hidden Markov Model
  1. Recent approaches instead use more principled approximate inference techniques such as Gibbs sampling for parameter estimation.
    Page 1, “Abstract”
  2. estimate the posterior distribution using Markov chain Monte Carlo methods such as Gibbs sampling (Zhao and Gildea, 2010).
    Page 2, “HMM alignment”
  3. The next line shows the fertility HMM with approximate posterior computation from Gibbs sampling but with final alignment selected by the Viterbi algorithm.
    Page 4, “Evaluation”
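The Gibbs sampler referenced in these snippets resamples one alignment link at a time from its conditional distribution given the rest. A generic single-site sketch follows; the `score` function is a hypothetical stand-in for the fertility-HMM log score, and this is not the paper's implementation.

```python
import math
import random

def gibbs_sample_alignments(score, I, J, iters=1000, seed=0):
    """Approximate posterior link counts by single-site Gibbs sampling.

    score(a): log unnormalized model score of a full alignment a (stand-in
    for the fertility-HMM score). Returns counts[i][j], which after many
    iterations approximates iters * Pr(a_j = i | e, f).
    """
    rng = random.Random(seed)
    a = [rng.randrange(I) for _ in range(J)]
    counts = [[0] * J for _ in range(I)]
    for _ in range(iters):
        for j in range(J):
            # Conditional over a_j given all other links.
            logps = []
            for i in range(I):
                a[j] = i
                logps.append(score(a))
            m = max(logps)
            ps = [math.exp(lp - m) for lp in logps]
            # Draw a_j proportional to ps.
            r = rng.random() * sum(ps)
            for i, p in enumerate(ps):
                r -= p
                if r <= 0:
                    a[j] = i
                    break
            counts[a[j]][j] += 1
    return counts
```

As the evaluation snippets note, reading a final alignment off such samples with Viterbi can leave search error that exact dual-decomposition inference avoids.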


objective function

Appears in 3 sentences as: objective function (3)
In Exact Maximum Inference for the Fertility Hidden Markov Model
  1. This captures the positional information in the IBM models in a framework that admits exact parameter estimation and inference, though the objective function is not concave: local maxima are a concern.
    Page 1, “Introduction”
  2. Let us rewrite the objective function as follows:
    Page 2, “HMM alignment”
  3. Note how this recovers the original objective function when matching variables are found.
    Page 3, “HMM alignment”
