A Branch-and-Bound Algorithm | The mixed integer quadratic program with nonlinear constraints, given in the previous section, maximizes the nonconvex Viterbi EM objective and is NP-hard to solve (Cohen and Smith, 2010).
A Branch-and-Bound Algorithm | The standard approach to optimizing this program is local search by the hard (Viterbi) EM algorithm.
Abstract | Although these techniques do not yet provide a practical solution to our instance of this NP-hard problem, they sometimes find better solutions than Viterbi EM with random restarts in the same amount of time.
Introduction | Many parameter estimation techniques have been attempted, including expectation-maximization (EM) (Klein and Manning, 2004; Spitkovsky et al., 2010a), contrastive estimation (Smith and Eisner, 2006; Smith, 2006), Viterbi EM (Spitkovsky et al., 2010b), and variational EM (Naseem et al., 2010; Cohen et al., 2009; Cohen and Smith, 2009). |
Introduction | Note that this objective is that of hard (Viterbi) EM—we do not marginalize over the parses as in classical EM.1
Introduction | Our results demonstrate that our method can obtain higher likelihoods than Viterbi EM with random restarts. |
Projections | These correspond to the Viterbi E step and M step, respectively. |
Relaxations | Since we optimize the Viterbi EM objective, we directly constrain the counts in the single corpus parse rather than expected counts from a distribution over parses. |
The Constrained Optimization Task | We begin by describing how, for our typical model, the Viterbi EM objective can be formulated as a mixed integer quadratic programming (MIQP) problem with nonlinear constraints (Figure 2).
The Constrained Optimization Task | Concretely, (1) specifies the Viterbi EM objective: the total log-probability of the best parse trees under the parameters θ, given by a sum of log-probabilities θm of the individual steps needed to generate the tree, as encoded by the features fm.
The Constrained Optimization Task | Figure 2: Viterbi EM as a mathematical program |
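The objective described above can be sketched numerically. This is a minimal sketch with a toy model and made-up feature names ("stop", "attach-left"); the full program in Figure 2 additionally carries the sum-to-one and parse constraints:

```python
import math

def viterbi_em_objective(theta, feature_counts):
    # Total log-probability of a fixed corpus parse: sum_m f_m * log(theta_m),
    # where f_m counts how often feature m fires in the chosen parse trees.
    return sum(f * math.log(theta[m]) for m, f in feature_counts.items() if f > 0)

# Hypothetical toy parameters and feature counts (illustrative names only)
theta = {"stop": 0.4, "attach-left": 0.6}
counts = {"stop": 3, "attach-left": 2}
obj = viterbi_em_objective(theta, counts)  # 3*log(0.4) + 2*log(0.6)
```

The nonconvexity arises because the feature counts themselves are decision variables (they depend on which parses are chosen), making the objective a product of unknowns rather than a fixed linear form.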
Parallel Segment Retrieval | to calculate the Viterbi alignment a. |
Parallel Segment Retrieval | The main bottleneck of the naive algorithm is finding new Viterbi Model 1 word alignments every time we change the spans. |
Parallel Segment Retrieval | an iterative approach to compute the Viterbi word alignments for IBM Model 1 using dynamic programming. |
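For IBM Model 1 specifically, the Viterbi alignment factorizes over target positions, since the model has no distortion or fertility terms; each target word can independently pick its best source word. A minimal sketch with a made-up translation table (not the paper's dynamic-programming variant, just the per-word decomposition it exploits):

```python
def model1_viterbi_alignment(src, tgt, t):
    # Each target word f aligns independently to the source word e
    # maximizing the lexical translation probability t[f][e].
    return [max(range(len(src)), key=lambda i: t.get(f, {}).get(src[i], 0.0))
            for f in tgt]

# Made-up translation table for illustration
t = {"the": {"das": 0.7, "haus": 0.1},
     "house": {"das": 0.1, "haus": 0.8}}
a = model1_viterbi_alignment(["das", "haus"], ["the", "house"], t)  # [0, 1]
```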
Evaluation | The next line shows the fertility HMM with approximate posterior computation from Gibbs sampling but with final alignment selected by the Viterbi algorithm. |
Evaluation | The prior work compared Viterbi with a form of local search (sampling repeatedly and keeping the max), finding little difference between the two (Zhao and Gildea, 2010). |
Evaluation | Here, however, the difference between a dual decomposition and Viterbi is significant: their results were likely due to search error. |
Fertility distribution parameters |
  Algorithm       AER (G→E)   AER (E→G)
  HMM             24.0        21.8
  FHMM Viterbi    19.7        19.6
  FHMM Dual-dec   18.0        17.4
HMM alignment | argmax_a ( f(a) + Σ_{i,j} u^{(k-1)}(i,j) y_a(i,j) ) using the standard Viterbi algorithm.
HMM alignment | The algorithm described in Figure 2 finds this maximal assignment in O(IJ log J) time, generally faster than the O(I²J) time used by Viterbi.
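For reference, the standard O(I²J) Viterbi recursion over source positions that this comparison refers to can be sketched as follows. This is a generic log-space sketch with illustrative toy parameters, not the paper's implementation; states are source positions, observations are target words:

```python
import math

def hmm_align(src_len, tgt, trans, emit, init):
    # delta[j][i]: best log-score of any state sequence ending in source
    # position i after emitting target words 0..j. Runtime: O(I^2 * J).
    I, J = src_len, len(tgt)
    delta = [[float("-inf")] * I for _ in range(J)]
    back = [[0] * I for _ in range(J)]
    for i in range(I):
        delta[0][i] = math.log(init[i]) + math.log(emit[i][tgt[0]])
    for j in range(1, J):
        for i in range(I):
            best_ip = max(range(I),
                          key=lambda ip: delta[j - 1][ip] + math.log(trans[ip][i]))
            delta[j][i] = (delta[j - 1][best_ip] + math.log(trans[best_ip][i])
                           + math.log(emit[i][tgt[j]]))
            back[j][i] = best_ip
    # Backtrace the highest-scoring alignment
    a = [max(range(I), key=lambda i: delta[J - 1][i])]
    for j in range(J - 1, 0, -1):
        a.append(back[j][a[-1]])
    return a[::-1]

# Toy 2-state example with illustrative parameters (all probabilities > 0)
init = [0.5, 0.5]
trans = [[0.7, 0.3], [0.3, 0.7]]
emit = [{"x": 0.9, "y": 0.1}, {"x": 0.1, "y": 0.9}]
a = hmm_align(2, ["x", "y"], trans, emit, init)  # [0, 1]
```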
Introduction | Therefore, the standard forward-backward and Viterbi algorithms do not apply. |
Conclusion | In the E-step we use hillclimbing to get a likely alignment (ideally the Viterbi alignment). |
Introduction | Iteration 11 shows that merely 5.14% of the (HMM) probability mass is covered by the Viterbi alignment and its neighbors.
Introduction | Having introduced the model variants, we proceed to derive a hillclimbing method to compute a likely alignment (ideally the Viterbi alignment) and its neighbors. |
Introduction | As an aside, the transfer iteration from HMM to IBM3 (iteration 11) reveals that only 5.14% of the probability mass is preserved when using the Viterbi alignment and its neighbors instead of all alignments.
Training the New Variants | As in the training of the deficient IBM-3 and IBM-4 models, we approximate the expectations in the E-step by a set of likely alignments, ideally centered around the Viterbi alignment; however, even for the regular deficient variants, computing the Viterbi alignment is NP-hard (Udupa and Maji, 2006).
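The move/swap neighborhood that such hillclimbing explores around a candidate alignment can be sketched as follows. This is a generic sketch, not the authors' implementation; `score` stands in for the model's alignment probability:

```python
def neighbors(a, I):
    # Move and swap neighbors of an alignment a (one source position per
    # target position), the neighborhood classically used for IBM-3/4.
    out, J = [], len(a)
    for j in range(J):                    # move: re-align one target word
        for i in range(I):
            if i != a[j]:
                b = list(a); b[j] = i; out.append(b)
    for j1 in range(J):                   # swap: exchange two alignments
        for j2 in range(j1 + 1, J):
            if a[j1] != a[j2]:
                b = list(a); b[j1], b[j2] = b[j2], b[j1]; out.append(b)
    return out

def hillclimb(a, I, score):
    # Greedily move to better-scoring neighbors until a local optimum.
    best, best_s = list(a), score(a)
    improved = True
    while improved:
        improved = False
        for b in neighbors(best, I):
            if score(b) > best_s:
                best, best_s, improved = b, score(b), True
    return best

# Toy score for illustration: reward target word j aligning to source word j
best = hillclimb([0, 0, 0], 3,
                 lambda a: sum(1 for j, i in enumerate(a) if i == j))
```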
Joint Alignment and NER Decoding | It is also worth noting that since we decode the alignment models with Viterbi inference, additional constraints such as the neighborhood constraint proposed by DeNero and Macherey (2011) can be easily integrated into our model. |
Related Work | Instead of enforcing agreement in the alignment space based on best sequences found by Viterbi, we could opt to encourage agreement between posterior probability distributions, which is related to the posterior regularization work by Graca et al.
Related Work | We also differ in the implementation details, where in their case belief propagation is used in both training and Viterbi inference. |
The Sentence Tracker | This can be determined with the Viterbi (1967) algorithm and is known as detection-based tracking (Viterbi, 1971).
The Sentence Tracker | the maximization over state sequences q1, ..., qT, which can also be found by the Viterbi algorithm.
The Sentence Tracker | 3, which can also be determined using the Viterbi algorithm. |
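Detection-based tracking as described here is itself a Viterbi problem: pick one detection per frame so that detection scores plus temporal coherence are maximized. A minimal sketch under assumed inputs (`frame_scores[t][k]` scores detection k in frame t; `coherence(j, k)` scores moving from detection j to k, both hypothetical names):

```python
def track_viterbi(frame_scores, coherence):
    # Viterbi over frames: delta[k] is the best total score of any track
    # ending in detection k of the current frame.
    delta = list(frame_scores[0])
    back = []
    for t in range(1, len(frame_scores)):
        ptr, nxt = [], []
        for k, s in enumerate(frame_scores[t]):
            best = max(range(len(delta)), key=lambda j: delta[j] + coherence(j, k))
            nxt.append(delta[best] + coherence(best, k) + s)
            ptr.append(best)
        delta = nxt
        back.append(ptr)
    # Backtrace the best track
    k = max(range(len(delta)), key=lambda i: delta[i])
    path = [k]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

# Toy example: two frames, two candidate detections each, no motion penalty
path = track_viterbi([[1.0, 2.0], [0.5, 3.0]], lambda j, k: 0.0)  # [1, 1]
```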