Model Definition | The highest-probability word alignment vector under the model for a given sentence pair (e, f) can be computed exactly using the standard Viterbi algorithm for HMMs in O(|e|^2 · |f|) time.
Model Inference | In fact, we can make a stronger claim: we can reuse the standard Viterbi inference algorithm for linear-chain graphical models, which applies directly to the embedded directional HMM models.
Model Inference | Note that Equations 5 and 6 are sums of terms in log space, while Viterbi inference for linear chains assumes a product of terms in probability space, which is what introduces the exponentiation above.
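Model Inference | As an illustrative sketch (not the papers' code), the following runs Viterbi inference for an HMM alignment model directly in log space, where the model's products become sums and no exponentiation is needed during the max; the table names log_trans and log_emit and the uniform initial distribution are assumptions.

```python
import numpy as np

def viterbi_alignment(log_trans, log_emit):
    """Highest-probability alignment a_1..a_J for a sentence pair (e, f).

    log_trans: (I, I) array of log p(a_j = i | a_{j-1} = i') for |e| = I states.
    log_emit:  (J, I) array of log p(f_j | e_i) for the |f| = J source words.
    Runs in O(|e|^2 · |f|) time, as stated above.
    """
    J, I = log_emit.shape
    delta = np.full((J, I), -np.inf)    # best log score ending in state i at step j
    back = np.zeros((J, I), dtype=int)  # backpointers
    delta[0] = log_emit[0] - np.log(I)  # assume a uniform initial distribution
    for j in range(1, J):
        scores = delta[j - 1][:, None] + log_trans  # prev state x next state
        back[j] = scores.argmax(axis=0)
        delta[j] = scores.max(axis=0) + log_emit[j]
    a = [int(delta[-1].argmax())]       # backtrace from the best final state
    for j in range(J - 1, 0, -1):
        a.append(int(back[j, a[-1]]))
    return a[::-1]                      # a[j] = index of the English word for f_j
```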
Related Work | Our method is complementary to agreement-based learning, as it applies to Viterbi inference under the model rather than computing expectations. |
Related Work | First, we iterate over Viterbi predictions rather than posteriors. |
Related Work | Also, our model training is identical to the HMM-based baseline training, while they employ belief propagation for both training and Viterbi inference. |
Conclusion and Future Work |
  Parser                      Sec/Sent   F1
  CYK                         64.610     88.7
  Berkeley CTF MaxRule        0.213      90.2
  Berkeley CTF Viterbi        0.208      88.8
  Beam + Boundary FOM (BB)    0.334      88.6
  BB + Chart Constraints      0.244      88.7
Experimental Setup | Accuracy is computed from the 1-best Viterbi (max) tree extracted from the chart.
Experimental Setup | We compute the precision and recall of constituents from the 1-best Viterbi trees using the standard EVALB script.
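Experimental Setup | For concreteness, one way to read the 1-best (max) tree out of a chart's backpointers is sketched below; the (i, j, symbol)-keyed chart layout is a simplifying assumption for illustration, not the parsers' actual data structure.

```python
def extract_viterbi_tree(back, i, j, sym, words):
    """Recover the 1-best tree from Viterbi CKY backpointers as a bracketing.

    back maps (i, j, sym) -> (split, left_sym, right_sym) for binary rules,
    or None when sym is a preterminal over the single word words[i].
    The bracketed string it returns is what EVALB-style scoring consumes.
    """
    entry = back[(i, j, sym)]
    if entry is None:                       # preterminal cell: emit the word
        return f"({sym} {words[i]})"
    split, left, right = entry              # best split point and child labels
    return (f"({sym} {extract_viterbi_tree(back, i, split, left, words)} "
            f"{extract_viterbi_tree(back, split, j, right, words)})")
```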
Results | Both our parser and the Berkeley parser are written in Java, both are run with Viterbi decoding, and both parse with the same grammar, so a direct comparison of speed and accuracy is fair.
Experiments | Training a model using DD would require a different optimization algorithm based on Viterbi results (e.g., the perceptron), which we will pursue in future work.
Oracle Parsing | We can also see from the scores of the Viterbi parses that while the reverse condition has access to much better parses, the model doesn’t actually find them. |
Oracle Parsing | Digging deeper, we compared parser model score against Viterbi F-score and oracle F-score at a va-
Structured Learning | In Section 4 we show that finding summaries that optimize Objective 2, Viterbi prediction, is efficient. |
Structured Learning | Online learning algorithms like perceptron or the margin-infused relaxed algorithm (MIRA) (Crammer and Singer, 2003) are frequently used for structured problems where Viterbi inference is available. |
Structured Learning | Thus, we can easily perform loss-augmented prediction using the same procedure we use to perform Viterbi prediction (described in Section 4). |
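Structured Learning | A minimal sketch of such an online update, assuming a viterbi_decode routine that can optionally add a decomposable task loss into its scores (loss-augmented prediction); all names here are illustrative, and MIRA would replace the fixed-size update with a margin-based step.

```python
def perceptron_step(w, x, y_gold, viterbi_decode, features):
    """One structured-perceptron update driven by (loss-augmented) Viterbi.

    viterbi_decode(x, w, loss_against=y) is assumed to return the argmax
    structure, adding the task loss against y into its scores when given;
    this works whenever the loss decomposes the same way the model score does.
    """
    y_hat = viterbi_decode(x, w, loss_against=y_gold)
    if y_hat != y_gold:
        # Promote gold features, demote features of the (loss-augmented) argmax.
        for feat, value in features(x, y_gold).items():
            w[feat] = w.get(feat, 0.0) + value
        for feat, value in features(x, y_hat).items():
            w[feat] = w.get(feat, 0.0) - value
    return w
```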
Machine Translation as a Decipherment Task | Finally, we use the Viterbi algorithm to decode the foreign sentence f and produce an English translation e that maximizes P(e) · Pθ(f|e).
Word Substitution Decipherment | Finally, we decode the given ciphertext c by using the Viterbi algorithm to choose the plaintext decoding e that maximizes P(e) · P_trained(c|e)^3, stretching the channel probabilities (Knight et al., 2006).
Word Substitution Decipherment | We then use the Viterbi algorithm to choose the English plaintext e that maximizes P(e) · P_trained(c|e)^3.
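Word Substitution Decipherment | As a sketch under a bigram language model, this decode is ordinary Viterbi with the channel log-scores multiplied by 3, which is exactly what cubing (stretching) the channel probabilities amounts to; the table names lm and channel below are assumptions, not the papers' actual code.

```python
import math

def decipher_viterbi(cipher, lm, channel, vocab, stretch=3.0):
    """Choose plaintext e maximizing P(e) * P_trained(c|e)^stretch via Viterbi.

    lm[(e_prev, e)] holds bigram LM probabilities (with "<s>" as start);
    channel[(c, e)] holds trained channel probabilities p(c | e).
    Cubing the channel (stretch=3) becomes a multiplier in log space.
    """
    def lp(p):  # safe log: zero probability maps to -inf
        return math.log(p) if p > 0 else float("-inf")

    best = {e: lp(lm.get(("<s>", e), 0.0)) +
               stretch * lp(channel.get((cipher[0], e), 0.0)) for e in vocab}
    back = []
    for c in cipher[1:]:
        prev, best, ptrs = best, {}, {}
        for e in vocab:
            cand = {ep: prev[ep] + lp(lm.get((ep, e), 0.0)) for ep in vocab}
            ep_star = max(cand, key=cand.get)
            best[e] = cand[ep_star] + stretch * lp(channel.get((c, e), 0.0))
            ptrs[e] = ep_star
        back.append(ptrs)
    e = max(best, key=best.get)             # backtrace the best plaintext
    out = [e]
    for ptrs in reversed(back):
        e = ptrs[e]
        out.append(e)
    return out[::-1]
```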