Abstract | Therefore, most systems use a simple Viterbi approximation that measures the goodness of a string using only its most probable derivation. |
Background 2.1 Terminology | 2.3 Viterbi Approximation |
Background 2.1 Terminology | To approximate the intractable decoding problem of (2), most MT systems (Koehn et al., 2003; Chiang, 2007) use a simple Viterbi approximation, |
Background 2.1 Terminology | The Viterbi approximation is simple and tractable, but it ignores most derivations. |
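The contrast between the Viterbi approximation and exact decoding can be sketched on a toy model. This is an illustrative example only, assuming the model is given as explicit (derivation, output string, probability) triples rather than a real translation forest:

```python
# Viterbi vs. exact decoding on a toy model: each derivation yields an
# output string with some probability. All data here is made up.
from collections import defaultdict

derivations = [
    ("d1", "the cat", 0.30),   # the single most probable derivation
    ("d2", "a cat",   0.25),
    ("d3", "a cat",   0.25),   # same string via a different derivation
]

# Viterbi approximation: score a string by its single best derivation.
viterbi_best = max(derivations, key=lambda d: d[2])[1]

# Exact decoding: score a string by the SUM over all its derivations.
string_prob = defaultdict(float)
for _, string, prob in derivations:
    string_prob[string] += prob
exact_best = max(string_prob, key=string_prob.get)

print(viterbi_best)  # "the cat" (0.30 beats each 0.25 derivation alone)
print(exact_best)    # "a cat"   (0.25 + 0.25 = 0.50 in total)
```

The two objectives disagree exactly when a string spreads its probability mass over several derivations, which is why ignoring all but the best derivation can pick the wrong string.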
Experimental Results | Viterbi 35.4 / 32.6; MBR (K=1000) 35.8 / 32.7; Crunching (N=10000) 35.7 / 32.8 |
Introduction | This corresponds to a Viterbi approximation that measures the goodness of an output string using only its most probable derivation, ignoring all the others. |
Variational Approximate Decoding | The Viterbi and crunching methods above approximate the intractable decoding of (2) by ignoring most of the derivations. |
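Crunching sits between these extremes: it sums each string's probability over only the N most probable derivations. A minimal sketch, with function and variable names that are ours rather than from any specific system:

```python
# 'Crunching': approximate each string's probability by summing over only
# the n most probable derivations, then return the best-scoring string.
import heapq
from collections import defaultdict

def crunch(derivations, n):
    """derivations: list of (output_string, probability) pairs,
    one pair per derivation."""
    top_n = heapq.nlargest(n, derivations, key=lambda d: d[1])
    scores = defaultdict(float)
    for string, prob in top_n:
        scores[string] += prob
    return max(scores, key=scores.get)

derivs = [("the cat", 0.30), ("a cat", 0.25), ("a cat", 0.25)]
print(crunch(derivs, 1))  # n=1 reduces to plain Viterbi: "the cat"
print(crunch(derivs, 3))  # summing the 3 best derivations: "a cat" (0.50)
```

With n=1 this is exactly the Viterbi approximation; as n grows it approaches exact decoding at the cost of enumerating more derivations.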
Variational Approximate Decoding | Lastly, note that Viterbi and variational approximation are different ways to approximate the exact probability p(y | c), and each of them has pros and cons. |
Variational Approximate Decoding | Specifically, the Viterbi approximation uses the correct probability of one complete derivation. |
Abstract | The minimum Bayes risk (MBR) decoding objective improves BLEU scores for machine translation output relative to the standard Viterbi objective of maximizing model score. |
Computing Feature Expectations | This forest-based MBR approach improved translation output relative to Viterbi translations. |
Consensus Decoding Algorithms | The standard Viterbi decoding objective is to find e* = arg max_e λ · θ(f, e). |
Consensus Decoding Algorithms | For MBR decoding, we instead leverage a similarity measure δ(e; e′) to choose a translation using the model’s probability distribution P(e | f), which has support over a set of possible translations E. The Viterbi derivation e* is the mode of this distribution. |
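An MBR decoder over an N-best list can be sketched as follows. The word-overlap similarity here is a toy stand-in for the paper's δ(e; e′), and all names are illustrative:

```python
# Minimum Bayes risk decoding over an N-best list: choose the candidate
# that maximizes expected similarity under the model distribution P(e|f).
def mbr_decode(nbest, similarity):
    """nbest: list of (translation, probability) pairs."""
    def expected_sim(e):
        return sum(p * similarity(e, e2) for e2, p in nbest)
    return max((e for e, _ in nbest), key=expected_sim)

def overlap(e1, e2):
    # toy similarity: number of shared word types
    return len(set(e1.split()) & set(e2.split()))

nbest = [("x y", 0.4), ("a b", 0.3), ("a c", 0.3)]
print(mbr_decode(nbest, overlap))  # "a b", not the mode "x y"
```

The mode "x y" has probability 0.4 but shares nothing with the other candidates (expected similarity 0.8), while "a b" is supported by the similar candidate "a c" (expected similarity 0.9), so MBR prefers it.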
Experimental Results | Table 2: Translation performance improves when computing expected sentences from translation forests rather than 10^4-best lists, which in turn improve over Viterbi translations. |
Experimental Results | Figure 3: n-grams with high expected count are more likely to appear in the reference translation than n-grams in the translation model’s Viterbi translation, e*. |
Experimental Results | We endeavored to test the hypothesis that expected n-gram counts under the forest distribution carry more predictive information than the baseline Viterbi derivation e*, which is the mode of the distribution. |
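Expected n-gram counts under a distribution over translations can be illustrated on a k-best list of (sentence, probability) pairs. The paper computes these quantities over the full translation forest; this k-best simplification and all names below are our own:

```python
# Expected n-gram counts: weight each n-gram's count in a candidate
# translation by that candidate's probability, and sum over candidates.
from collections import Counter, defaultdict

def expected_ngram_counts(weighted_translations, n=2):
    """weighted_translations: list of (sentence, probability) pairs."""
    expected = defaultdict(float)
    for sent, prob in weighted_translations:
        toks = sent.split()
        ngrams = Counter(zip(*(toks[i:] for i in range(n))))
        for ng, count in ngrams.items():
            expected[ng] += prob * count
    return dict(expected)

counts = expected_ngram_counts([("a b c", 0.5), ("a b d", 0.5)])
print(counts[("a", "b")])  # 1.0: appears in every candidate
print(counts[("b", "c")])  # 0.5: appears in only one candidate
```

Unlike the Viterbi derivation, these expectations aggregate evidence across the whole distribution, which is the predictive information the hypothesis above refers to.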
Recognition as a Generation Task | When a context expression (CE) with a parenthetical expression (PE) is encountered, the recognizer generates the Viterbi labeling for the CE, which leads to either the PE or NULL. |
Recognition as a Generation Task | Then, if the Viterbi labeling leads to the PE, we can, at the same time, use the labeling to decide the full form within the CE. |
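The Viterbi labeling itself is standard dynamic programming over a linear-chain model. A generic sketch with a toy HMM-style parameterization (the states, observations, and probabilities below are invented, not the CRF/DPLVM models of the paper); it returns the labeling together with its probability, the quantity examined in the discussion of low-probability labelings:

```python
# Viterbi decoding for a linear-chain model: the single most probable
# label sequence and its probability. Log-space to avoid underflow.
import math

def viterbi(obs, states, start, trans, emit):
    scores = [{s: math.log(start[s] * emit[s][obs[0]]) for s in states}]
    backptr = []
    for o in obs[1:]:
        col, ptr = {}, {}
        for s in states:
            prev = max(states, key=lambda r: scores[-1][r] + math.log(trans[r][s]))
            col[s] = scores[-1][prev] + math.log(trans[prev][s] * emit[s][o])
            ptr[s] = prev
        scores.append(col)
        backptr.append(ptr)
    last = max(states, key=lambda s: scores[-1][s])
    path = [last]
    for ptr in reversed(backptr):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, math.exp(scores[-1][last])

# Toy two-state model (all numbers invented for illustration).
states = ("KEEP", "SKIP")
start = {"KEEP": 0.6, "SKIP": 0.4}
trans = {"KEEP": {"KEEP": 0.7, "SKIP": 0.3}, "SKIP": {"KEEP": 0.4, "SKIP": 0.6}}
emit = {"KEEP": {"x": 0.9, "y": 0.1}, "SKIP": {"x": 0.2, "y": 0.8}}
path, prob = viterbi(["x", "y"], states, start, trans, emit)
print(path, prob)  # ['KEEP', 'SKIP'] with probability 0.6*0.9*0.3*0.8 = 0.1296
```

The returned probability is what allows the analysis below to sort Viterbi labelings into high- and low-confidence bins.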
Results and Discussion | The CRF+GI model produced its Viterbi labeling with a low probability, and that labeling corresponds to an incorrect abbreviation. |
Results and Discussion | To perform a systematic analysis of the superior performance of DPLVM compared to CRF+GI, we collected the probability distributions (see Figure 7) of the Viterbi labelings from these models (“DPLVM vs. CRF+GI” is highlighted). |
Results and Discussion | A large percentage (37.9%) of the Viterbi labelings from the CRF+GI (ENG) have very small probability values (p < 0.1). |