Introduction | We describe k-best decoding for our hybrid model and design its loss function and the features appropriate for our task. |
Training method | Given a training example $(x_t, y_t)$, MIRA tries to establish a margin between the score of the correct path $s(x_t, y_t; \mathbf{w})$ and the score of the best candidate path $s(x_t, \hat{y}; \mathbf{w})$ under the current weight vector $\mathbf{w}$, where the required margin is proportional to a loss function $L(y_t, \hat{y})$. |
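The MIRA step above can be sketched as follows. This is a minimal sketch, assuming a linear score $s(x, y; \mathbf{w}) = \mathbf{w} \cdot \mathbf{f}(x, y)$, a single (1-best) constraint, and a step size clipped at a constant `C`. The sparse-dict feature representation and the names `score`, `mira_update`, and `C` are hypothetical, not taken from the source.

```python
from typing import Dict

# Hypothetical representation: sparse feature vector as feature name -> value.
FeatureVector = Dict[str, float]


def score(w: FeatureVector, f: FeatureVector) -> float:
    """Linear path score s(x, y; w) = w . f(x, y)."""
    return sum(w.get(k, 0.0) * v for k, v in f.items())


def mira_update(w: FeatureVector, f_gold: FeatureVector, f_best: FeatureVector,
                loss: float, C: float = 1.0) -> FeatureVector:
    """One 1-best MIRA step (hypothetical helper).

    Finds the smallest change to w such that
    s(x, y_gold; w) - s(x, y_hat; w) >= loss, with the step size capped at C.
    """
    # Difference vector f(x, y_gold) - f(x, y_hat).
    diff = dict(f_gold)
    for k, v in f_best.items():
        diff[k] = diff.get(k, 0.0) - v
    norm_sq = sum(v * v for v in diff.values())
    if norm_sq == 0.0:
        return dict(w)  # identical feature vectors: no update is possible
    margin = score(w, diff)
    # Closed-form step size of the single-constraint problem, clipped to [0, C].
    tau = min(C, max(0.0, (loss - margin) / norm_sq))
    new_w = dict(w)
    for k, v in diff.items():
        new_w[k] = new_w.get(k, 0.0) + tau * v
    return new_w


w = mira_update({}, f_gold={"a": 1.0}, f_best={"b": 1.0}, loss=2.0)
print(w)  # {'a': 1.0, 'b': -1.0}
```

After the update the gold path outscores the candidate by exactly the loss (2.0). With k-best decoding, one such constraint per candidate would be enforced jointly, which needs a small quadratic program rather than this closed form.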
Training method | 4.3 Loss function |
Training method | We instead compute the loss function from the numbers of false positives (FP) and false negatives (FN). |
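A loss built from false positives and false negatives might look like the sketch below. It assumes, hypothetically, that each path is represented as a set of `(start, end, tag)` items and that the loss is simply FP + FN. The item format and the function name `fp_fn_loss` are illustrative, not from the source.

```python
from typing import Hashable, Iterable


def fp_fn_loss(gold: Iterable[Hashable], predicted: Iterable[Hashable]) -> int:
    """Loss = #false positives + #false negatives between two paths."""
    gold_set, pred_set = set(gold), set(predicted)
    false_pos = len(pred_set - gold_set)  # predicted items not on the correct path
    false_neg = len(gold_set - pred_set)  # correct items the prediction missed
    return false_pos + false_neg


# Hypothetical items: (start, end, tag) spans over a character sequence.
gold = [(0, 2, "NN"), (2, 3, "VV")]
pred = [(0, 1, "NN"), (1, 2, "NN"), (2, 3, "VV")]
print(fp_fn_loss(gold, pred))  # 3  (FP = 2, FN = 1)
```

Unlike a zero-one loss, this gives partial credit: a candidate that gets most items right receives a small loss, and therefore requires only a small margin in the MIRA update.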
Discussion | In this paper, we have described how MERT can be employed to estimate the weights of the linear loss function so as to maximize BLEU on a development set. |
Experiments | Note that N-best MBR uses a sentence BLEU loss function. |
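Because BLEU is defined at the corpus level, a sentence-level BLEU loss needs smoothing so that a single missing n-gram order does not zero out the score. The sketch below uses add-one smoothing of the higher-order precisions and a loss of 1 - BLEU. The smoothing choice, the single reference, and the function names are assumptions; the source does not say which sentence BLEU variant is used.

```python
import math
from collections import Counter
from typing import Sequence


def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    """Counts of the n-grams of a single order n."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hyp: Sequence[str], ref: Sequence[str], max_n: int = 4) -> float:
    """Smoothed sentence-level BLEU against a single reference."""
    if not hyp:
        return 0.0
    log_precision = 0.0
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngram_counts(hyp, n), ngram_counts(ref, n)
        # Clipped matches: each hyp n-gram counts at most as often as in ref.
        matches = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        if n > 1:
            # Assumed add-one smoothing for n > 1 (not specified by the source).
            matches, total = matches + 1, total + 1
        if matches == 0:
            return 0.0
        log_precision += math.log(matches / total) / max_n
    log_brevity = min(0.0, 1.0 - len(ref) / len(hyp))  # log of the brevity penalty
    return math.exp(log_precision + log_brevity)


def bleu_loss(hyp: Sequence[str], ref: Sequence[str]) -> float:
    """Loss = 1 - sentence BLEU, so a perfect match costs 0."""
    return 1.0 - sentence_bleu(hyp, ref)


print(bleu_loss("the cat sat".split(), "the cat sat".split()))  # 0.0
```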
Minimum Bayes-Risk Decoding | This reranking can be done for any sentence-level loss function such as BLEU (Papineni et al., 2001), Word Error Rate, or Position-independent Error Rate. |
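N-best MBR reranking with a pluggable sentence-level loss can be sketched as follows, here using Word Error Rate. The sketch assumes the N-best list, weighted by its normalized posteriors, serves as the set of pseudo-references. The function names and the toy posteriors are hypothetical.

```python
from typing import Callable, List, Sequence

Tokens = List[str]


def wer(hyp: Sequence[str], ref: Sequence[str]) -> float:
    """Word error rate: word-level Levenshtein distance divided by |ref|."""
    prev = list(range(len(hyp) + 1))  # distances from an empty ref prefix
    for i, r_tok in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h_tok in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,                     # deletion of a ref word
                         cur[j - 1] + 1,                  # insertion of a hyp word
                         prev[j - 1] + (r_tok != h_tok))  # substitution or match
        prev = cur
    return prev[-1] / max(len(ref), 1)


def mbr_rerank(hyps: List[Tokens], posteriors: List[float],
               loss: Callable[[Sequence[str], Sequence[str]], float]) -> Tokens:
    """Return the hypothesis with minimum expected loss over the N-best list."""
    def expected_loss(h: Tokens) -> float:
        return sum(p * loss(h, e) for e, p in zip(hyps, posteriors))
    return min(hyps, key=expected_loss)


nbest = [["the", "cat", "sat"], ["the", "cat", "sits"], ["a", "dog", "sat"]]
probs = [0.3, 0.4, 0.3]
print(mbr_rerank(nbest, probs, wer))  # ['the', 'cat', 'sat']
```

The expected losses are about 0.33, 0.40, and 0.60, so MBR picks the first hypothesis even though MAP would pick the second, which has the highest posterior. Swapping `wer` for any other sentence-level loss (such as a smoothed sentence BLEU loss) requires no other change.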
Variational vs. Min-Risk Decoding | They use the following loss function, of which a linear approximation to BLEU (Papineni et al., 2001) is a special case: |
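The loss equation itself did not survive extraction. The sketch below assumes the linear BLEU approximation of Tromble et al., with gain $G(y, y') = \theta_0 |y| + \sum_w \theta_{|w|} \, \#_w(y) \, \delta_w(y')$, where $\#_w(y)$ counts n-gram $w$ in $y$ and $\delta_w(y')$ indicates whether $w$ occurs in $y'$. The loss is taken as $-G$. Tying the weights by n-gram order, the placeholder $\theta$ values, and the function names are all assumptions.

```python
from collections import Counter
from typing import List, Sequence


def ngram_counts(tokens: Sequence[str], max_n: int) -> Counter:
    """Counts of all n-grams of order 1..max_n in tokens."""
    counts: Counter = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def linear_gain(hyp: Sequence[str], evidence: Sequence[str],
                theta0: float, theta: List[float]) -> float:
    """G = theta0*|hyp| + sum_w theta_{|w|} * #_w(hyp) * delta_w(evidence).

    theta[n-1] is the (assumed) weight shared by all n-grams of order n.
    """
    max_n = len(theta)
    evidence_grams = set(ngram_counts(evidence, max_n))
    gain = theta0 * len(hyp)
    for gram, count in ngram_counts(hyp, max_n).items():
        if gram in evidence_grams:  # delta_w(evidence) = 1
            gain += theta[len(gram) - 1] * count
    return gain


def linear_loss(hyp: Sequence[str], evidence: Sequence[str],
                theta0: float, theta: List[float]) -> float:
    """Negated linear gain: maximizing gain is minimizing loss."""
    return -linear_gain(hyp, evidence, theta0, theta)


# Placeholder weights; in practice theta is derived from corpus statistics.
print(linear_loss(["a", "b"], ["a", "b"], theta0=-1.0, theta=[1.0, 2.0]))  # -2.0
```

Because the loss decomposes over n-grams of the hypothesis, its expectation can be computed from n-gram posteriors, which is what makes this form attractive for lattice or hypergraph MBR.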
Variational vs. Min-Risk Decoding | With the above loss function, Tromble et al. |
Variational vs. Min-Risk Decoding | The MBR decision rule becomes the MAP decision rule of (1) if the so-called zero-one loss function is used: $l(y, y') = 0$ if $y = y'$, and $l(y, y') = 1$ otherwise. |
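The reduction follows because, under zero-one loss, the expected loss of $y$ over distinct hypotheses whose posteriors sum to 1 is $\sum_{y'} p(y') \, l(y, y') = 1 - p(y)$. Minimizing it is therefore the same as maximizing $p(y)$. The toy check below, over a small finite hypothesis list with hypothetical names and probabilities, confirms that both rules pick the same output.

```python
from typing import Callable, Hashable, List


def zero_one(y: Hashable, y_prime: Hashable) -> float:
    """Zero-one loss: 0 if the two outputs are identical, 1 otherwise."""
    return 0.0 if y == y_prime else 1.0


def mbr_decide(hyps: List[Hashable], posteriors: List[float],
               loss: Callable[[Hashable, Hashable], float]) -> Hashable:
    """MBR over a finite hypothesis list: argmin_y sum_y' p(y') * loss(y, y')."""
    return min(hyps, key=lambda y: sum(p * loss(y, yp)
                                       for yp, p in zip(hyps, posteriors)))


def map_decide(hyps: List[Hashable], posteriors: List[float]) -> Hashable:
    """MAP: the single most probable hypothesis."""
    return max(zip(hyps, posteriors), key=lambda pair: pair[1])[0]


hyps = ["y1", "y2", "y3"]
posteriors = [0.2, 0.5, 0.3]
print(mbr_decide(hyps, posteriors, zero_one), map_decide(hyps, posteriors))  # y2 y2
```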