Adding agreement constraints | The standard way to do this is to find the most probable alignment, using the Viterbi algorithm. |
Adding agreement constraints | There are two main differences between posterior decoding and Viterbi decoding. |
Adding agreement constraints | Viterbi decoding, by contrast, must commit to a single high-scoring alignment.
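The contrast above can be sketched on a toy HMM. This is a minimal illustration, not the paper's alignment model; the state names, transition table, and emission table below are illustrative assumptions.

```python
# Minimal sketch: Viterbi decoding for a toy HMM, illustrating that Viterbi
# commits to a single highest-scoring state sequence. All probabilities and
# names here are illustrative assumptions, not the paper's model.

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the single most probable state sequence for `obs`."""
    # best[t][s] = probability of the best path ending in state s at time t
    best = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    back = [{}]  # backpointers for recovering the path
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            prev, p = max(((r, best[t - 1][r] * trans_p[r][s]) for r in states),
                          key=lambda x: x[1])
            best[t][s] = p * emit_p[s][obs[t]]
            back[t][s] = prev
    # follow backpointers from the best final state
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))

states = ("A", "B")
start_p = {"A": 0.6, "B": 0.4}
trans_p = {"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.4, "B": 0.6}}
emit_p = {"A": {"x": 0.9, "y": 0.1}, "B": {"x": 0.2, "y": 0.8}}
print(viterbi(("x", "y", "y"), states, start_p, trans_p, emit_p))
# the single best path: ['A', 'B', 'B']
```

Posterior decoding, in contrast, would keep a per-link posterior for every position rather than this one committed path.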
Introduction | Section 5 explores how the new alignments lead to consistent and significant improvements in a state-of-the-art phrase-based machine translation system when posterior decoding is used rather than Viterbi decoding.
Phrase-based machine translation | [Figure: left panel shows the baseline Viterbi alignment; the plot itself is not recoverable.]
Phrase-based machine translation | Our initial experiments with Viterbi decoding and posterior decoding showed that, for our agreement model, posterior decoding could provide better alignment quality.
Word alignment results | use Viterbi decoding, with larger improvements for small amounts of training data. |
Word alignment results | As described above, an alternative to Viterbi decoding is to accept all alignments whose probability is above some threshold.
Word alignment results | Figure 7 shows the same sentence aligned by the same model, in one case using Viterbi decoding and in the other posterior decoding tuned to minimize AER on a development set.
Bitext Pruning Strategy | We can then find the best scoring sequence using the familiar Viterbi algorithm. |
Bootstrapping Phrasal ITG from Word-based ITG | This section introduces a technique that bootstraps candidate phrase pairs for phrase-based ITG from word-based ITG Viterbi alignments. |
Bootstrapping Phrasal ITG from Word-based ITG | We use ITG Viterbi alignments instead. |
Bootstrapping Phrasal ITG from Word-based ITG | Figure 3 (a) shows all possible non-compositional phrases given the Viterbi word alignment of the example sentence pair. |
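The bootstrapping step above starts from phrase-pair spans that are consistent with a word alignment. A minimal sketch of that consistency check, assuming the standard box criterion rather than the paper's ITG machinery (the function name and data layout are illustrative assumptions):

```python
# Minimal sketch (illustrative, not the paper's ITG machinery): enumerating
# phrase-pair spans consistent with a Viterbi word alignment. `align` is a
# set of (source_index, target_index) links.

def consistent_phrases(align, src_len, max_len=4):
    """Return (i1, i2, j1, j2) spans whose alignment links stay inside the box."""
    pairs = []
    for i1 in range(src_len):
        for i2 in range(i1, min(src_len, i1 + max_len)):
            # target positions linked to the source span [i1, i2]
            js = [j for (i, j) in align if i1 <= i <= i2]
            if not js:
                continue
            j1, j2 = min(js), max(js)
            # consistency: no link inside the target span leaves the source span
            if all(i1 <= i <= i2 for (i, j) in align if j1 <= j <= j2):
                pairs.append((i1, i2, j1, j2))
    return pairs

# toy alignment with one reordering: source 1 <-> target 2, source 2 <-> target 1
print(consistent_phrases({(0, 0), (1, 2), (2, 1)}, 3))
```

The reordered pair of links forms a single non-decomposable block here, which is the kind of unit the non-compositional constraint keeps.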
Experiments | The charts were then pruned further by applying the non-compositional constraint from the Viterbi alignment links of that model. |
Experiments | Finally we ran 10 iterations of phrase-based ITG over the residual charts, using EM or VB, and extracted the Viterbi alignments. |
Summary of the Pipeline | After several training iterations, we obtain the Viterbi alignments on the training data according to the final model. |
Summary of the Pipeline | In the end, a Viterbi pass for the phrasal ITG is executed to produce the non-compositional phrasal alignments. |
Conclusion | We use an inside/outside parsing algorithm to obtain the Viterbi outside cost of bigram-integrated states, which is used as an outside estimate for trigram-integrated states.
Decoding to Maximize BLEU | We approximate β and α using the Viterbi probabilities.
Decoding to Maximize BLEU | Since decoding from bottom up in the trigram pass already gives us the inside Viterbi scores, we only have to visit the nodes in the reverse order once we reach the root to compute the Viterbi outside scores. |
Experiments | Simply by exploring more (200 times the log beam) after-goal items, we can optimize the Viterbi synchronous parse significantly, shown in Figure 3(left) in terms of model score versus search time. |
Multi-pass LM-Integrated Decoding | Since we want to approximate the Viterbi |
Multi-pass LM-Integrated Decoding | where β is the Viterbi inside cost and α is the Viterbi outside cost, to globally prioritize the n-gram integrated states on the agenda for exploration.
Related Work | Chen (2003) uses an n-gram model and Viterbi decoder as a syllabifier, and then applies it as a preprocessing step in his maximum-entropy-based English L2P system. |
Structured SVMs | This approach can be considered an HMM because the Viterbi algorithm is used to find the highest scoring tag sequence for a given observation sequence. |
Structured SVMs | The argmax in Equation 2 is conducted using the Viterbi algorithm. |
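The Viterbi argmax over tag sequences for such a linear model can be sketched as follows. This is a minimal sketch in the (max, +) semiring; the additive scoring functions below are illustrative assumptions standing in for the structured SVM's w·f(x, y).

```python
# Minimal sketch: the argmax over tag sequences for a linear model, computed
# with the Viterbi recurrence in the (max, +) semiring. The toy scoring
# functions below are illustrative assumptions, not a trained model.

def viterbi_argmax(n, tags, emit_score, trans_score):
    """Return the length-n tag sequence maximizing the sum of local scores.

    emit_score(t, tag) and trans_score(prev_tag, tag) are additive scores.
    """
    best = [{tag: emit_score(0, tag) for tag in tags}]
    back = [{}]
    for t in range(1, n):
        best.append({})
        back.append({})
        for tag in tags:
            prev = max(tags, key=lambda r: best[t - 1][r] + trans_score(r, tag))
            best[t][tag] = (best[t - 1][prev] + trans_score(prev, tag)
                            + emit_score(t, tag))
            back[t][tag] = prev
    last = max(tags, key=lambda tag: best[-1][tag])
    seq = [last]
    for t in range(n - 1, 0, -1):
        seq.append(back[t][seq[-1]])
    return list(reversed(seq))

tags = ("N", "V")
emit = lambda t, tag: 1.0 if (t, tag) in {(0, "N"), (1, "V")} else 0.0
trans = lambda prev, tag: 0.0
print(viterbi_argmax(2, tags, emit, trans))
```

The recurrence is identical in shape to HMM Viterbi; only the semiring changes, from products of probabilities to sums of linear scores.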
Forest Reranking | The exact decoding algorithm, shown in Pseudocode 2, is an instance of the bottom-up Viterbi algorithm, which traverses the hypergraph in a topological order, and at each node v, calculates its 1-best derivation using each incoming hyperedge e ∈ IN(v).
Forest Reranking | function VITERBI(⟨V, E⟩)
    for v ∈ V in topological order do
        for e ∈ IN(v) do
Supporting Forest Algorithms | Basically, we use an Inside-Outside algorithm to compute the Viterbi inside cost β(v) and the Viterbi outside cost α(v) for each node v, and then compute the merit αβ(e) for each hyperedge:
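A minimal sketch of these quantities on an acyclic hypergraph, assuming edge costs are negative log probabilities (so the Viterbi operation is a min). The data layout is an illustrative assumption: `edges[v]` lists `(cost, tails)` pairs for each node `v`, and `nodes` is given in topological order with leaves first.

```python
# Minimal sketch of Viterbi inside/outside costs and the hyperedge merit
# alphabeta(e) on an acyclic hypergraph. Layout and names are illustrative
# assumptions; costs are min-cost (negative log probability) style.

def inside(nodes, edges):
    """beta[v]: cost of the best (cheapest) derivation of node v."""
    beta = {}
    for v in nodes:  # topological order, leaves first
        if not edges.get(v):
            beta[v] = 0.0  # leaf: no derivation cost
        else:
            beta[v] = min(c + sum(beta[u] for u in tails)
                          for c, tails in edges[v])
    return beta

def outside(nodes, edges, beta, root):
    """alpha[v]: cost of the best derivation context above node v."""
    alpha = {v: float("inf") for v in nodes}
    alpha[root] = 0.0
    for v in reversed(nodes):  # reverse topological order, root first
        for c, tails in edges.get(v, ()):
            tail_sum = sum(beta[u] for u in tails)
            for u in tails:
                # best context of u via this edge: head context + edge cost
                # + best inside costs of the sibling tails
                alpha[u] = min(alpha[u], alpha[v] + c + tail_sum - beta[u])
    return alpha

def edge_merit(alpha, beta, head, cost, tails):
    """alphabeta(e): cost of the best derivation that uses hyperedge e."""
    return alpha[head] + cost + sum(beta[u] for u in tails)

# toy hypergraph: root derived either from c (which needs a and b) or from a
nodes = ["a", "b", "c", "root"]
edges = {"c": [(1.0, ("a", "b"))], "root": [(0.5, ("c",)), (3.0, ("a",))]}
beta = inside(nodes, edges)
alpha = outside(nodes, edges, beta, "root")
print(beta["root"], edge_merit(alpha, beta, "root", 0.5, ("c",)))
```

An edge on the globally best derivation has merit equal to β(root), so pruning by merit relative to β(root) keeps exactly the near-optimal parts of the forest.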