Abstract | The Viterbi algorithm is the decoding algorithm most widely adopted for sequence labeling.
Abstract | Viterbi decoding is, however, prohibitively slow when the label set is large, because its time complexity is quadratic in the number of labels. |
Abstract | Experiments on three tasks (POS tagging, joint POS tagging and chunking, and supertagging) show that the new algorithm is several orders of magnitude faster than the basic Viterbi and a state-of-the-art algorithm, CARPEDIEM (Esposito and Radicioni, 2009). |
Introduction | This task, referred to as decoding, is usually carried out using the Viterbi algorithm (Viterbi, 1967).
Introduction | The Viterbi algorithm has O(NL²) time complexity, where N is the input size and L is the number of labels.
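The quadratic factor comes from comparing all L×L label transitions at each of the N positions. A minimal log-space sketch of standard Viterbi decoding (the function name and score layout are illustrative, not taken from any of the cited systems):

```python
import numpy as np

def viterbi(obs_scores, trans):
    """Standard Viterbi decoding in log space.

    obs_scores: (N, L) per-position label scores.
    trans: (L, L) transition scores from previous label to next label.
    Runs in O(N * L^2): each of the N steps examines all L x L transitions.
    """
    N, L = obs_scores.shape
    delta = obs_scores[0].copy()           # best score ending in each label
    back = np.zeros((N, L), dtype=int)     # backpointers
    for t in range(1, N):
        cand = delta[:, None] + trans      # (L, L): prev label -> next label
        back[t] = cand.argmax(axis=0)
        delta = cand.max(axis=0) + obs_scores[t]
    # follow backpointers from the best final label
    path = [int(delta.argmax())]
    for t in range(N - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]
```

With L in the thousands (as in supertagging), the `L x L` step inside the loop is exactly the bottleneck the faster algorithms target.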
Introduction | Although the Viterbi algorithm is generally efficient, |
Conclusion | While models trained from Viterbi alignments already lead to good results, we have demonstrated |
Experimental Evaluation | We can see in Figure 3 that translation performance improves by moving from the Viterbi alignment to n-best alignments. |
Introduction | [Figure: word translation models and phrase translation models, each trained by the EM algorithm, produce the Viterbi word alignment and the phrase alignment respectively; heuristic phrase counts and phrase translation probabilities each feed a Phrase Translation Table.]
Introduction | 1. the single-best Viterbi alignment, 2. the n-best alignments, |
Phrase Model Training | 4.1 Viterbi |
Phrase Model Training | The simplest of our generative phrase models estimates phrase translation probabilities by their relative frequencies in the Viterbi alignment of the data, similar to the heuristic model but with counts from the phrase-aligned data produced in training rather than computed on the basis of a word alignment. |
Phrase Model Training | This can be applied to either the Viterbi phrase alignment or an n-best list. |
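The relative-frequency step can be sketched as follows (the input format and function name are hypothetical; in the actual training the phrase pairs come from the Viterbi phrase alignment or an n-best list):

```python
from collections import Counter, defaultdict

def phrase_table_from_alignments(aligned_pairs):
    """Relative-frequency estimate of phrase translation probabilities
    p(f | e) from a list of (e_phrase, f_phrase) pairs collected from
    phrase alignments of the training data."""
    pair_counts = Counter(aligned_pairs)
    e_counts = Counter(e for e, _ in aligned_pairs)
    table = defaultdict(dict)
    for (e, f), c in pair_counts.items():
        table[e][f] = c / e_counts[e]   # count(e, f) / count(e)
    return dict(table)
```

For an n-best list, each pair's count would typically be weighted by its alignment's posterior rather than counted once.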
Related Work | For the hierarchical phrase-based approach, Blunsom et al. (2008) present a discriminative rule model and show the difference between using only the Viterbi alignment in training and using the full sum over all possible derivations.
Related Work | It is shown in (Ferrer and Juan, 2009) that Viterbi training produces almost the same results as full Baum-Welch training.
Related Work | Our approach is similar to the one presented in (Ferrer and Juan, 2009) in that we compare Viterbi and a training method based on the Forward-Backward algorithm. |
Abstract | In this paper, we introduce an exact method for deciphering messages using a generalization of the Viterbi algorithm. |
Experiment | Algorithm 2: Generalized Viterbi Algorithm GVit(σ, c, r). Input: partial solution σ, ciphertext character c, and index r into σ.
Introduction | Our algorithm operates by running a search over the n-gram probabilities of possible solutions to the cipher, using a generalization of the Viterbi algorithm that is wrapped in an A* search, which determines at each step which partial solutions to expand.
Results | Asymptotically, the time required for each sweep of the Viterbi algorithm increases, but this is more than offset by the decrease in the number of required sweeps. |
The Algorithm | Given a partial solution σ of length n, we can extend σ by choosing a ciphertext letter c not in the range of σ, and then use our generalization of the Viterbi algorithm to find, for each p not in the domain of σ, a score to rank the choice of p for σ, namely the trigram probability of the extension σp of σ.
The Algorithm | Our generalization of the Viterbi algorithm, depicted in Figure 1, uses dynamic programming to score every immediate extension of a given partial solution in tandem, by finding, in a manner consistent with the real Viterbi algorithm, the most probable input string given a set of output symbols, which in this case is the cipher.
The Algorithm | Unlike the real Viterbi algorithm, we must also observe the constraints of the input partial solution’s mapping. |
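A simplified sketch of this constrained scoring idea (a bigram language model stands in for the paper's trigrams, and all names are illustrative). Positions whose cipher letter is already mapped are fixed to the mapped plaintext letter; unmapped positions may take any unused letter, so the score is an optimistic bound on the best consistent decipherment rather than an exact probability:

```python
def constrained_viterbi_score(cipher, key, bigram_lp, alphabet):
    """Best log-probability of any plaintext consistent with the partial
    key `key` (cipher letter -> plaintext letter) at its fixed positions.
    `bigram_lp` maps (prev, next) plaintext letters to log-probabilities."""
    def allowed(c):
        if c in key:                       # constraint from the partial solution
            return [key[c]]
        used = set(key.values())
        return [p for p in alphabet if p not in used]

    prev = {p: 0.0 for p in allowed(cipher[0])}
    for c in cipher[1:]:
        prev = {p: max(prev[q] + bigram_lp.get((q, p), -20.0) for q in prev)
                for p in allowed(c)}
    return max(prev.values())

def score_extensions(cipher, key, c, bigram_lp, alphabet):
    """Rank every unused plaintext letter p as a candidate image of cipher
    letter c, mirroring the generalized-Viterbi ranking of extensions."""
    used = set(key.values())
    return {p: constrained_viterbi_score(cipher, {**key, c: p},
                                         bigram_lp, alphabet)
            for p in alphabet if p not in used}
```

Scoring all extensions in one tandem dynamic-programming pass, as the paper does, avoids rerunning this computation once per candidate letter.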
Introduction | The Viterbi algorithm (Rabiner, 1989) can then be used to produce the optimal sequence of latent states s_i for a given instance X.
Introduction | To learn an open-domain representation, we then trained an 80 state HMM on the unlabeled texts of the training and Brown test data, and used the Viterbi optimum states of each word as categorical features. |
Introduction | Span-HMMs can be used to provide a single categorical value for any span of a sentence using the usual Viterbi algorithm for HMMs. |
Conditional random fields | The standard Viterbi algorithm for making predictions from a trained CRF is not tuned to minimize false positives. |
Experimental results | For the English language, the CRF using the Viterbi path has overall error rate of 0.84%, compared to 6.81% for the TEX algorithm using American English patterns, which is eight times worse. |
Experimental results | For the English language, TALO yields overall error rate 1.99% with serious error rate 0.72%, so the standard CRF using the Viterbi path is better on both measures. |
Experimental results | For the Dutch language, the standard CRF using the Viterbi path has overall error rate 0.08%, compared to 0.81% for the TEX algorithm. |
Alignment by aggregation | In order to generate the list of best alignments, we use Algorithm 2, which is an adaptation of the standard Viterbi algorithm. |
EM Alignment | The final many-to-many alignments are created by finding the most likely paths using the Viterbi algorithm based on the learned mapping probability table.
Extrinsic evaluation | The decoder module uses standard Viterbi for the 1-1 case, and a phrasal decoder (Zens and Ney, 2004) for the MM case. |
A Phrase-Based Error Model | Figure 3: The dynamic programming algorithm for Viterbi bi-phrase segmentation. |
The Baseline Speller System | The first pass uses the Viterbi algorithm to find the best C according to the model of Equations (2) and (3). |
The Baseline Speller System | In the second pass, the A-Star algorithm is used to find the 20-best corrections, using the Viterbi scores computed at each state in the first pass as heuristics. |
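The two-pass scheme can be sketched as follows (a toy reimplementation of the general idea, not the speller system's code). A backward Viterbi pass records, for every position and state, the best score of completing the sequence from there; the A* pass then uses those scores as an exact, and hence admissible, heuristic, so complete paths come off the priority queue in true score order:

```python
import heapq
import numpy as np

def nbest_astar(obs, trans, n):
    """Return up to n best (score, label_path) pairs for a simple
    obs/transition scoring model, via backward Viterbi + A* search."""
    N, L = obs.shape
    # Pass 1 (backward Viterbi): h[t][y] = best log-score of the
    # suffix t..N-1 given label y at position t.
    h = np.empty((N, L))
    h[N - 1] = obs[N - 1]
    for t in range(N - 2, -1, -1):
        h[t] = obs[t] + (trans + h[t + 1]).max(axis=1)

    def cont(t, y):  # best completion score after a prefix ending at t-1
        return 0.0 if t == N else float((trans[y] + h[t]).max())

    # Pass 2 (A*): priority = prefix score g + exact completion heuristic.
    heap = [(-(obs[0][y] + cont(1, y)), obs[0][y], [y]) for y in range(L)]
    heapq.heapify(heap)
    results = []
    while heap and len(results) < n:
        _, g, path = heapq.heappop(heap)
        t = len(path)
        if t == N:
            results.append((g, path))      # popped in true score order
            continue
        for y in range(L):
            g2 = g + trans[path[-1]][y] + obs[t][y]
            heapq.heappush(heap, (-(g2 + cont(t + 1, y)), g2, [*path, y]))
    return results
```

Because the heuristic is exact, no path is ever expanded that cannot be among the n best, which is what makes the second pass cheap relative to a full search.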