Index of papers in Proc. ACL that mention
  • Viterbi
Chang, Yin-Wen and Rush, Alexander M. and DeNero, John and Collins, Michael
Abstract
The relaxation can be solved with a modified version of the Viterbi algorithm.
Adding Back Constraints
In order to compute the arg max of this optimization problem, we need to satisfy the constraints within the Viterbi algorithm.
Adding Back Constraints
We augment the Viterbi chart with a count vector d ∈ D, where D ⊂ ℤ^{|e||f|} and d(i, j) is a count for the (i, j)'th constraint, i.e.
Adding Back Constraints
Figure 6 shows this constrained Viterbi relaxation approach.
Adjacent Agreement
The key algorithmic idea is to extend the Viterbi algorithm in order to consider possible adjacent links in the reverse direction.
Adjacent Agreement
Despite the new constraints, we can still compute L(λ) in O(IJ(I + J)) time using a variant of the Viterbi algorithm.
Adjacent Agreement
Figure 5: Modified Viterbi algorithm for computing the adjacent agreement L(λ).
Background
Note that for both of these models we can solve the optimization problem exactly using the standard Viterbi algorithm for HMM decoding.
Bidirectional Alignment
We then use the standard Viterbi update for computing the α variables, adding in the score of the y(i, j) necessary to satisfy the constraints.
Introduction
Our decoder uses a simple variant of the Viterbi algorithm for solving a relaxed version of this model.
Introduction
It is based on a novel relaxation for the model of DeNero and Macherey (2011), solvable with a variant of the Viterbi algorithm.
Viterbi is mentioned in 14 sentences in this paper.
Topics mentioned in this paper:
Huang, Zhiheng and Chang, Yi and Long, Bo and Crespo, Jean-Francois and Dong, Anlei and Keerthi, Sathiya and Wu, Su-Lin
Abstract
In this paper we propose 1-best A*, 1-best iterative A*, k-best A* and k-best iterative Viterbi A* algorithms for sequential decoding.
Abstract
In particular, we show that iterative Viterbi A* decoding can be several times or orders of magnitude faster than the state-of-the-art algorithm for tagging tasks with a large number of labels.
Background
3.1 1-Best Viterbi
Background
The Viterbi algorithm is a classic dynamic programming based decoding algorithm.
Introduction
Traditionally, the Viterbi algorithm (Viterbi, 1967) is used.
Introduction
Unfortunately, due to its O(TL²) time complexity, where T is the input token size and L is the label size, Viterbi decoding can become prohibitively slow when the label size is large (say, larger than 200).
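The quadratic term comes from maximizing over every previous label, for every current label, at each of the T positions. A minimal sketch of that recurrence (plain Python with made-up emission and transition score tables; illustrative only, not the paper's implementation):

```python
def viterbi(emit, trans):
    """emit[t][l]: score of label l at position t; trans[p][l]: transition score p -> l.
    Returns the highest-scoring label sequence.  Runs in O(T * L^2) time."""
    T, L = len(emit), len(emit[0])
    best = [emit[0][:]]                 # best[t][l]: best score of a prefix ending in label l
    back = [[0] * L]
    for t in range(1, T):
        best.append([0.0] * L)
        back.append([0] * L)
        for l in range(L):              # current label
            score, arg = max(
                (best[t - 1][p] + trans[p][l], p) for p in range(L)  # previous label
            )
            best[t][l] = score + emit[t][l]
            back[t][l] = arg
    # follow the back-pointers from the best final label
    l = max(range(L), key=lambda x: best[T - 1][x])
    path = [l]
    for t in range(T - 1, 0, -1):
        l = back[t][l]
        path.append(l)
    return list(reversed(path))

# tiny usage example: 3 tokens, 2 labels -> [0, 1, 1]
print(viterbi(emit=[[1.0, 0.2], [0.1, 0.9], [0.5, 0.4]],
              trans=[[0.3, 0.0], [0.0, 0.3]]))
```

The nested loops over the current and previous labels are the L² factor that faster decoders, such as the iterative Viterbi A* variants proposed in this paper, aim to avoid.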
Introduction
The Viterbi algorithm cannot find the best sequences in tolerable
Viterbi is mentioned in 72 sentences in this paper.
Topics mentioned in this paper:
Zhao, Qiuye and Marcus, Mitch
Abstract
For tagging, learned constraints are directly used to constrain Viterbi decoding.
Abstract
Dynamic programming techniques based on Markov assumptions, such as Viterbi decoding, cannot handle those ’nonlocal’ constraints as discussed above.
Abstract
However, it is possible to constrain Viterbi
Viterbi is mentioned in 18 sentences in this paper.
Topics mentioned in this paper:
Gormley, Matthew R. and Eisner, Jason
A Branch-and-Bound Algorithm
The mixed integer quadratic program with nonlinear constraints, given in the previous section, maximizes the nonconvex Viterbi EM objective and is NP-hard to solve (Cohen and Smith, 2010).
A Branch-and-Bound Algorithm
The standard approach to optimizing this program is local search by the hard (Viterbi) EM algorithm.
Abstract
Although these techniques do not yet provide a practical solution to our instance of this NP-hard problem, they sometimes find better solutions than Viterbi EM with random restarts, in the same time.
Introduction
Many parameter estimation techniques have been attempted, including expectation-maximization (EM) (Klein and Manning, 2004; Spitkovsky et al., 2010a), contrastive estimation (Smith and Eisner, 2006; Smith, 2006), Viterbi EM (Spitkovsky et al., 2010b), and variational EM (Naseem et al., 2010; Cohen et al., 2009; Cohen and Smith, 2009).
Introduction
Note that this objective is that of hard (Viterbi) EM; we do not marginalize over the parses as in classical EM.
Introduction
Our results demonstrate that our method can obtain higher likelihoods than Viterbi EM with random restarts.
Projections
These correspond to the Viterbi E step and M step, respectively.
Relaxations
Since we optimize the Viterbi EM objective, we directly constrain the counts in the single corpus parse rather than expected counts from a distribution over parses.
The Constrained Optimization Task
We begin by describing how for our typical model, the Viterbi EM objective can be formulated as a mixed integer quadratic programming (MIQP) problem with nonlinear constraints (Figure 2).
The Constrained Optimization Task
Concretely, (1) specifies the Viterbi EM objective: the total log-probability of the best parse trees under the parameters θ, given by a sum of log-probabilities θ_m of the individual steps needed to generate the tree, as encoded by the features f_m.
The Constrained Optimization Task
Figure 2: Viterbi EM as a mathematical program
Viterbi is mentioned in 24 sentences in this paper.
Topics mentioned in this paper:
Wuebker, Joern and Mauser, Arne and Ney, Hermann
Conclusion
While models trained from Viterbi alignments already lead to good results, we have demonstrated
Experimental Evaluation
We can see in Figure 3 that translation performance improves by moving from the Viterbi alignment to n-best alignments.
Introduction
Figure: two pipelines compared, Viterbi word alignment with word translation models trained by the EM algorithm, and phrase alignment with phrase translation models trained by the EM algorithm; heuristic phrase counts or phrase translation probabilities feed the phrase translation table.
Introduction
1. the single-best Viterbi alignment, 2. the n-best alignments,
Phrase Model Training
4.1 Viterbi
Phrase Model Training
The simplest of our generative phrase models estimates phrase translation probabilities by their relative frequencies in the Viterbi alignment of the data, similar to the heuristic model but with counts from the phrase-aligned data produced in training rather than computed on the basis of a word alignment.
Phrase Model Training
This can be applied to either the Viterbi phrase alignment or an n-best list.
Related Work
For the hierarchical phrase-based approach, (Blunsom et al., 2008) present a discriminative rule model and show the difference between using only the viterbi alignment in training and using the full sum over all possible derivations.
Related Work
It is shown in (Ferrer and Juan, 2009), that Viterbi training produces almost the same results as full Baum-Welch training.
Related Work
Our approach is similar to the one presented in (Ferrer and Juan, 2009) in that we compare Viterbi and a training method based on the Forward-Backward algorithm.
Viterbi is mentioned in 11 sentences in this paper.
Topics mentioned in this paper:
Kaji, Nobuhiro and Fujiwara, Yasuhiro and Yoshinaga, Naoki and Kitsuregawa, Masaru
Abstract
The Viterbi algorithm is the conventional decoding algorithm most widely adopted for sequence labeling.
Abstract
Viterbi decoding is, however, prohibitively slow when the label set is large, because its time complexity is quadratic in the number of labels.
Abstract
Experiments on three tasks (POS tagging, joint POS tagging and chunking, and supertagging) show that the new algorithm is several orders of magnitude faster than the basic Viterbi and a state-of-the-art algorithm, CARPEDIEM (Esposito and Radicioni, 2009).
Introduction
This task, referred to as decoding, is usually carried out using the Viterbi algorithm (Viterbi, 1967).
Introduction
The Viterbi algorithm has O(NL²) time complexity, where N is the input size and L is the number of labels.
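Spelled out generically (δ for the best prefix score and s for the combined transition-plus-emission score at a position; neither symbol is taken from the paper), the recurrence behind that complexity is

```latex
\delta_t(\ell) \;=\; \max_{\ell' \in \{1,\dots,L\}} \left[ \delta_{t-1}(\ell') + s(\ell',\, \ell,\, t) \right],
\qquad t = 2, \dots, N,
```

with δ_1(ℓ) set to the score of starting in label ℓ. Each of the N positions fills L chart entries and each entry maximizes over L predecessors, which yields the O(NL²) total.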
Introduction
Although the Viterbi algorithm is generally efficient,
Viterbi is mentioned in 42 sentences in this paper.
Topics mentioned in this paper:
Li, Zhifei and Eisner, Jason and Khudanpur, Sanjeev
Abstract
Therefore, most systems use a simple Viterbi approximation that measures the goodness of a string using only its most probable derivation.
Background 2.1 Terminology
2.3 Viterbi Approximation
Background 2.1 Terminology
To approximate the intractable decoding problem of (2), most MT systems (Koehn et al., 2003; Chiang, 2007) use a simple Viterbi approximation,
Background 2.1 Terminology
The Viterbi approximation is simple and tractable, but it ignores most derivations.
Experimental Results
Viterbi: 35.4 / 32.6; MBR (K=1000): 35.8 / 32.7; Crunching (N=10000): 35.7 / 32.8
Introduction
This corresponds to a Viterbi approximation that measures the goodness of an output string using only its most probable derivation, ignoring all the others.
Variational Approximate Decoding
The Viterbi and crunching methods above approximate the intractable decoding of (2) by ignoring most of the derivations.
Variational Approximate Decoding
Lastly, note that Viterbi and variational approximation are different ways to approximate the exact probability p(y | c), and each of them has pros and cons.
Variational Approximate Decoding
Specifically, Viterbi approximation uses the correct probability of one complete
Viterbi is mentioned in 23 sentences in this paper.
Topics mentioned in this paper:
Hall, David and Berg-Kirkpatrick, Taylor and Klein, Dan
Abstract
(2013) presented an approach to GPU parsing that sacrifices traditional sparsity in exchange for raw computational power, obtaining a system that can compute Viterbi parses for a high-quality grammar at about 164 sentences per second on a midrange GPU.
Abstract
The resulting system is capable of computing over 404 Viterbi parses per second—more than a 2x speedup—on the same hardware.
Anatomy of a Dense GPU Parser
Table 1: Performance numbers for computing Viterbi inside charts on 20,000 sentences of length ≤ 40 from the Penn Treebank.
Anatomy of a Dense GPU Parser
The Viterbi inside loop for the grammar is inlined into a kernel.
Anatomy of a Dense GPU Parser
(2013)’s system is able to compute Viterbi charts at 164 sentences per second, for sentences up to length 40.
Grammar Clustering
The unpruned Viterbi computations in a fine grammar using the clustering method of Canny et al.
Introduction
Together these kernels implement the Viterbi inside algorithm.
Introduction
On a midrange GPU, their system can compute Viterbi derivations at 164 sentences per second on sentences of length 40 or less (see timing details below).
Introduction
(2013) is that it only computes Viterbi parses.
Minimum Bayes risk parsing
The Viterbi algorithm is a reasonably effective method for parsing.
Viterbi is mentioned in 27 sentences in this paper.
Topics mentioned in this paper:
Ganchev, Kuzman and Graça, João V. and Taskar, Ben
Adding agreement constraints
The standard way to do this is to find the most probable alignment, using the Viterbi algorithm.
Adding agreement constraints
There are two main differences between posterior decoding and Viterbi decoding.
Adding agreement constraints
Viterbi, by contrast, must commit to one high-scoring alignment.
Introduction
Section 5 explores how the new alignments lead to consistent and significant improvement in a state-of-the-art phrase-based machine translation system by using posterior decoding rather than Viterbi decoding.
Phrase-based machine translation
Left: Viterbi
Phrase-based machine translation
Our initial experiments with Viterbi decoding and posterior decoding showed that for our agreement model posterior decoding could provide better alignment quality.
Word alignment results
use Viterbi decoding, with larger improvements for small amounts of training data.
Word alignment results
As described above, an alternative to Viterbi decoding is to accept all alignments that have probability above some threshold.
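Thresholding posteriors is easy to state in code. A minimal sketch, assuming the link posteriors have already been computed (e.g., by forward-backward) and using an illustrative threshold; this is not the authors' implementation:

```python
def posterior_decode(posterior, threshold=0.5):
    """posterior[i][j]: model probability that source word i aligns to target word j.
    Accept every link whose posterior exceeds the threshold, instead of committing
    to the single best alignment as Viterbi decoding does."""
    links = []
    for i, row in enumerate(posterior):
        for j, p in enumerate(row):
            if p > threshold:
                links.append((i, j))
    return links

# illustrative posterior matrix for a 2 x 3 sentence pair -> [(0, 0), (1, 1), (1, 2)]
print(posterior_decode([[0.9, 0.05, 0.05],
                        [0.1, 0.45, 0.45]], threshold=0.4))
```

Unlike the single Viterbi alignment, this can propose zero or several links per word, which is exactly the flexibility that thresholded posterior decoding exploits.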
Word alignment results
Figure 7 shows an example of the same sentence, using the same model where in one case Viterbi decoding was used and in the other case Posterior decoding tuned to minimize AER on a development set
Viterbi is mentioned in 21 sentences in this paper.
Topics mentioned in this paper:
DeNero, John and Chiang, David and Knight, Kevin
Abstract
The minimum Bayes risk (MBR) decoding objective improves BLEU scores for machine translation output relative to the standard Viterbi objective of maximizing model score.
Computing Feature Expectations
This forest-based MBR approach improved translation output relative to Viterbi translations.
Consensus Decoding Algorithms
The standard Viterbi decoding objective is to find e* = arg max_e λ · θ(f, e).
Consensus Decoding Algorithms
For MBR decoding, we instead leverage a similarity measure S(e; e′) to choose a translation using the model’s probability distribution P(e|f), which has support over a set of possible translations E. The Viterbi derivation e* is the mode of this distribution.
Experimental Results
Table 2: Translation performance improves when computing expected sentences from translation forests rather than 10⁴-best lists, which in turn improve over Viterbi translations.
Experimental Results
Figure 3: n-grams with high expected count are more likely to appear in the reference translation than n-grams in the translation model’s Viterbi translation, e*.
Experimental Results
We endeavored to test the hypothesis that expected n-gram counts under the forest distribution carry more predictive information than the baseline Viterbi derivation e*, which is the mode of the distribution.
Viterbi is mentioned in 9 sentences in this paper.
Topics mentioned in this paper:
Sun, Xu and Okazaki, Naoaki and Tsujii, Jun'ichi
Recognition as a Generation Task
When a context expression (CE) with a parenthetical expression (PE) is met, the recognizer generates the Viterbi labeling for the CE, which leads to the PE or NULL.
Recognition as a Generation Task
Then, if the Viterbi labeling leads to the PE, we can, at the same time, use the labeling to decide the full form within the CE.
Results and Discussion
The CRF+GI produced a Viterbi labeling with a low probability, which is an incorrect abbreviation.
Results and Discussion
To perform a systematic analysis of the superior performance of DPLVM compared to CRF+GI, we collected the probability distributions (see Figure 7) of the Viterbi labelings from these models (“DPLVM vs. CRF+GI” is highlighted).
Results and Discussion
A large percentage (37.9%) of the Viterbi labelings from the CRF+GI (ENG) have very small probability values (p < 0.1).
Viterbi is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Corlett, Eric and Penn, Gerald
Abstract
In this paper, we introduce an exact method for deciphering messages using a generalization of the Viterbi algorithm.
Experiment
Algorithm 2: Generalized Viterbi Algorithm GVit(σ, c, r). Input: partial solution σ, ciphertext character c, and index r into σ.
Introduction
Our algorithm operates by running a search over the n-gram probabilities of possible solutions to the cipher, using a generalization of the Viterbi algorithm that is wrapped in an A* search, which determines at each step which partial solutions to expand.
Results
Asymptotically, the time required for each sweep of the Viterbi algorithm increases, but this is more than offset by the decrease in the number of required sweeps.
The Algorithm
Given a partial solution σ of length n, we can extend σ by choosing a ciphertext letter c not in the range of σ, and then use our generalization of the Viterbi algorithm to find, for each p not in the domain of σ, a score to rank the choice of p for σ, namely the trigram probability of the extension σp of σ.
The Algorithm
Our generalization of the Viterbi algorithm, depicted in Figure 1, uses dynamic programming to score every immediate extension of a given partial solution in tandem, by finding, in a manner consistent with the real Viterbi algorithm, the most probable input string given a set of output symbols, which in this case is the cipher.
The Algorithm
Unlike the real Viterbi algorithm, we must also observe the constraints of the input partial solution’s mapping.
Viterbi is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Ling, Wang and Xiang, Guang and Dyer, Chris and Black, Alan and Trancoso, Isabel
Parallel Segment Retrieval
to calculate the Viterbi alignment a.
Parallel Segment Retrieval
The main bottleneck of the naive algorithm is finding new Viterbi Model 1 word alignments every time we change the spans.
Parallel Segment Retrieval
an iterative approach to compute the Viterbi word alignments for IBM Model 1 using dynamic programming.
Viterbi is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Zhang, Hao and Quirk, Chris and Moore, Robert C. and Gildea, Daniel
Bitext Pruning Strategy
We can then find the best scoring sequence using the familiar Viterbi algorithm.
Bootstrapping Phrasal ITG from Word-based ITG
This section introduces a technique that bootstraps candidate phrase pairs for phrase-based ITG from word-based ITG Viterbi alignments.
Bootstrapping Phrasal ITG from Word-based ITG
We use ITG Viterbi alignments instead.
Bootstrapping Phrasal ITG from Word-based ITG
Figure 3 (a) shows all possible non-compositional phrases given the Viterbi word alignment of the example sentence pair.
Experiments
The charts were then pruned further by applying the non-compositional constraint from the Viterbi alignment links of that model.
Experiments
Finally we ran 10 iterations of phrase-based ITG over the residual charts, using EM or VB, and extracted the Viterbi alignments.
Summary of the Pipeline
After several training iterations, we obtain the Viterbi alignments on the training data according to the final model.
Summary of the Pipeline
In the end, a Viterbi pass for the phrasal ITG is executed to produce the non-compositional phrasal alignments.
Viterbi is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Schoenemann, Thomas
Conclusion
In the E-step we use hillclimbing to get a likely alignment (ideally the Viterbi alignment).
Introduction
Iteration 11 shows that merely 5.14% of the (HMM) probability mass is covered by the Viterbi alignment and its neighbors.
Introduction
Having introduced the model variants, we proceed to derive a hillclimbing method to compute a likely alignment (ideally the Viterbi alignment) and its neighbors.
Introduction
As an aside, the transfer iteration from HMM to IBM3 (iteration 11) reveals that only 5.14% of the probability mass is preserved when using the Viterbi alignment and its neighbors instead of all alignments.
Training the New Variants
As in the training of the deficient IBM-3 and IBM-4 models, we approximate the expectations in the E-step by a set of likely alignments, ideally centered around the Viterbi alignment, but already for the regular deficient variants computing it is NP-hard (Udupa and Maji, 2006).
Viterbi is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Quirk, Chris
Evaluation
The next line shows the fertility HMM with approximate posterior computation from Gibbs sampling but with final alignment selected by the Viterbi algorithm.
Evaluation
The prior work compared Viterbi with a form of local search (sampling repeatedly and keeping the max), finding little difference between the two (Zhao and Gildea, 2010).
Evaluation
Here, however, the difference between a dual decomposition and Viterbi is significant: their results were likely due to search error.
Fertility distribution parameters
Algorithm, AER (G→E), AER (E→G): HMM 24.0, 21.8; FHMM Viterbi 19.7, 19.6; FHMM Dual-dec 18.0, 17.4
HMM alignment
argmax_a ( f(a) + Σ_{i,j} u^(k−1)(i, j) y_a(i, j) ) using the standard Viterbi algorithm.
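The surrounding snippets describe the standard dual-decomposition recipe: repeatedly solve the penalized Viterbi subproblem, solve the fertility subproblem, and update the penalties on the links where the two disagree. A generic sketch of that loop (the two subproblem solvers are placeholders passed in as functions; this is not the paper's algorithm):

```python
def dual_decompose(solve_hmm, solve_fertility, link_keys, iterations=50, step=1.0):
    """Generic dual-decomposition loop for two sub-models that must agree on binary
    link variables y(i, j).  solve_hmm(u) maximizes its score plus the penalties u
    (e.g. via Viterbi, as in the snippet above); solve_fertility(u) maximizes its
    score minus the penalties.  Both return a dict mapping (i, j) -> 0 or 1."""
    u = {k: 0.0 for k in link_keys}
    y_hmm = None
    for t in range(1, iterations + 1):
        y_hmm = solve_hmm(u)
        y_fert = solve_fertility(u)
        if y_hmm == y_fert:              # agreement certifies a jointly optimal solution
            return y_hmm
        for k in link_keys:              # subgradient step on the disagreements
            u[k] -= (step / t) * (y_hmm[k] - y_fert[k])
    return y_hmm                         # no agreement reached: fall back to last HMM solution
```

If the two subproblems ever return the same links, that solution is optimal for the combined objective; otherwise the subgradient step nudges the penalties u toward agreement.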
HMM alignment
The algorithm described in Figure 2 finds this maximal assignment in O(IJ log J) time, generally faster than the O(IJ²) time used by Viterbi.
Introduction
Therefore, the standard forward-backward and Viterbi algorithms do not apply.
Viterbi is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Srivastava, Shashank and Hovy, Eduard
Introduction
in linear time (in the number of tokens) following the generalized Viterbi algorithm.
Introduction
A slightly modified version of Viterbi could also be used to find segmentations that are constrained to agree with some given motif boundaries, but can segment other parts of the sentence optimally under these constraints.
Introduction
Here y’ = Decode(x(k),w) is the optimal Viterbi decoding using the current estimates of the weights.
Viterbi is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Tamura, Akihiro and Watanabe, Taro and Sumita, Eiichiro
RNN-based Alignment Model
The Viterbi alignment is determined using the Viterbi algorithm, similar to the FFNN-based model, where the model is sequentially applied from f_1 to f_J.
RNN-based Alignment Model
Strictly speaking, we cannot apply the dynamic programming forward-backward algorithm (i.e., the Viterbi algorithm) due to the long alignment history of y_i.
RNN-based Alignment Model
Thus, the Viterbi alignment is computed approximately using heuristic beam search.
Related Work
Given a specific model, the best alignment (Viterbi alignment) of the sentence pair (f_1^J, e_1^I) can be found as
Related Work
For example, the HMM model identifies the Viterbi alignment using the Viterbi algorithm.
Related Work
The model finds the Viterbi alignment using the Viterbi algorithm, similar to the classic HMM model.
Viterbi is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Zhang, Hao and Gildea, Daniel
Conclusion
We use an inside/outside parsing algorithm to get the Viterbi outside cost of bigram integrated states, which is used as an outside estimate for trigram integrated states.
Decoding to Maximize BLEU
We approximate β and α using the Viterbi probabilities.
Decoding to Maximize BLEU
Since decoding from bottom up in the trigram pass already gives us the inside Viterbi scores, we only have to visit the nodes in the reverse order once we reach the root to compute the Viterbi outside scores.
Experiments
Simply by exploring more (200 times the log beam) after-goal items, we can optimize the Viterbi synchronous parse significantly, shown in Figure 3(left) in terms of model score versus search time.
Multi-pass LM-Integrated Decoding
Since we want to approximate the Viterbi
Multi-pass LM-Integrated Decoding
where β is the Viterbi inside cost and α is the Viterbi outside cost, to globally prioritize the n-gram integrated states on the agenda for exploration.
Viterbi is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
DeNero, John and Macherey, Klaus
Model Definition
The highest probability word alignment vector under the model for a given sentence pair (e, f) can be computed exactly using the standard Viterbi algorithm for HMMs in O(|e|²·|f|) time.
Model Inference
In fact, we can make a stronger claim: we can reuse the Viterbi inference algorithm for linear chain graphical models that applies to the embedded directional HMM models.
Model Inference
Note that Equation 5 and f are sums of terms in log space, while Viterbi inference for linear chains assumes a product of terms in probability space, which introduces the exponentiation above.
Related Work
Our method is complementary to agreement-based learning, as it applies to Viterbi inference under the model rather than computing expectations.
Related Work
First, we iterate over Viterbi predictions rather than posteriors.
Related Work
Also, our model training is identical to the HMM-based baseline training, while they employ belief propagation for both training and Viterbi inference.
Viterbi is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Gormley, Matthew R. and Mitchell, Margaret and Van Durme, Benjamin and Dredze, Mark
Approaches
Unsupervised Grammar Induction Our first method for grammar induction is fully unsupervised Viterbi EM training of the Dependency Model with Valence (DMV) (Klein and Manning, 2004), with uniform initialization of the model parameters.
Approaches
Constrained Grammar Induction Our second method, which we will refer to as DMV+C, induces grammar in a distantly supervised fashion by using a constrained parser in the E-step of Viterbi EM.
Experiments
We contrast the DMV trained with Viterbi EM+uniform initialization (DMV), our constrained DMV (DMV+C), and our model’s MBR decoding of latent syntax (Marginalized) with other recent work: Spitkovsky et al.
Related Work
(2010a) show that Viterbi (hard) EM training of the DMV with simple uniform initialization of the model parameters yields higher accuracy models than standard soft-EM
Related Work
In Viterbi EM, the E-step finds the maximum likelihood corpus parse given the current model parameters.
Viterbi is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Feng, Vanessa Wei and Hirst, Graeme
Abstract
To enhance the accuracy of the pipeline, we add additional constraints in the Viterbi decoding of the first CRF.
Bottom-up tree-building
Therefore, to improve its accuracy, we enforce additional commonsense constraints in its Viterbi decoding.
Bottom-up tree-building
As with the previous model, we also include additional constraints in the Viterbi decoding, by disallowing transitions between two ones, and disallowing the sequence to be all zeros if it contains all the remaining constituents in the document.
Conclusions
Our approach was to adopt a greedy bottom-up tree-building, with two linear-chain CRFs as local probabilistic models, and enforce reasonable constraints in the first CRF’s Viterbi decoding.
Introduction
Specifically, in the Viterbi decoding of the first CRF, we include additional constraints elicited from common sense, to make more effective local decisions.
Viterbi is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Bodenstab, Nathan and Dunlop, Aaron and Hall, Keith and Roark, Brian
Conclusion and Future Work
CYK 64.610, 88.7; Berkeley CTF MaxRule 0.213, 90.2; Berkeley CTF Viterbi 0.208, 88.8; Beam + Boundary FOM (BB) 0.334, 88.6; BB + Chart Constraints 0.244, 88.7
Experimental Setup
Accuracy is computed from the 1-best Viterbi (max) tree extracted from the chart.
Experimental Setup
We compute the precision and recall of constituents from the 1-best Viterbi trees using the standard EVALB script.
Results
Both our parser and the Berkeley parser are written in Java, both are run with Viterbi decoding, and both parse with the same grammar, so a direct comparison of speed and accuracy is fair.
Viterbi is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Trogkanis, Nikolaos and Elkan, Charles
Conditional random fields
The standard Viterbi algorithm for making predictions from a trained CRF is not tuned to minimize false positives.
Experimental results
For the English language, the CRF using the Viterbi path has overall error rate of 0.84%, compared to 6.81% for the TEX algorithm using American English patterns, which is eight times worse.
Experimental results
For the English language, TALO yields overall error rate 1.99% with serious error rate 0.72%, so the standard CRF using the Viterbi path is better on both measures.
Experimental results
For the Dutch language, the standard CRF using the Viterbi path has overall error rate 0.08%, compared to 0.81% for the TEX algorithm.
Viterbi is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Huang, Fei and Yates, Alexander
Introduction
The Viterbi algorithm (Rabiner, 1989) can then be used to produce the optimal sequence of latent states s_i for a given instance X.
Introduction
To learn an open-domain representation, we then trained an 80 state HMM on the unlabeled texts of the training and Brown test data, and used the Viterbi optimum states of each word as categorical features.
Introduction
Span-HMMs can be used to provide a single categorical value for any span of a sentence using the usual Viterbi algorithm for HMMs.
Viterbi is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Wang, Mengqiu and Che, Wanxiang and Manning, Christopher D.
Joint Alignment and NER Decoding
It is also worth noting that since we decode the alignment models with Viterbi inference, additional constraints such as the neighborhood constraint proposed by DeNero and Macherey (2011) can be easily integrated into our model.
Related Work
Instead of enforcing agreement in the alignment space based on best sequences found by Viterbi , we could opt to encourage agreement between posterior probability distributions, which is related to the posterior regularization work by Graca et al.
Related Work
We also differ in the implementation details, where in their case belief propagation is used in both training and Viterbi inference.
Viterbi is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Bartlett, Susan and Kondrak, Grzegorz and Cherry, Colin
Related Work
Chen (2003) uses an n-gram model and Viterbi decoder as a syllabifier, and then applies it as a preprocessing step in his maximum-entropy-based English L2P system.
Structured SVMs
This approach can be considered an HMM because the Viterbi algorithm is used to find the highest scoring tag sequence for a given observation sequence.
Structured SVMs
The argmax in Equation 2 is conducted using the Viterbi algorithm.
Viterbi is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Huang, Liang
Forest Reranking
The exact decoding algorithm, shown in Pseudocode 2, is an instance of the bottom-up Viterbi algorithm, which traverses the hypergraph in a topological order, and at each node v, calculates its 1-best derivation using each incoming hyperedge e ∈ IN(v).
Forest Reranking
function VITERBI(⟨V, E⟩): for v ∈ V in topological order do; for e ∈ IN(v) do
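A runnable sketch of that bottom-up loop over a hypergraph, with an illustrative hyperedge encoding (tail nodes plus a local score); this is a toy rendering of the pseudocode, not the paper's data structures:

```python
def hypergraph_viterbi(nodes, incoming):
    """nodes: node ids in topological (bottom-up) order; incoming[v]: list of
    hyperedges (tail_nodes, score) entering v.  Returns best[v], the score of the
    1-best derivation of each node, as in the bottom-up Viterbi loop above."""
    best = {}
    for v in nodes:                              # tails are guaranteed to precede heads
        edges = incoming.get(v, [])
        if not edges:                            # leaf / axiom node
            best[v] = 0.0
            continue
        best[v] = max(score + sum(best[u] for u in tails)
                      for tails, score in edges)
    return best

# toy hypergraph: leaves "a" and "b"; "c" has two incoming hyperedges; "goal" tops it off
incoming = {"c": [(("a", "b"), 1.5), (("a",), 0.7)],
            "goal": [(("c",), 2.0)]}
print(hypergraph_viterbi(["a", "b", "c", "goal"], incoming))
# {'a': 0.0, 'b': 0.0, 'c': 1.5, 'goal': 3.5}
```

Each node's best score is the best incoming hyperedge score plus the best scores of that hyperedge's tail nodes, which is exactly the 1-best derivation computation the snippet describes.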
Supporting Forest Algorithms
Basically, we use an Inside-Outside algorithm to compute the Viterbi inside cost β(v) and the Viterbi outside cost α(v) for each node v, and then compute the merit αβ(e) for each hyperedge:
Viterbi is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Shi, Xing and Knight, Kevin and Ji, Heng
Training
To do this, we first take our phrasebook triples and construct sample string pairs <Epron, Pinyin-split> by pronouncing the phrasebook English with FST A, and by pronouncing the phrasebook Chinglish with FSTs D and C. Then we run the EM algorithm to learn FST B parameters (Table 3) and Viterbi alignments, such as:
Training
First, we obtain Viterbi alignments using the phoneme-based model, e. g.:
Training
EM learns values for parameters like P(nai te|night), plus Viterbi alignments such as:
Viterbi is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Jia, Zhongye and Zhao, Hai
Pinyin Input Method Model
The Viterbi algorithm (Viterbi, 1967) is used for the decoding.
Pinyin Input Method Model
The shortest path algorithm for typo correction and Viterbi algorithm for PTC conversion are very closely related.
Related Works
The idea of “statistical input method” was proposed by modeling PTC conversion as a hidden Markov model (HMM), and using the Viterbi algorithm (Viterbi, 1967) to decode the sequence.
Viterbi is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Yu, Haonan and Siskind, Jeffrey Mark
The Sentence Tracker
This can be determined with the Viterbi (1967) algorithm and is known as detection-based tracking (Viterbi, 1971).
The Sentence Tracker
max over q_1, ..., q_T (...), which can also be found by the Viterbi algorithm.
The Sentence Tracker
3, which can also be determined using the Viterbi algorithm.
Viterbi is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Jiampojamarn, Sittichai and Kondrak, Grzegorz
Alignment by aggregation
In order to generate the list of best alignments, we use Algorithm 2, which is an adaptation of the standard Viterbi algorithm.
EM Alignment
The final many-to-many alignments are created by finding the most likely paths using the Viterbi algorithm based on the learned mapping probability table.
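A minimal sketch of such a Viterbi pass for monotone many-to-many alignment: a dynamic program over prefix pairs that, at each cell, tries every substring pair up to a maximum length and scores it with an assumed mapping probability table (the function name, table values, and maximum length are illustrative, not the authors' learned model):

```python
import math

def m2m_viterbi(x, y, prob, max_len=2):
    """Most likely monotone segmentation of x and y into aligned substring pairs,
    given prob[(xs, ys)] for candidate pairs (empty strings allow 1-0 / 0-1 links).
    Standard Viterbi-style dynamic program over prefix pairs."""
    NEG = float("-inf")
    best = [[NEG] * (len(y) + 1) for _ in range(len(x) + 1)]
    back = [[None] * (len(y) + 1) for _ in range(len(x) + 1)]
    best[0][0] = 0.0
    for i in range(len(x) + 1):
        for j in range(len(y) + 1):
            if best[i][j] == NEG:
                continue
            for dx in range(max_len + 1):
                for dy in range(max_len + 1):
                    if dx == dy == 0 or i + dx > len(x) or j + dy > len(y):
                        continue
                    pair = (x[i:i + dx], y[j:j + dy])
                    if pair not in prob:
                        continue
                    score = best[i][j] + math.log(prob[pair])
                    if score > best[i + dx][j + dy]:
                        best[i + dx][j + dy] = score
                        back[i + dx][j + dy] = (i, j, pair)
    if best[len(x)][len(y)] == NEG:
        return None                       # no complete alignment under the table
    pairs, i, j = [], len(x), len(y)      # walk the back-pointers
    while (i, j) != (0, 0):
        pi, pj, pair = back[i][j]
        pairs.append(pair)
        i, j = pi, pj
    return list(reversed(pairs))

# toy grapheme-to-phoneme style example -> [('ph', 'F'), ('o', 'O')]
table = {("p", "P"): 0.9, ("h", ""): 0.5, ("ph", "F"): 0.8, ("o", "O"): 0.7}
print(m2m_viterbi("pho", "FO", table))
```

Swapping in a learned mapping probability table gives the kind of many-to-many Viterbi alignments the snippet refers to.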
Extrinsic evaluation
The decoder module uses standard Viterbi for the 1-1 case, and a phrasal decoder (Zens and Ney, 2004) for the MM case.
Viterbi is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Sun, Xu and Gao, Jianfeng and Micol, Daniel and Quirk, Chris
A Phrase-Based Error Model
Figure 3: The dynamic programming algorithm for Viterbi bi-phrase segmentation.
The Baseline Speller System
The first pass uses the Viterbi algorithm to find the best C according to the model of Equations (2) and (3).
The Baseline Speller System
In the second pass, the A-Star algorithm is used to find the 20-best corrections, using the Viterbi scores computed at each state in the first pass as heuristics.
Viterbi is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Auli, Michael and Lopez, Adam
Experiments
Training a model using DD would require a different optimization algorithm based on Viterbi results (e.g., the perceptron), which we will pursue in future work.
Oracle Parsing
We can also see from the scores of the Viterbi parses that while the reverse condition has access to much better parses, the model doesn’t actually find them.
Oracle Parsing
Digging deeper, we compared parser model score against Viterbi F-score and oracle F-score at a va-
Viterbi is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Berg-Kirkpatrick, Taylor and Gillick, Dan and Klein, Dan
Structured Learning
In Section 4 we show that finding summaries that optimize Objective 2, Viterbi prediction, is efficient.
Structured Learning
Online learning algorithms like perceptron or the margin-infused relaxed algorithm (MIRA) (Crammer and Singer, 2003) are frequently used for structured problems where Viterbi inference is available.
Structured Learning
Thus, we can easily perform loss-augmented prediction using the same procedure we use to perform Viterbi prediction (described in Section 4).
Viterbi is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Green, Spence and DeNero, John
A Class-based Model of Agreement
It can be learned from gold-segmented data, generally applies to languages with bound morphemes, and does not require a hand-compiled lexicon. Moreover, it has only four labels, so Viterbi decoding is very fast.
Inference during Translation Decoding
Incremental Greedy Decoding: Decoding with the CRF-based tagger model in this setting requires some slight modifications to the Viterbi algorithm.
Inference during Translation Decoding
This forces the Viterbi path to go through If.
Viterbi is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Ravi, Sujith and Knight, Kevin
Machine Translation as a Decipherment Task
Finally, we use the Viterbi algorithm to decode the foreign sentence f and produce an English translation e that maximizes P(e) ·
Word Substitution Decipherment
Finally, we decode the given ciphertext c by using the Viterbi algorithm to choose the plaintext decoding e that maximizes P(e) · P_trained(c|e)³, stretching the channel probabilities (Knight et al., 2006).
Word Substitution Decipherment
We then use the Viterbi algorithm to choose the English plaintext e that maximizes P(e) · P_trained(c|e)³.
Viterbi is mentioned in 3 sentences in this paper.
Topics mentioned in this paper: