Experiments | Besides the heuristic baseline, we tried our model-based approach using Unigrams, Bigrams and Anchored Unigrams, with and without learning the parametric edit distances. |
Experiments | When we did not use learning, we set the parameters of the edit distance to (0, -3, -4) for matches, substitutions, and deletions/insertions, respectively. |
Experiments | Levenshtein refers to the fixed-parameter edit distance transducer. |
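As a rough illustration of a fixed-parameter edit distance, the sketch below scores an alignment with the values (0, -3, -4) quoted above. It assumes the parameters act as additive scores to be maximized, which the text does not state explicitly; the function name `edit_score` and its argument names are hypothetical, and this is a plain dynamic program rather than the transducer used in the experiments.

```python
def edit_score(source, target, match=0, substitution=-3, indel=-4):
    """Best (maximum) alignment score of two strings under fixed parameters.

    Assumption: the parameters are additive scores, so higher is better.
    """
    rows, cols = len(source) + 1, len(target) + 1
    # score[i][j] holds the best score for aligning source[:i] with target[:j].
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * indel  # delete every character of source[:i]
    for j in range(1, cols):
        score[0][j] = j * indel  # insert every character of target[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            same = source[i - 1] == target[j - 1]
            diagonal = score[i - 1][j - 1] + (match if same else substitution)
            deletion = score[i - 1][j] + indel
            insertion = score[i][j - 1] + indel
            score[i][j] = max(diagonal, deletion, insertion)
    return score[-1][-1]
```

With these defaults, identical strings score 0, and each substitution or insertion/deletion lowers the score by 3 or 4 respectively.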
Learning | In practice, it is important to estimate better parametric edit distances φ_ℓ and survival variables S_ℓ. |
Learning | However, a naively constructed edit distance, which for example might penalize vowel substitutions lightly, would fail to learn that Latin words that are borrowed into English would not undergo the sound change /I/ → /eI/. |
Learning | For the transducers φ, we learn parameterized edit distances that model the probabilities of different sound changes. |
Model | In this paper, each φ_ℓ takes the form of a parameterized edit distance similar to the standard Levenshtein distance. |
Transducers and Automata | In our model, it is not just the edit distances that are finite state machines. |
A Phrase-Based Error Model | The cost of assigning k to a_i is equal to the Levenshtein edit distance (Levenshtein, 1966) between the ith word in Q and the kth word in C, and the cost of assigning 0 to a_i is equal to the length of the ith word in Q. |
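The assignment cost described above can be sketched as follows. This is a minimal illustration, assuming unit-cost Levenshtein distance and 1-based word positions with k = 0 standing for the null alignment; the function names `levenshtein` and `assignment_cost` are hypothetical, and the search over full alignments is not shown.

```python
def levenshtein(a, b):
    """Unit-cost Levenshtein distance between strings a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # match or substitution
        prev = curr
    return prev[-1]


def assignment_cost(query_words, correction_words, i, k):
    """Cost of setting a_i = k (i and k are 1-based; k = 0 is the null alignment)."""
    word = query_words[i - 1]
    if k == 0:
        # An unaligned query word costs its own length.
        return len(word)
    return levenshtein(word, correction_words[k - 1])
```

For example, with Q = ["britny", "spears"] and C = ["britney", "spears"], aligning the first query word to the first correction word costs 1, while leaving it unaligned costs 6.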
Clickthrough Data and Spelling Correction | We then scored each query pair (Q1, Q2) using the edit distance between Q1 and Q2, and retained those with an edit distance score lower than a preset threshold as query correction pairs. |
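The filtering step above might look like the sketch below. It assumes a raw, unnormalized Levenshtein distance over characters and an arbitrary threshold value, neither of which is specified in the text; `levenshtein` and `retain_correction_pairs` are hypothetical names.

```python
def levenshtein(a, b):
    """Unit-cost Levenshtein distance between strings a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,
                            curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def retain_correction_pairs(query_pairs, threshold):
    """Keep the (Q1, Q2) pairs whose edit distance is strictly below threshold."""
    return [(q1, q2) for q1, q2 in query_pairs
            if levenshtein(q1, q2) < threshold]


pairs = [("britny spears", "britney spears"), ("weather", "flights")]
kept = retain_correction_pairs(pairs, threshold=3)  # threshold value is an assumption
```

Only the first pair survives here, since the two unrelated queries are far apart in edit distance.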
Related Work | Most traditional systems use a manually tuned similarity function (e.g., edit distance function) to rank the candidates, as reviewed by Kukich (1992). |
The Baseline Speller System | Then we scan the query from left to right, and each query term q is looked up in the lexicon to generate a list of spelling suggestions c whose edit distance from q is lower than a preset threshold. |
The Baseline Speller System | The lexicon is stored using a trie-based data structure that allows efficient search for all terms within a maximum edit distance. |
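One common way to search a trie for all terms within a maximum edit distance is to walk the trie depth-first while carrying one row of the Levenshtein table per node, pruning any branch whose row already exceeds the bound. The sketch below illustrates that general technique; it is not necessarily the data structure used in the baseline speller, and `TrieNode`, `build_trie`, and `search` are hypothetical names.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_word = False


def build_trie(words):
    """Insert every word of the lexicon into a fresh trie and return its root."""
    root = TrieNode()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True
    return root


def search(root, term, max_distance):
    """Return (word, distance) pairs for lexicon words within max_distance of term."""
    results = []

    def visit(node, prefix, prev_row):
        # prev_row[j] is the distance between prefix and term[:j].
        if node.is_word and prev_row[-1] <= max_distance:
            results.append((prefix, prev_row[-1]))
        for ch, child in node.children.items():
            row = [prev_row[0] + 1]
            for j in range(1, len(term) + 1):
                row.append(min(row[j - 1] + 1,
                               prev_row[j] + 1,
                               prev_row[j - 1] + (term[j - 1] != ch)))
            # The row minimum never decreases deeper in the trie, so prune here.
            if min(row) <= max_distance:
                visit(child, prefix + ch, row)

    visit(root, "", list(range(len(term) + 1)))
    return results


lexicon = build_trie(["spears", "spear", "pears", "speaks", "banana"])
suggestions = dict(search(lexicon, "spears", max_distance=1))
```

Shared prefixes are scored once, which is what makes the trie cheaper than comparing the query term against every lexicon entry separately.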
The Baseline Speller System | The error model (the first factor) is approximated by the edit distance function as |