Learning Phrase-Based Spelling Error Models from Clickthrough Data
Xu Sun, Jianfeng Gao, Daniel Micol, and Chris Quirk

Article Structure

Abstract

This paper explores the use of clickthrough data for query spelling correction.

Introduction

Search queries present a particular challenge for traditional spelling correction methods for three main reasons (Ahmad and Kondrak, 2004).

Related Work

Spelling correction for regular written text is a long standing research topic.

Clickthrough Data and Spelling Correction

This section describes the way the query-correction pairs are extracted from clickthrough data.

The Baseline Speller System

The spelling correction problem is typically formulated under the framework of the source channel model.
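
As a rough illustration of the source channel formulation, the sketch below ranks candidate corrections C for a query Q by combining an error-model score log P(Q|C) with a language-model score log P(C). The candidate list, the two scoring functions, and the weight lam are placeholders, not the paper's actual components.

    def rank_corrections(query, candidates, error_logprob, lm_logprob, lam=1.0):
        """Rank candidate corrections under a source channel (noisy channel) model.

        score(C) = log P(Q | C) + lam * log P(C); with lam = 1 this is exactly
        the product P(Q | C) * P(C) in log space.
        """
        scored = []
        for cand in candidates:
            score = error_logprob(query, cand) + lam * lm_logprob(cand)
            scored.append((score, cand))
        scored.sort(reverse=True)          # highest-scoring correction first
        return [cand for _, cand in scored]

Treating lam as a tunable weight matches the ranker view mentioned later in the paper, where the two log probabilities are used as features rather than multiplied with fixed weights.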

A Phrase-Based Error Model

The goal of the phrase-based error model is to transform a correctly spelled query C into a misspelled query Q.

Experiments

We evaluate the spelling error models on a large-scale, real-world data set containing 24,172 queries sampled from one year’s worth of query logs from a commercial search engine.

Topics

phrase-based

Appears in 15 sentences as: Phrase-Based (1) phrase-based (15)
In Learning Phrase-Based Spelling Error Models from Clickthrough Data
  1. Then, a phrase-based error model that accounts for the transformation probability between multi-term phrases is trained and integrated into a query speller system.
    Page 1, “Abstract”
  2. Results show that the system using the phrase-based error model significantly outperforms its baseline systems.
    Page 1, “Abstract”
  3. Among these models, the most effective one is a phrase-based error model that captures the probability of transforming one multi-term phrase into another multi-term phrase.
    Page 1, “Introduction”
  4. Compared to traditional error models that account for transformation probabilities between single characters (Kernighan et al., 1990) or sub-word strings (Brill and Moore, 2000), the phrase-based model is more powerful in that it captures some contextual information by retaining inter-term dependencies.
    Page 1, “Introduction”
  5. In particular, the speller system incorporating a phrase-based error model significantly outperforms its baseline systems.
    Page 2, “Introduction”
  6. Section 5 describes in detail the phrase-based error model.
    Page 2, “Introduction”
  7. To this end, inspired by the phrase-based statistical machine translation (SMT) systems (Koehn et al., 2003; Och and Ney, 2004), we propose a phrase-based error model where we assume that query spelling correction is performed at the phrase level.
    Page 2, “Related Work”
  8. In what follows, before presenting the phrase-based error model, we will first describe the clickthrough data and the query speller system we used in this study.
    Page 2, “Related Work”
  9. Figure 2: Example demonstrating the generative procedure behind the phrase-based error model.
    Page 5, “The Baseline Speller System”
  10. The goal of the phrase-based error model is to transform a correctly spelled query C into a misspelled query Q.
    Page 5, “A Phrase-Based Error Model”
  11. If we assume a uniform probability over segmentations, then the phrase-based probability can be defined as:
    Page 5, “A Phrase-Based Error Model”
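
Item 11 above ends where the paper's equation begins. As a minimal sketch (not the paper's implementation), the phrase-based probability under a uniform distribution over segmentations can be computed by taking the maximum, over monotone joint segmentations of C and Q into bi-phrases, of the product of bi-phrase translation probabilities. The phrase-table function phrase_logprob, the maximum phrase length, and the no-reordering assumption are simplifications introduced here.

    from functools import lru_cache

    def phrase_error_logprob(query_words, corr_words, phrase_logprob, max_len=3):
        """max over monotone bi-phrase segmentations of
        sum_k log P(query phrase k | corrected phrase k)."""
        Q, C = tuple(query_words), tuple(corr_words)

        @lru_cache(maxsize=None)
        def best(i, j):                    # Q[:i] and C[:j] are already covered
            if i == len(Q) and j == len(C):
                return 0.0
            score = float("-inf")
            for di in range(1, max_len + 1):
                for dj in range(1, max_len + 1):
                    if i + di > len(Q) or j + dj > len(C):
                        continue
                    lp = phrase_logprob(" ".join(Q[i:i + di]), " ".join(C[j:j + dj]))
                    if lp is not None:     # None = bi-phrase not in the table
                        score = max(score, lp + best(i + di, j + dj))
            return score

        return best(0, 0)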

word alignment

Appears in 11 sentences as: word aligned (1) word alignment (10) word alignments (2)
In Learning Phrase-Based Spelling Error Models from Clickthrough Data
  1. Let J be the length of Q, L be the length of C, and A = a_1, ..., a_J be a hidden variable representing the word alignment.
    Page 5, “A Phrase-Based Error Model”
  2. When scoring a given candidate pair, we further restrict our attention to those S, T, M triples that are consistent with the word alignment, which we denote as B(C, Q, A*).
    Page 5, “A Phrase-Based Error Model”
  3. Once the word alignment is fixed, the final permutation is uniquely determined, so we can safely discard that factor.
    Page 5, “A Phrase-Based Error Model”
  4. Here, though, both the input and the output word sequences are specified as the input to the algorithm, as is the word alignment.
    Page 6, “A Phrase-Based Error Model”
  5. We define the quantity α_j to be the probability of the most likely sequence of bi-phrases that produce the first j terms of Q and are consistent with the word alignment and C. It can be calculated using the following algorithm:
    Page 6, “A Phrase-Based Error Model”
  6. Figure 4: Toy example of (left) a word alignment between two strings "adcf" and "ABCDEF"; and (right) the bi-phrases containing up to four words that are consistent with the word alignment.
    Page 6, “A Phrase-Based Error Model”
  7. The algorithm has complexity O(KL²), where K is the total number of word alignments in A*, excluding empty words, and L is the maximum length of a bi-phrase, which is a hyper-parameter of the algorithm.
    Page 6, “A Phrase-Based Error Model”
  8. From each query-correction pair with its word alignment (Q, C, A*), all bi-phrases consistent with the word alignment are identified.
    Page 6, “A Phrase-Based Error Model”
  9. Second, there must not be any word alignments from words inside the bi-phrase to words outside the bi-phrase.
    Page 6, “A Phrase-Based Error Model”
  10. Assume we have a word translation distribution t(q|c) (defined over individual words, not phrases), and a word alignment A between q and c; here, the word alignment contains (i, j) pairs, where i ∈ 1..|q| and j ∈ 0..|c|, with 0 indicating an inserted word.
    Page 7, “A Phrase-Based Error Model”
  11. We estimate the word translation probabilities using counts from the word-aligned corpus: t(q|c) = N(c,q) / Σ_q' N(c,q'), where N(c,q) is the number of times the words (not phrases as in Equation 11) c and q are aligned in the training data.
    Page 7, “A Phrase-Based Error Model”
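
The relative-frequency estimate in item 11 (t(q|c) = N(c,q) / Σ_q' N(c,q')) comes down to counting over the word-aligned query-correction pairs. The sketch below assumes each training example is already given as a list of aligned (q, c) word pairs, with c = None standing in for the empty word; this data format is an assumption made here, not the paper's.

    from collections import defaultdict

    def estimate_word_translation(aligned_examples):
        """t(q|c) = N(c, q) / sum over q' of N(c, q'), from word-aligned pairs."""
        joint = defaultdict(float)         # N(c, q)
        marginal = defaultdict(float)      # sum over q' of N(c, q')
        for pairs in aligned_examples:     # one example = list of (q, c) word pairs
            for q, c in pairs:
                joint[(c, q)] += 1.0
                marginal[c] += 1.0
        return {(c, q): n / marginal[c] for (c, q), n in joint.items()}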

edit distance

Appears in 7 sentences as: edit distance (8)
In Learning Phrase-Based Spelling Error Models from Clickthrough Data
  1. Most traditional systems use a manually tuned similarity function (e.g., edit distance function) to rank the candidates, as reviewed by Kukich (1992).
    Page 2, “Related Work”
  2. We then scored each query pair (Q1, Q2) using the edit distance between Q1 and Q2, and retained those with an edit distance score lower than a preset threshold as query correction pairs.
    Page 3, “Clickthrough Data and Spelling Correction”
  3. Then we scan the query from left to right, and each query term q is looked up in the lexicon to generate a list of spelling suggestions whose edit distance from q is lower than a preset threshold.
    Page 4, “The Baseline Speller System”
  4. The lexicon is stored using a trie-based data structure that allows efficient search for all terms within a maximum edit distance.
    Page 4, “The Baseline Speller System”
  5. The error model (the first factor) is approximated by the edit distance function as
    Page 4, “The Baseline Speller System”
  6. Since we define the logarithm of the probabilities of the language model and the error model (i.e., the edit distance function) as features, the ranker can be viewed as a more general framework, subsuming the source channel model as a special case.
    Page 5, “The Baseline Speller System”
  7. The cost of assigning k to a_i is equal to the Levenshtein edit distance (Levenshtein, 1966) between the ith word in Q and the kth word in C, and the cost of assigning 0 to a_i is equal to the length of the ith word in Q.
    Page 5, “A Phrase-Based Error Model”
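
The cost scheme in item 7 above can be turned into a very simple word aligner: each query word is assigned to the correction word with the smallest Levenshtein distance, or left unaligned (assignment 0) at a cost equal to its length. The greedy per-word choice below ignores any further constraints (such as monotonicity or one-to-one assignment) that the full procedure may impose, so treat it as a sketch only.

    def levenshtein(a, b):
        """Standard Levenshtein edit distance between two strings."""
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            curr = [i]
            for j, cb in enumerate(b, 1):
                curr.append(min(prev[j] + 1,                 # deletion
                                curr[j - 1] + 1,             # insertion
                                prev[j - 1] + (ca != cb)))   # substitution
            prev = curr
        return prev[-1]

    def align_words(query_words, corr_words):
        """For each query word, pick the cheapest assignment: a position k in C, or 0."""
        alignment = []
        for q in query_words:
            best_k, best_cost = 0, len(q)    # cost of assigning 0 is the word length
            for k, c in enumerate(corr_words, 1):
                cost = levenshtein(q, c)
                if cost < best_cost:
                    best_k, best_cost = k, cost
            alignment.append(best_k)
        return alignment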

contextual information

Appears in 6 sentences as: contextual information (7)
In Learning Phrase-Based Spelling Error Models from Clickthrough Data
  1. Compared to traditional error models that account for transformation probabilities between single characters (Kernighan et al., 1990) or sub-word strings (Brill and Moore, 2000), the phrase-based model is more powerful in that it captures some contextual information by retaining inter-term dependencies.
    Page 1, “Introduction”
  2. We show that this information is crucial for detecting the correction of a query term, because, unlike in regular written text, any query word can be a valid search term, and in many cases the only way for a speller system to make the judgment is to explore its usage according to the contextual information.
    Page 1, “Introduction”
  3. Typically, a language model (source model) is used to capture contextual information, while an error model (channel model) is considered to be context free in that it does not take into account any contextual information in modeling word transformation probabilities.
    Page 2, “Related Work”
  4. In this study we argue that it is beneficial to capture contextual information in the error model.
    Page 2, “Related Work”
  5. Rather than replacing single words in isolation, this model replaces sequences of words with sequences of words, thus incorporating contextual information.
    Page 5, “A Phrase-Based Error Model”
  6. Notice that when we set L=1, the phrase-based error model reduces to a word-based error model, which assumes that words are transformed independently from C to Q, without taking into account any contextual information.
    Page 6, “A Phrase-Based Error Model”

language model

Appears in 6 sentences as: language model (6)
In Learning Phrase-Based Spelling Error Models from Clickthrough Data
  1. First, a pre-defined confusion set is used to generate candidate corrections; then a scoring model, such as a trigram language model or naïve Bayes classifier, is used to rank the candidates according to their context (e.g., Golding and Roth, 1996; Mangu and Brill, 1997; Church et al., 2007).
    Page 2, “Related Work”
  2. (2009) present a query speller system in which both the error model and the language model are trained using Web data.
    Page 2, “Related Work”
  3. Typically, a language model (source model) is used to capture contextual information, while an error model (channel model) is considered to be context free in that it does not take into account any contextual information in modeling word transformation probabilities.
    Page 2, “Related Work”
  4. where the error model P(Q|C) models the transformation probability from C to Q, and the language model P(C) models how likely C is a correctly spelled query.
    Page 4, “The Baseline Speller System”
  5. The language model (the second factor) is a backoff bigram model trained on the tokenized form of one year of query logs, using maximum likelihood estimation with absolute discounting smoothing.
    Page 4, “The Baseline Speller System”
  6. Since we define the logarithm of the probabilities of the language model and the error model (i.e., the edit distance function) as features, the ranker can be viewed as a more general framework, subsuming the source channel model as a special case.
    Page 5, “The Baseline Speller System”
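
Item 5 mentions a backoff bigram model with absolute discounting. Below is a minimal sketch of the smoothing idea only, using an interpolated rather than strict backoff formulation and an add-one unigram distribution as the lower-order model; the discount value, the interpolation, and the sentence-boundary handling are assumptions for illustration, not the paper's configuration.

    from collections import defaultdict

    class AbsoluteDiscountBigram:
        """Bigram language model with absolute discounting (illustrative sketch)."""

        def __init__(self, sentences, discount=0.7):
            self.D = discount
            self.bi = defaultdict(float)        # N(w1, w2)
            self.uni = defaultdict(float)       # N(w1) as a bigram history
            self.followers = defaultdict(set)   # distinct w2 seen after w1
            self.total = 0.0
            for sent in sentences:
                words = ["<s>"] + sent + ["</s>"]
                for w1, w2 in zip(words, words[1:]):
                    self.bi[(w1, w2)] += 1.0
                    self.uni[w1] += 1.0
                    self.followers[w1].add(w2)
                    self.total += 1.0

        def prob(self, w2, w1):
            # Add-one unigram distribution used as the lower-order model.
            p_uni = (self.uni.get(w2, 0.0) + 1.0) / (self.total + len(self.uni))
            n1 = self.uni.get(w1, 0.0)
            if n1 == 0.0:
                return p_uni                    # unseen history: fall back entirely
            discounted = max(self.bi.get((w1, w2), 0.0) - self.D, 0.0) / n1
            lam = self.D * len(self.followers[w1]) / n1   # redistributed mass
            return discounted + lam * p_uni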

model training

Appears in 4 sentences as: model trained (1) model training (2) models trained (1)
In Learning Phrase-Based Spelling Error Models from Clickthrough Data
  1. Unfortunately, we found in our experiments that the pairs extracted using the method are too noisy for reliable error model training, even with a very tight threshold, and we did not see any significant improvement.
    Page 3, “Clickthrough Data and Spelling Correction”
  2. Although by doing so we could miss some basic, obvious spelling corrections, our experiments show that the negative impact on error model training is negligible.
    Page 4, “Clickthrough Data and Spelling Correction”
  3. We found that the error models trained using the data directly extracted from the query reformulation sessions suffer from the problem of underestimating the self-transformation probability of a query P(Q2=Q1|Q1), because we only included in the training data the pairs where the query is different from the correction.
    Page 4, “Clickthrough Data and Spelling Correction”
  4. The language model (the second factor) is a backoff bigram model trained on the tokenized form of one year of query logs, using maximum likelihood estimation with absolute discounting smoothing.
    Page 4, “The Baseline Speller System”
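
Items 1-3 describe how training pairs are filtered and why self-transformations end up underestimated. The sketch below covers only the extraction step: consecutive queries in a reformulation session are kept as a (Q1, Q2) pair when their edit distance is positive but below a threshold (as described under the "edit distance" topic above), and, as one possible mitigation that is not necessarily the paper's, identity pairs (Q, Q) can be added so that the self-transformation probability is represented. It reuses the levenshtein function from the earlier sketch.

    def extract_training_pairs(sessions, max_edit_distance=3, add_identity=True):
        """Extract (query, correction) candidates from query reformulation sessions."""
        pairs = []
        for session in sessions:               # session = list of successive queries
            for q1, q2 in zip(session, session[1:]):
                d = levenshtein(q1, q2)        # defined in the edit distance sketch
                if 0 < d <= max_edit_distance:
                    pairs.append((q1, q2))
            if add_identity:                   # assumption: counter the P(Q|Q) underestimate
                for q in session:
                    pairs.append((q, q))
        return pairs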

baseline systems

Appears in 3 sentences as: baseline system (1) baseline systems (2)
In Learning Phrase-Based Spelling Error Models from Clickthrough Data
  1. Results show that the system using the phrase-based error model significantly outperforms its baseline systems.
    Page 1, “Abstract”
  2. In particular, the speller system incorporating a phrase-based error model significantly outperforms its baseline systems.
    Page 2, “Introduction”
  3. One possible reason is that our baseline system , which does not use any error model learned from the clickthrough data, is already able to correct these basic, obvious spelling mistakes.
    Page 4, “Clickthrough Data and Spelling Correction”

significant improvement

Appears in 3 sentences as: significant improvement (2) significant improvements (1)
In Learning Phrase-Based Spelling Error Models from Clickthrough Data
  1. Results show that the error models learned from clickthrough data lead to significant improvements on the task of query spelling correction.
    Page 2, “Introduction”
  2. Unfortunately, we found in our experiments that the pairs extracted using the method are too noisy for reliable error model training, even with a very tight threshold, and we did not see any significant improvement.
    Page 3, “Clickthrough Data and Spelling Correction”
  3. a new phrase-based error model, which leads to significant improvement in our spelling correction experiments.
    Page 9, “Experiments”

Viterbi

Appears in 3 sentences as: Viterbi (3)
In Learning Phrase-Based Spelling Error Models from Clickthrough Data
  1. The first pass uses the Viterbi algorithm to find the best C according to the model of Equations (2) and (3).
    Page 4, “The Baseline Speller System”
  2. In the second pass, the A-Star algorithm is used to find the 20-best corrections, using the Viterbi scores computed at each state in the first pass as heuristics.
    Page 4, “The Baseline Speller System”
  3. Figure 3: The dynamic programming algorithm for Viterbi bi-phrase segmentation.
    Page 6, “A Phrase-Based Error Model”
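
The first of the two decoding passes described above is easiest to picture as a Viterbi search over a lattice in which every query term contributes a set of candidate corrections, scored by an error-model log probability plus a bigram language-model log probability. The candidate lists and the two scoring functions below are placeholders, and the second (A-Star) pass that extracts the 20-best corrections from the stored Viterbi scores is omitted from this sketch.

    def viterbi_correct(query_words, candidates, err_logprob, bigram_logprob):
        """First-pass Viterbi decoding over per-term correction candidates.

        candidates[i]             : list of candidate corrections for query_words[i]
        err_logprob(q, c)         : log P(q | c), the error model
        bigram_logprob(c_prev, c) : log P(c | c_prev), the language model
        """
        best = {"<s>": (0.0, [])}              # best[c] = (score, path ending in c)
        for i, q in enumerate(query_words):
            new_best = {}
            for c in candidates[i]:
                for prev, (score, path) in best.items():
                    s = score + err_logprob(q, c) + bigram_logprob(prev, c)
                    if c not in new_best or s > new_best[c][0]:
                        new_best[c] = (s, path + [c])
            best = new_best
        score, path = max(best.values())
        return path, score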

word-level

Appears in 3 sentences as: word-level (3)
In Learning Phrase-Based Spelling Error Models from Clickthrough Data
  1. (2006) extend the error model by capturing word-level similarities learned from query logs.
    Page 2, “Related Work”
  2. Furthermore, the word-level alignments between Q and C can most often be identified with little ambiguity.
    Page 5, “A Phrase-Based Error Model”
  3. Thus we restrict our attention to those phrase transformations consistent with a good word-level alignment.
    Page 5, “A Phrase-Based Error Model”
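
Item 3 above ("restrict our attention to those phrase transformations consistent with a good word-level alignment") combines with items 8 and 9 under the "word alignment" topic to give a concrete consistency test for bi-phrases. A minimal sketch, with the alignment given as a set of (i, j) index pairs into Q and C and spans given as half-open index ranges (a representation assumed here), might look like this:

    def consistent_biphrase(alignment, q_span, c_span):
        """Check the two bi-phrase consistency conditions against a word alignment.

        (1) at least one alignment link lies inside both spans, and
        (2) no link connects a word inside one span to a word outside the other.
        """
        q_start, q_end = q_span
        c_start, c_end = c_span
        has_inside_link = False
        for i, j in alignment:
            q_in = q_start <= i < q_end
            c_in = c_start <= j < c_end
            if q_in and c_in:
                has_inside_link = True
            elif q_in != c_in:                 # link crosses the bi-phrase boundary
                return False
        return has_inside_link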
