Generalized Character-Level Spelling Error Correction
Farra, Noura and Tomeh, Nadi and Rozovskaya, Alla and Habash, Nizar

Article Structure

Abstract

We present a generalized discriminative model for spelling error correction which targets character-level transformations.

Introduction

Spelling error correction is a longstanding Natural Language Processing (NLP) problem, and it has recently become especially relevant because of the many potential applications to the large amount of informal and unedited text generated online, including web forums, tweets, blogs, and email.

Related Work

Most earlier work on automatic error correction addressed spelling errors in English and built models of correct usage on native English data (Kukich, 1992; Golding and Roth, 1999; Carlson and Fette, 2007; Banko and Brill, 2001).

The GSEC Approach

3.1 Modeling Spelling Correction at the Character Level

Experiments

4.1 Model Evaluation

Conclusions

We showed that a generalized character-level spelling error correction model can improve spelling error correction on Egyptian Arabic data.

Topics

word-level

Appears in 5 sentences as: word-level (5)
In Generalized Character-Level Spelling Error Correction
  1. While operating at the character level, the model makes use of word-level and contextual information.
    Page 1, “Abstract”
  2. Discriminative models have been proposed at the word-level for error correction (Duan et al., 2012) and for error detection (Habash and Roth, 2011).
    Page 2, “Related Work”
  3. We implemented another approach for error correction based on a word-level maximum likelihood model.
    Page 3, “The GSEC Approach”
  4. The word error rate (WER) metric is computed by summing the total number of word-level substitution errors, insertion errors, and deletion errors in the output, and dividing by the number of words in the reference.
    Page 4, “Experiments”
  5. In the future, we plan to extend the model to use word-level language models to select between top character predictions in the output.
    Page 5, “Conclusions”
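
The WER definition quoted in sentence 4 above can be illustrated with a short sketch. It assumes that substitutions, insertions, and deletions are counted under a minimum-edit-distance alignment between the reference and the output, which is the conventional way to compute WER; the excerpt itself does not state which alignment the authors use. The function name `word_error_rate` and the whitespace tokenization are illustrative choices, not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + insertions + deletions) / reference length.

    Assumption: errors are counted along a minimum-edit-distance
    alignment of whitespace-separated words.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the minimum number of edits needed to turn the
    # reference prefix processed so far into hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, with the reference "a b c d" and the output "a x c", the alignment has one substitution (b to x) and one deletion (d), so the sketch gives 2 / 4 = 0.5. Because the denominator is the reference length, WER can exceed 1 when the output contains many inserted words.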

See all papers in Proc. ACL 2014 that mention word-level.

statistically significant

Appears in 3 sentences as: statistically significant (3)
In Generalized Character-Level Spelling Error Correction
  1. Rows marked with an asterisk (*) are statistically significant compared to CEC (for the first half of the table) or CEC+MLE (for the second half of the table), with p < 0.05.
    Page 4, “Experiments”
  2. In fact, CEC+MLE and GSEC+MLE perform similarly (p = 0.36, not statistically significant).
    Page 4, “Experiments”
  3. The two results are statistically significant (p < 0.0001) with respect to CEC and CEC+MLE respectively.
    Page 4, “Experiments”