Improving Text Simplification Language Modeling Using Unsimplified Text Data
Kauchak, David

Article Structure

Abstract

In this paper we examine language modeling for text simplification.

Introduction

An important component of many text-to-text translation systems is the language model which predicts the likelihood of a text sequence being produced in the output language.

Related Work

If we view the normal data as out-of-domain data, then the problem of combining simple and normal data is similar to the language model domain adaption problem (Suzuki and Gao, 2005), in particular cross-domain adaptation (Bellegarda, 2004) where a domain-specific model is improved by incorporating additional general data.

Corpus

We collected a data set from English Wikipedia and Simple English Wikipedia with the former representing normal English and the latter simple English.

Language Model Evaluation: Perplexity

To analyze the impact of data source on simple English language modeling, we trained language models on varying amounts of simple data, normal data, and a combination of the two.
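
As a rough, runnable illustration of this setup (the paper's actual toolkit, smoothing method, and data sizes are not reproduced here), the sketch below trains add-one-smoothed trigram models on a simple-only corpus and on a concatenated simple+normal corpus, then compares their perplexity on held-out simple text; all corpora and helper names are toy placeholders.

```python
# Toy sketch: compare a simple-only trigram model against one trained on
# simple+normal data, using add-one smoothing and held-out perplexity.
import math
from collections import Counter

def ngrams(tokens, n):
    padded = ["<s>"] * (n - 1) + tokens + ["</s>"]
    return [tuple(padded[i:i + n]) for i in range(len(padded) - n + 1)]

def train_trigram_lm(sentences):
    """Collect trigram counts, trigram-context counts, and a vocabulary."""
    tri, ctx, vocab = Counter(), Counter(), set()
    for sent in sentences:
        vocab.update(sent)
        for g in ngrams(sent, 3):
            tri[g] += 1
            ctx[g[:2]] += 1
    return tri, ctx, vocab | {"</s>", "<unk>"}

def perplexity(model, sentences):
    """Add-one-smoothed perplexity of the model on held-out sentences."""
    tri, ctx, vocab = model
    log_prob, n_events = 0.0, 0
    for sent in sentences:
        sent = [w if w in vocab else "<unk>" for w in sent]
        for g in ngrams(sent, 3):
            log_prob += math.log2((tri[g] + 1) / (ctx[g[:2]] + len(vocab)))
            n_events += 1
    return 2 ** (-log_prob / n_events)

# Tiny stand-ins for tokenized Simple English / English Wikipedia sentences.
simple_train = [["the", "cat", "sat"], ["the", "dog", "ran"]]
normal_train = [["the", "feline", "sat", "quietly"], ["the", "dog", "ran", "away"]]
simple_test = [["the", "cat", "ran"]]

print("simple only  :", perplexity(train_trigram_lm(simple_train), simple_test))
print("simple+normal:", perplexity(train_trigram_lm(simple_train + normal_train), simple_test))
```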

Language Model Evaluation: Lexical Simplification

Currently, no automated methods exist for evaluating sentence-level or document-level text simplification systems and manual evaluation is time-consuming, expensive and has not been validated.
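
One common way to use a language model for lexical simplification is to rank candidate substitutions by how probable each makes the surrounding context; the sketch below illustrates that idea with toy bigram counts. It is an assumed stand-in, not the paper's SemEval-2012 procedure, and every name and count in it is hypothetical.

```python
# Toy sketch: rank lexical-substitution candidates by how probable the language
# model finds the sentence with each candidate substituted into the context.
import math
from collections import Counter

# Stand-in bigram statistics; a real system would use the trained simple model.
BIGRAMS = Counter({("a", "big"): 8, ("a", "large"): 3, ("a", "sizeable"): 1,
                   ("big", "dog"): 6, ("large", "dog"): 2, ("sizeable", "dog"): 1})
CONTEXTS = Counter({"a": 12, "big": 6, "large": 2, "sizeable": 1})
VOCAB_SIZE = 50  # assumed vocabulary size for add-one smoothing

def sentence_logprob(tokens):
    """Add-one-smoothed bigram log-probability of a token sequence."""
    return sum(math.log2((BIGRAMS[(a, b)] + 1) / (CONTEXTS[a] + VOCAB_SIZE))
               for a, b in zip(tokens, tokens[1:]))

def rank_candidates(left_context, candidates, right_context):
    """Sort candidates from most to least probable in the given context."""
    scored = [(sentence_logprob(left_context + [c] + right_context), c)
              for c in candidates]
    return [c for _, c in sorted(scored, reverse=True)]

print(rank_candidates(["a"], ["sizeable", "large", "big"], ["dog"]))
# -> ['big', 'large', 'sizeable'] with these toy counts
```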

Why Does Unsimplified Data Help?

For both the perplexity experiments and the lexical simplification experiments, utilizing additional normal data resulted in large performance improvements; using all of the simple data available, performance is still significantly improved when combined with normal data.

Conclusions and Future Work

In the experiments above we have shown that on two different tasks utilizing additional normal data improves the performance of simple English language models.

Topics

language models

Appears in 53 sentences as: Language Model (2) Language model (3) language model (19) language modeling (10) language models (25)
In Improving Text Simplification Language Modeling Using Unsimplified Text Data
  1. In this paper we examine language modeling for text simplification.
    Page 1, “Abstract”
  2. Unlike some text-to-text translation tasks, text simplification is a monolingual translation task allowing for text in both the input and output domain to be used for training the language model.
    Page 1, “Abstract”
  3. We explore the relationship between normal English and simplified English and compare language models trained on varying amounts of text from each.
    Page 1, “Abstract”
  4. An important component of many text-to-text translation systems is the language model which predicts the likelihood of a text sequence being produced in the output language.
    Page 1, “Introduction”
  5. In some problem domains, such as machine translation, the translation is between two distinct languages and the language model can only be trained on data in the output language.
    Page 1, “Introduction”
  6. In these monolingual problems, text could be used from both the input and output domain to train a language model.
    Page 1, “Introduction”
  7. simplification where both simplified English text and normal English text are available for training a simple English language model.
    Page 1, “Introduction”
  8. Previous research has also shown other differences between simple and normal data sources that could impact language model performance including average number of syllables, reading
    Page 1, “Introduction”
  9. In addition, for some monolingual translation domains, it has been argued that it is not appropriate to train a language model using data from the input domain (Turner and Charniak, 2005).
    Page 2, “Introduction”
  10. Finally, many recent text simplification systems have utilized language models trained only on simplified data (Zhu et al., 2010; Woodsend and Lapata, 2011; Coster and Kauchak, 2011a; Wubben et al., 2012); improvements in simple language modeling could translate into improvements for these systems.
    Page 2, “Introduction”
  11. If we view the normal data as out-of-domain data, then the problem of combining simple and normal data is similar to the language model domain adaption problem (Suzuki and Gao, 2005), in particular cross-domain adaptation (Bellegarda, 2004) where a domain-specific model is improved by incorporating additional general data.
    Page 2, “Related Work”

models trained

Appears in 22 sentences as: model trained (10) Models trained (1) models trained (16)
In Improving Text Simplification Language Modeling Using Unsimplified Text Data
  1. We explore the relationship between normal English and simplified English and compare language models trained on varying amounts of text from each.
    Page 1, “Abstract”
  2. We find that a combined model using both simplified and normal English data achieves a 23% improvement in perplexity and a 24% improvement on the lexical simplification task over a model trained only on simple data.
    Page 1, “Abstract”
  3. Finally, many recent text simplification systems have utilized language models trained only on simplified data (Zhu et al., 2010; Woodsend and Lapata, 2011; Coster and Kauchak, 2011a; Wubben et al., 2012); improvements in simple language modeling could translate into improvements for these systems.
    Page 2, “Introduction”
  4. Similarly, for summarization, systems have employed language models trained only on unsummarized text (Banko et al., 2000; Daume and Marcu, 2002).
    Page 2, “Related Work”
  5. Figure 1: Language model perplexities on the held-out test data for models trained on increasing amounts of data.
    Page 3, “Language Model Evaluation: Perplexity”
  6. As expected, when trained on the same amount of data, the language models trained on simple data perform significantly better than language models trained on normal data.
    Page 3, “Language Model Evaluation: Perplexity”
  7. The perplexity for the simple-ALL+normal model, which starts with all available simple data, continues to improve as normal data is added resulting in a 23% improvement over the model trained with only simple data (from a perplexity of 129 down to 100).
    Page 3, “Language Model Evaluation: Perplexity”
  8. Each line represents a model trained on a different amount of simple data as normal data is added.
    Page 4, “Language Model Evaluation: Perplexity”
  9. To better understand how the amount of simple and normal data impacts perplexity, Figure 2 shows perplexity scores for models trained on varying amounts of simple data as we add increasing amounts of normal data.
    Page 4, “Language Model Evaluation: Perplexity”
  10. Models trained on less simple data achieved larger performance increases than those models trained on more simple data.
    Page 4, “Language Model Evaluation: Perplexity”
  11. For example, the simple-only model trained on 250K sentences achieves a perplexity of approximately 150.
    Page 4, “Language Model Evaluation: Perplexity”

n-grams

Appears in 17 sentences as: n-grams (20)
In Improving Text Simplification Language Modeling Using Unsimplified Text Data
  1. Post-hoc analysis shows that the additional unsimplified data provides better coverage for unseen and rare n-grams.
    Page 1, “Abstract”
  2. At the word level 96% of the simple words are found in the normal corpus and even for n-grams as large as 5, more than half of the n-grams can be found in the normal text.
    Page 1, “Introduction”
  3. This extra information may help with data sparsity, providing better estimates for rare and unseen n-grams.
    Page 1, “Introduction”
  4. On the other hand, there is still only modest overlap between the sentences for longer n-grams, particularly given that the corpus is sentence-aligned and that 27% of the sentence pairs in this aligned data set are identical.
    Page 1, “Introduction”
  5. Instead, we see that the normal text has more varied language and contains more n-grams.
    Page 1, “Introduction”
  6. Table 1: The proportion of n-grams that overlap in a corpus of 137K sentence-aligned pairs from Simple English Wikipedia and English Wikipedia.
    Page 2, “Introduction”
  7. 6.1 More n-grams
    Page 6, “Why Does Unsimplified Data Help?”
  8. Table 3: Proportion of n-grams in the test sets that occur in the simple and normal training data sets.
    Page 7, “Why Does Unsimplified Data Help?”
  9. We hypothesize that the key benefit of additional normal data is access to more n-gram counts and therefore better probability estimation, particularly for n-grams in the simple corpus that are unseen or have low frequency.
    Page 7, “Why Does Unsimplified Data Help?”
  10. For n-grams that have never been seen before, the normal data provides some estimate from English text.
    Page 7, “Why Does Unsimplified Data Help?”
  11. For n-grams that have been seen but are rare, the additional normal data can help provide better probability estimates.
    Page 7, “Why Does Unsimplified Data Help?”
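
Excerpts 7–11 above describe measuring how many test-set n-grams also occur in the simple and normal training data. A minimal sketch of that coverage statistic follows; it assumes coverage is computed over distinct n-grams (the excerpts do not say whether types or tokens are counted), and the helper names are illustrative.

```python
# Toy sketch: proportion of distinct test-set n-grams that also occur in a
# training corpus, in the spirit of the coverage numbers reported in Table 3.
def ngram_set(sentences, n):
    """All distinct n-grams occurring in a tokenized corpus."""
    grams = set()
    for sent in sentences:
        grams.update(tuple(sent[i:i + n]) for i in range(len(sent) - n + 1))
    return grams

def coverage(train_sents, test_sents, n):
    """Fraction of distinct test n-grams that were seen in the training data."""
    test_grams = ngram_set(test_sents, n)
    seen = test_grams & ngram_set(train_sents, n)
    return len(seen) / max(1, len(test_grams))

train = [["the", "cat", "sat", "on", "the", "mat"]]
test = [["the", "cat", "sat", "quietly"]]
for n in (1, 2, 3):
    print(f"{n}-gram coverage: {coverage(train, test, n):.2f}")
```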

domain adaptation

Appears in 9 sentences as: domain adaptation (7) domain adaptations (1) domain adaption (1)
In Improving Text Simplification Language Modeling Using Unsimplified Text Data
  1. If we view the normal data as out-of-domain data, then the problem of combining simple and normal data is similar to the language model domain adaption problem (Suzuki and Gao, 2005), in particular cross-domain adaptation (Bellegarda, 2004) where a domain-specific model is improved by incorporating additional general data.
    Page 2, “Related Work”
  2. Adaptation techniques have been shown to improve language modeling performance based on perplexity (Rosenfeld, 1996) and in application areas such as speech transcription (Bacchiani and Roark, 2003) and machine translation (Zhao et al., 2004), though no previous research has examined the language model domain adaptation problem for text simplification.
    Page 2, “Related Work”
  3. Pan and Yang (2010) provide a survey on the related problem of domain adaptation for machine learning (also referred to as “transfer learning”), which utilizes similar techniques.
    Page 2, “Related Work”
  4. In this paper, we explore some basic adaptation techniques, however this paper is not a comparison of domain adaptation techniques for language modeling.
    Page 2, “Related Work”
  5. Previous domain adaptation research is complementary to our experiments and could be explored in the future for additional performance improvements.
    Page 2, “Related Work”
  6. This approach has the benefit of simplicity, however, better performance for combining related corpora has been seen by domain adaptation techniques which combine the data in more structured ways (Bacchiani and Roark, 2003).
    Page 4, “Language Model Evaluation: Perplexity”
  7. Our goal for this paper is not to explore domain adaptation techniques, but to determine if normal data is useful for the simple language modeling task.
    Page 4, “Language Model Evaluation: Perplexity”
  8. if domain adaptation techniques may be useful, we also investigated a linearly interpolated language model.
    Page 4, “Language Model Evaluation: Perplexity”
  9. Many other domain adaptations techniques exist and may produce language models with better performance.
    Page 8, “Conclusions and Future Work”
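
Excerpt 8 above refers to a linearly interpolated language model, i.e. p(w | h) = lambda * p_simple(w | h) + (1 - lambda) * p_normal(w | h). The sketch below shows that mixture with toy component models; only the interpolation form and the reported lambda of 0.98 come from the paper, everything else is a placeholder.

```python
# Toy sketch of the linear interpolation p(w|h) = lam*p_simple + (1-lam)*p_normal.
def interpolate(p_simple, p_normal, lam):
    """Mix a simple-data model and a normal-data model with weight lam."""
    def p(word, history):
        return lam * p_simple(word, history) + (1 - lam) * p_normal(word, history)
    return p

# Placeholder component models (not the paper's trained language models).
p_simple = lambda w, h: 0.30 if w == "big" else 0.01
p_normal = lambda w, h: 0.10 if w == "big" else 0.02

p_mix = interpolate(p_simple, p_normal, lam=0.98)  # lambda reported for the task
print(p_mix("big", ("a",)))  # 0.98*0.30 + 0.02*0.10 ≈ 0.296
```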

n-gram

Appears in 8 sentences as: n-gram (9)
In Improving Text Simplification Language Modeling Using Unsimplified Text Data
  1. Table 1 shows the n-gram overlap proportions in a sentence aligned data set of 137K sentence pairs from aligning Simple English Wikipedia and English Wikipedia articles (Coster and Kauchak, 2011a). The data highlights two conflicting views: does the benefit of additional data outweigh the problem of the source of the data?
    Page 1, “Introduction”
  2. n-gram size:      1     2     3     4     5
     simple in normal  0.96  0.80  0.68  0.61  0.55
     normal in simple  0.87  0.68  0.58  0.51  0.46
    Page 2, “Introduction”
  3. trained using a smoothed version of the maximum likelihood estimate for an n-gram.
    Page 7, “Why Does Unsimplified Data Help?”
  4. p(a | bc) = count(bca) / count(bc), where count(·) is the number of times the n-gram occurs in the training corpus.
    Page 7, “Why Does Unsimplified Data Help?”
  5. For interpolated and backoff n-gram models, these counts are smoothed based on the probabilities of lower order n-gram models, which are in turn calculated based on counts from the corpus.
    Page 7, “Why Does Unsimplified Data Help?”
  6. We hypothesize that the key benefit of additional normal data is access to more n-gram counts and therefore better probability estimation, particularly for n-grams in the simple corpus that are unseen or have low frequency.
    Page 7, “Why Does Unsimplified Data Help?”
  7. To partially validate this hypothesis, we examined the n-gram overlap between the n-grams in the training data and the n-grams in the test sets from the two tasks.
    Page 7, “Why Does Unsimplified Data Help?”
  8. For all n-gram sizes the normal data contained more test set n-grams than the simple data.
    Page 7, “Why Does Unsimplified Data Help?”
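
Excerpts 3–5 above concern the maximum likelihood estimate for an n-gram. For a trigram the unsmoothed estimate is p(a | bc) = count(bca) / count(bc); the snippet below computes it from a toy corpus, leaving out the smoothing and backoff that the paper's models actually use.

```python
# Toy sketch: unsmoothed maximum likelihood estimate p(a | bc) = count(bca)/count(bc).
from collections import Counter

def mle_trigram(sentences):
    tri, ctx = Counter(), Counter()
    for sent in sentences:
        for i in range(len(sent) - 2):
            b, c, a = sent[i], sent[i + 1], sent[i + 2]
            tri[(b, c, a)] += 1   # count(bca)
            ctx[(b, c)] += 1      # count(bc)
    def p(a, b, c):
        """Probability of word a following the context 'b c'."""
        return tri[(b, c, a)] / ctx[(b, c)] if ctx[(b, c)] else 0.0
    return p

p = mle_trigram([["the", "cat", "sat"], ["the", "cat", "ran"], ["the", "cat", "sat"]])
print(p("sat", "the", "cat"))  # count(the cat sat)=2 / count(the cat)=3 ≈ 0.67
```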

machine translation

Appears in 4 sentences as: machine translation (4)
In Improving Text Simplification Language Modeling Using Unsimplified Text Data
  1. In some problem domains, such as machine translation, the translation is between two distinct languages and the language model can only be trained on data in the output language.
    Page 1, “Introduction”
  2. Adaptation techniques have been shown to improve language modeling performance based on perplexity (Rosenfeld, 1996) and in application areas such as speech transcription (Bacchiani and Roark, 2003) and machine translation (Zhao et al., 2004), though no previous research has examined the language model domain adaptation problem for text simplification.
    Page 2, “Related Work”
  3. Many recent statistical simplification techniques build upon models from machine translation and utilize a simple language model during simplification/decoding both in English (Zhu et al., 2010; Woodsend and Lapata, 2011; Coster and Kauchak, 2011a; Wubben et al., 2012) and in other languages (Specia, 2010).
    Page 2, “Related Work”
  4. In machine translation, improved language models have resulted in significant improvements in translation performance (Brants et al., 2007).
    Page 8, “Conclusions and Future Work”

unigrams

Appears in 4 sentences as: unigram (1) unigrams (4)
In Improving Text Simplification Language Modeling Using Unsimplified Text Data
  1. This is particularly important for unigrams (i.e.
    Page 7, “Why Does Unsimplified Data Help?”
  2. Table 3 shows the percentage of unigrams, bigrams and trigrams from the two test sets that are found in the simple and normal training data.
    Page 7, “Why Does Unsimplified Data Help?”
  3. Even at the unigram level, the normal data contained significantly more of the test set unigrams than the simple data.
    Page 7, “Why Does Unsimplified Data Help?”
  4. For example, adding the simple data to the normal data only increases the number of seen unigrams by 0.2%, representing only about 600 new words.
    Page 7, “Why Does Unsimplified Data Help?”

development set

Appears in 3 sentences as: development set (3)
In Improving Text Simplification Language Modeling Using Unsimplified Text Data
  1. The data set contains a development set of 300 examples and a test set of 1710 examples. For our experiments, we evaluated the models on the test set.
    Page 5, “Language Model Evaluation: Lexical Simplification”
  2. The best lambda was chosen based on a linear search optimized on the SemEval 2012 development set.
    Page 6, “Language Model Evaluation: Lexical Simplification”
  3. For the simplification task, the optimal lambda value determined on the development set was 0.98, with a very strong bias towards the simple model.
    Page 8, “Why Does Unsimplified Data Help?”
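
Excerpt 2 above describes choosing lambda by a linear search on the development set. The sketch below shows such a search over an evenly spaced grid with a stand-in scoring function; the paper's actual objective and grid resolution are not specified here.

```python
# Toy sketch: linear search over evenly spaced lambda values in [0, 1],
# keeping the value that scores best on the development set.
def linear_search(dev_score, steps=100):
    """Return the lambda with the highest development-set score."""
    candidates = [i / steps for i in range(steps + 1)]
    return max(candidates, key=dev_score)

# Stand-in objective that happens to peak near 0.98 (negative perplexity or
# task accuracy on the development set would be used in practice).
best = linear_search(lambda lam: -(lam - 0.98) ** 2)
print(best)  # 0.98
```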

sentence pairs

Appears in 3 sentences as: sentence pairs (3)
In Improving Text Simplification Language Modeling Using Unsimplified Text Data
  1. Table 1 shows the n-gram overlap proportions in a sentence aligned data set of 137K sentence pairs from aligning Simple English Wikipedia and English Wikipedia articles (Coster and Kauchak, 2011a).1 The data highlights two conflicting views: does the benefit of additional data outweigh the problem of the source of the data?
    Page 1, “Introduction”
  2. On the other hand, there is still only modest overlap between the sentences for longer n-grams, particularly given that the corpus is sentence-aligned and that 27% of the sentence pairs in this aligned data set are identical.
    Page 1, “Introduction”
  3. The resulting data set contains 150K aligned simple-normal sentence pairs.
    Page 7, “Why Does Unsimplified Data Help?”

translation tasks

Appears in 3 sentences as: translation task (1) translation tasks (3)
In Improving Text Simplification Language Modeling Using Unsimplified Text Data
  1. Unlike some text-to-text translation tasks, text simplification is a monolingual translation task allowing for text in both the input and output domain to be used for training the language model.
    Page 1, “Abstract”
  2. text compression, text simplification and summarization) can be viewed as monolingual translation tasks, translating between text variations within a single language.
    Page 1, “Introduction”
  3. This is not the case for all monolingual translation tasks (Knight and Marcu, 2002; Cohn and Lapata, 2009).
    Page 2, “Introduction”
