Index of papers in Proc. ACL 2009 that mention
  • data sparsity
Huang, Fei and Yates, Alexander
Abstract
Supervised sequence-labeling systems in natural language processing often suffer from data sparsity because they use word types as features in their prediction tasks.
Introduction
Data sparsity and high dimensionality are the twin curses of statistical natural language processing (NLP).
Introduction
The negative effects of data sparsity have been well-documented in the NLP literature.
Introduction
Our technique is particularly well-suited to handling data sparsity because it is possible to improve performance on rare words by supplementing the training data with additional unannotated text containing more examples of the rare words.
Related Work
Sophisticated smoothing techniques like modified Kneser-Ney and Katz smoothing (Chen and Goodman, 1996) smooth together the predictions of unigram, bigram, trigram, and potentially higher n-gram sequences to obtain accurate probability estimates in the face of data sparsity (a minimal interpolation sketch follows this entry).
Smoothing Natural Language Sequences
For supervised sequence-labeling problems in NLP, the most important “complicating factor” that we seek to avoid through smoothing is the data sparsity associated with word-based representations.
Smoothing Natural Language Sequences
Importantly, we seek distributional representations that will provide features that are common in both training and test data, to avoid data sparsity (a toy illustration of this idea also follows this entry).
Data sparsity is mentioned in 8 sentences in this paper.
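The Related Work excerpt above describes smoothing as mixing unigram, bigram, and trigram predictions. The sketch below shows plain linear interpolation of maximum-likelihood n-gram estimates; it illustrates that general idea only and is not modified Kneser-Ney or Katz smoothing, and the corpus, interpolation weights, and function names are invented for this example.

```python
from collections import Counter

def train_ngram_counts(tokens):
    """Collect unigram, bigram, and trigram counts from a token list."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    trigrams = Counter(zip(tokens, tokens[1:], tokens[2:]))
    return unigrams, bigrams, trigrams

def interpolated_prob(w3, w1, w2, unigrams, bigrams, trigrams,
                      lambdas=(0.5, 0.3, 0.2)):
    """P(w3 | w1, w2) as a weighted mix of trigram, bigram, and unigram MLEs.

    Lower-order estimates keep the probability non-zero when the
    trigram (w1, w2, w3) never occurred in training (data sparsity).
    Weights are arbitrary here; in practice they are tuned on held-out data.
    """
    l3, l2, l1 = lambdas  # trigram, bigram, unigram weights
    total = sum(unigrams.values())
    p_uni = unigrams[w3] / total if total else 0.0
    p_bi = bigrams[(w2, w3)] / unigrams[w2] if unigrams[w2] else 0.0
    p_tri = (trigrams[(w1, w2, w3)] / bigrams[(w1, w2)]
             if bigrams[(w1, w2)] else 0.0)
    return l3 * p_tri + l2 * p_bi + l1 * p_uni

corpus = "the cat sat on the mat the dog sat on the rug".split()
uni, bi, tri = train_ngram_counts(corpus)
# Even if a particular trigram is rare or unseen, the bigram and unigram
# terms still contribute probability mass.
print(interpolated_prob("sat", "the", "dog", uni, bi, tri))
```

The point of the interpolation is exactly the one the excerpt makes: an unseen higher-order sequence does not force the estimate to zero, because lower-order statistics back it off.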
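The Introduction and Smoothing excerpts argue for distributional features that rare and frequent words can share across training and test data. The excerpts do not specify how Huang and Yates construct their representations, so the toy below stands in with a deliberately crude signature (a word's most frequent right neighbor in unlabeled text); the corpus and names are invented, and the only point is that such a feature recurs even when the word type itself is rare.

```python
from collections import Counter, defaultdict

def distributional_tag(unlabeled_tokens):
    """Map each word type to a coarse distributional signature.

    Here the signature is simply the word's most frequent right
    neighbor in unlabeled text; words with similar usage share a
    signature even if the word types themselves are rare.
    (Toy choice for illustration only.)
    """
    right = defaultdict(Counter)
    for w, nxt in zip(unlabeled_tokens, unlabeled_tokens[1:]):
        right[w][nxt] += 1
    return {w: c.most_common(1)[0][0] for w, c in right.items()}

unlabeled = ("the cat sat on the mat . "
             "the ocelot sat on the mat .").split()
tags = distributional_tag(unlabeled)

# "ocelot" may never appear in annotated training data, but its
# distributional tag is shared with the frequent word "cat", so a
# tagger conditioned on the tag still has evidence for it.
print(tags.get("cat"), tags.get("ocelot"))
```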
Sun, Xu and Okazaki, Naoaki and Tsujii, Jun'ichi
Results and Discussion
The curves suggest that the data sparseness problem could be the reason for the differences in performance.
Results and Discussion
The curve for the latent variable approach demonstrates that it did not cause a severe data sparseness problem.
Results and Discussion
In addition, the training data for the English task is much smaller than that for the Chinese task, which could make the models more sensitive to data sparseness.
Data sparsity is mentioned in 4 sentences in this paper.
Boxwell, Stephen and Mehay, Dennis and Brew, Chris
Abstract
Most previously developed systems are CFG-based and make extensive use of a treepath feature, which suffers from data sparsity due to its use of explicit tree configurations.
Abstract
CCG affords ways to augment treepath-based features to overcome these data sparsity issues.
Potential Advantages to using CCG
Because there are a number of different treepaths that correspond to a single relation (Figure 2), this approach can suffer from data sparsity (a toy illustration follows this entry).
Data sparsity is mentioned in 3 sentences in this paper.
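The excerpts attribute the sparsity of treepath features to the fact that many distinct tree configurations realize the same predicate-argument relation. The toy counts below are invented to illustrate that fragmentation; the path strings are hypothetical and are not taken from the paper or from any treebank.

```python
from collections import Counter

# Hypothetical (invented) treepath feature strings observed for the same
# underlying relation, e.g. "this NP is the ARG0 of the verb".  Each
# explicit tree configuration yields a different feature value.
treepaths = [
    "NP^S_VP_VBD",          # simple declarative clause
    "NP^S_VP_VP_VBN",       # passive with auxiliary
    "NP^NP_SBAR_S_VP_VBD",  # relative clause
    "NP^S_VP_MD_VP_VB",     # modal construction
    "NP^S_VP_VBD",
]

counts = Counter(treepaths)
print(counts)
# The relation occurs 5 times, but most individual treepath features
# occur only once, so a model that conditions on the exact path sees
# very sparse statistics for what is really one frequent relation.
print("relation instances:", len(treepaths),
      "| distinct treepath features:", len(counts))
```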