Index of papers in Proc. ACL 2008 that mention
  • data sparsity
Li, Jianguo and Brew, Chris
Conclusion and Future Work
Our experiments on verb classification offer a class-based approach to alleviating the data sparsity problem in parsing.
Integration of Syntactic and Lexical Information
Dependency relation (DR): Our way to overcome data sparsity is to break lexicalized frames into lexicalized slots (a.k.a.
Integration of Syntactic and Lexical Information
to data sparsity.
Introduction
When the information about a verb type is not available or sufficient for us to draw firm conclusions about its usage, the information about the class to which the verb type belongs can compensate for it, addressing the pervasive problem of data sparsity in a wide range of NLP tasks, such as automatic extraction of subcategorization frames (Korhonen, 2002), semantic role labeling (Swier and Stevenson, 2004; Gildea and Jurafsky, 2002), natural language generation for machine translation (Habash et al., 2003), and deriving predominant verb senses from unlabeled data (Lapata and Brew, 2004).
Related Work
Trying to overcome the problem of data sparsity, Schulte im Walde (2000) explores the additional use of selectional preference features by augmenting each syntactic slot with the concept to which its head noun belongs in an ontology (e.g.
Related Work
Although the problem of data sparsity is alleviated to a certain extent (3), these features do not generally improve classification performance (Schulte im Walde, 2000; Joanis, 2002).
data sparsity is mentioned in 6 sentences in this paper.
Zhang, Dongdong and Li, Mu and Duan, Nan and Li, Chi-Ho and Zhou, Ming
Abstract
Conventional statistical machine translation (SMT) systems do not perform well on measure word generation due to data sparseness and the potential long distance dependency between measure words and their corresponding head words.
Experiments
Compared with the baseline, the Mo-ME method takes advantage of a large monolingual training corpus and reduces the data sparseness problem.
Experiments
One problem is data sparseness with respect to collocations be-
Introduction
However, as we will show below, existing SMT systems do not deal well with the measure word generation in general due to data sparseness and long distance dependencies between measure words and their corresponding head words.
data sparsity is mentioned in 4 sentences in this paper.
Zhao, Shiqi and Wang, Haifeng and Liu, Ting and Li, Sheng
Experiments
To alleviate the data sparseness problem, we only kept patterns appearing more than 10 times in the corpus for extracting paraphrase patterns.
Experiments
In other words, it seriously suffers from data sparseness.
Experiments
which is mainly because the data sparseness problem is more serious when extracting long patterns.
Proposed Method
However, we find that using only the MLE-based probabilities can suffer from data sparseness.
data sparsity is mentioned in 4 sentences in this paper.
Uszkoreit, Jakob and Brants, Thorsten
Abstract
In statistical language modeling, one technique to reduce the problematic effects of data sparsity is to partition the vocabulary into equivalence classes.
Conclusion
We conclude that, despite the large amounts of data used to train the large word-based model in our second experiment, class-based language models remain an effective tool for easing the effects of data sparsity.
Introduction
Class-based n-gram models are intended to help overcome this data sparsity problem by grouping words into equivalence classes rather than treating them as distinct words, thus reducing the number of parameters of the model (Brown et al., 1990).
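The class-based factorization these excerpts describe can be sketched in a few lines. This is a minimal toy illustration, not the authors' implementation: the word-to-class mapping and corpus below are invented, and the model is the standard Brown-style bigram factorization P(w2|w1) ≈ P(c(w2)|c(w1)) · P(w2|c(w2)).

```python
from collections import defaultdict

# Hypothetical toy class map and corpus (invented for illustration).
word_class = {
    "monday": "DAY", "tuesday": "DAY",
    "january": "MONTH", "february": "MONTH",
    "runs": "VERB", "walks": "VERB",
}
corpus = ["monday runs", "tuesday walks", "january runs"]

class_bigram = defaultdict(int)   # counts of (class, next class)
class_count = defaultdict(int)    # unigram class counts
emission = defaultdict(int)       # counts of (word, its class)

for sent in corpus:
    words = sent.split()
    classes = [word_class[w] for w in words]
    for w, c in zip(words, classes):
        emission[(w, c)] += 1
        class_count[c] += 1
    for c1, c2 in zip(classes, classes[1:]):
        class_bigram[(c1, c2)] += 1

def prob(w2, w1):
    """P(w2 | w1) ~= P(c2 | c1) * P(w2 | c2): the class-based bigram."""
    c1, c2 = word_class[w1], word_class[w2]
    return (class_bigram[(c1, c2)] / class_count[c1]) * \
           (emission[(w2, c2)] / class_count[c2])

# The word pair "february walks" is unseen, but its class pair
# (MONTH, VERB) is observed, so the model assigns nonzero probability.
print(prob("walks", "february"))
```

The point of the sketch is the one that motivates the excerpt above: the model estimates counts over classes instead of individual words, so an unseen word bigram can still receive probability mass as long as its class bigram was observed, with far fewer parameters to estimate.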
data sparsity is mentioned in 3 sentences in this paper.