Index of papers in Proc. ACL that mention
  • data sparsity
Popat, Kashyap and A.R, Balamurali and Bhattacharyya, Pushpak and Haffari, Gholamreza
Abstract
A plausible reason for such a performance improvement is the reduction in data sparsity.
Abstract
In this paper, the problem of data sparsity in sentiment analysis, both monolingual and cross-lingual, is addressed through the means of clustering.
Abstract
Experiments show that cluster-based data sparsity reduction leads to better performance than sense-based classification for sentiment analysis at the document level.
Introduction
Data sparsity is the bane of Natural Language Processing (NLP) (Xue et al., 2005; Minkov et al., 2007).
Introduction
NLP applications innovatively handle data sparsity through various means.
Introduction
A special but very common kind of data sparsity, viz. word sparsity, can be addressed in one of two obvious ways: 1) sparsity reduction through paradigmatically related words, or 2) sparsity reduction through syntagmatically related words.
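To make the cluster-based route concrete: a minimal sketch, assuming a pre-computed word-to-cluster mapping (e.g. from Brown clustering or k-means over word vectors). The mapping and feature scheme below are illustrative assumptions, not taken from the paper.

```python
from collections import Counter

def cluster_features(tokens, word2cluster):
    """Map word features onto cluster-ID features.

    Rare words inherit the statistics of their cluster, which is one way
    to reduce data sparsity in a document-level sentiment classifier.
    """
    counts = Counter()
    for tok in tokens:
        # Fall back to the surface form for words outside the clustering.
        counts[word2cluster.get(tok, tok)] += 1
    return counts

# Hypothetical clustering: "superb" and "excellent" share one cluster ID,
# so a classifier trained on documents containing "excellent" still has
# evidence for a test document containing only "superb".
word2cluster = {"superb": "C17", "excellent": "C17", "awful": "C42"}
print(cluster_features(["the", "film", "was", "superb"], word2cluster))
```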
data sparsity is mentioned in 13 sentences in this paper.
Xu, Liheng and Liu, Kang and Lai, Siwei and Zhao, Jun
Abstract
However, syntax-based methods can only use discrete contextual information, which may suffer from data sparsity.
Abstract
Lexical semantic clue verifies whether a candidate term is related to the target product, and contextual semantic clue serves as a soft pattern miner to find candidates, which exploits semantics of each word in context so as to alleviate the data sparsity problem.
Experiments
As for SGW-TSVM, the features they used for the TSVM suffer from the data sparsity problem for infrequent terms.
Introduction
Therefore, such a representation often suffers from the data sparsity problem (Turian et al., 2010).
Introduction
This enables our method to be less sensitive to lexicon change, so that the data sparsity problem can be alleviated.
Related Work
As discussed in the first section, syntactic patterns often suffer from data sparsity.
Related Work
Thus, the data sparsity problem can be alleviated.
The Proposed Method
To alleviate the data sparsity problem, EB is first trained on a very large corpus (denoted by C), and then fine-tuned on the target review corpus R. In particular, for phrasal product features, a statistic-based method from (Zhu et al., 2009) is used to detect noun phrases in R. Then, an Unfolding Recursive Autoencoder (Socher et al., 2011) is trained on C to obtain embedding vectors for noun phrases.
data sparsity is mentioned in 8 sentences in this paper.
Huang, Fei and Yates, Alexander
Abstract
Supervised sequence-labeling systems in natural language processing often suffer from data sparsity because they use word types as features in their prediction tasks.
Introduction
Data sparsity and high dimensionality are the twin curses of statistical natural language processing (NLP).
Introduction
The negative effects of data sparsity have been well-documented in the NLP literature.
Introduction
Our technique is particularly well-suited to handling data sparsity because it is possible to improve performance on rare words by supplementing the training data with additional unannotated text containing more examples of the rare words.
Related Work
Sophisticated smoothing techniques like modified Kneser-Ney and Katz smoothing (Chen and Goodman, 1996) smooth together the predictions of unigram, bigram, trigram, and potentially higher n-gram sequences to obtain accurate probability estimates in the face of data sparsity.
Smoothing Natural Language Sequences
For supervised sequence-labeling problems in NLP, the most important “complicating factor” that we seek to avoid through smoothing is the data sparsity associated with word-based representations.
Smoothing Natural Language Sequences
Importantly, we seek distributional representations that will provide features that are common in both training and test data, to avoid data sparsity.
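The Related Work excerpt above frames smoothing as mixing estimates from several n-gram orders. A minimal sketch of plain linear (Jelinek-Mercer style) interpolation with hand-set weights; this is deliberately simpler than the modified Kneser-Ney and Katz schemes cited there.

```python
from collections import Counter

def interpolated_prob(word, context, unigrams, bigrams, vocab_size,
                      lambdas=(0.7, 0.2, 0.1)):
    """P(word | context) as a weighted mix of bigram, unigram and uniform estimates."""
    l_bi, l_uni, l_unif = lambdas
    bi = bigrams[(context, word)] / unigrams[context] if unigrams[context] else 0.0
    uni = unigrams[word] / sum(unigrams.values())
    return l_bi * bi + l_uni * uni + l_unif / vocab_size

tokens = "the cat sat on the mat".split()
unigrams = Counter(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))
# A bigram never seen in training still receives non-zero probability mass.
print(interpolated_prob("mat", "cat", unigrams, bigrams, vocab_size=len(unigrams)))
```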
data sparsity is mentioned in 8 sentences in this paper.
Tian, Zhenhua and Xiang, Hengheng and Liu, Ziqi and Zheng, Qinghua
Abstract
This paper presents an unsupervised random walk approach to alleviate data sparsity for selectional preferences.
Introduction
However, this strategy is infeasible for many plausible triples due to data sparsity .
Introduction
How, then, can a smoothing model be used to alleviate data sparsity for SP?
Introduction
Random walk models have been successfully applied to alleviate the data sparsity issue on collaborative filtering in recommender systems.
RSP: A Random Walk Model for SP
The damping factor d lies in (0, 1), and its value mainly depends on the data sparsity level.
RSP: A Random Walk Model for SP
Experiments show it is efficient and effective to address data sparsity for SP.
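The RSP excerpts above describe a random-walk smoother whose damping factor d ∈ (0, 1) is tuned to the sparsity level. A minimal sketch of the general idea, in the spirit of personalized PageRank over a predicate-argument graph; the toy graph, seed vector, and parameter values are illustrative assumptions, not the paper's RSP model.

```python
import numpy as np

def random_walk_smooth(A, seed, d=0.85, iters=50):
    """Smooth a preference distribution by a damped random walk.

    A    : row-stochastic transition matrix between nodes (e.g. argument
           nouns linked through shared predicates).
    seed : observed preference mass, normalized (non-zero only for seen nodes).
    d    : damping factor in (0, 1); larger values spread more mass away
           from the observed seed.
    """
    p = seed.copy()
    for _ in range(iters):
        p = (1 - d) * seed + d * (A.T @ p)
    return p

# Toy graph over three argument nouns; only node 0 was observed with the
# predicate, but the walk assigns plausibility to the unseen nodes 1 and 2.
A = np.array([[0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5],
              [1.0, 0.0, 0.0]])
seed = np.array([1.0, 0.0, 0.0])
print(random_walk_smooth(A, seed))
```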
data sparsity is mentioned in 7 sentences in this paper.
Kazama, Jun'ichi and De Saeger, Stijn and Kuroda, Kow and Murata, Masaki and Torisawa, Kentaro
Abstract
Existing word similarity measures are not robust to data sparseness since they rely only on the point estimation of words’ context profiles obtained from a limited amount of data.
Experiments
In this study, we combined two clustering results (denoted as "s1+s2" in the results), each of which ("s1" and "s2") has 2,000 hidden classes. We included this method since clustering can be regarded as another way of treating data sparseness.
Introduction
In the NLP field, data sparseness has been recognized as a serious problem and tackled in the context of language modeling and supervised machine learning.
Introduction
There has been no study that seriously dealt with data sparseness in the context of semantic similarity calculation.
Introduction
The data sparseness problem is usually solved by smoothing, regularization, margin maximization and so on (Chen and Goodman, 1998; Chen and Rosenfeld, 2000; Cortes and Vapnik, 1995).
data sparsity is mentioned in 6 sentences in this paper.
Zhou, Guangyou and Liu, Fang and Liu, Yang and He, Shizhu and Zhao, Jun
Experiments
The reason is that matrix factorization used in the paper can effectively solve the data sparseness and noise introduced by the machine translator simultaneously.
Experiments
Our proposed method (SMT + MF) can effectively solve the data sparseness and noise via matrix factorization.
Experiments
To further investigate the impact of the matrix factorization, one intuitive way is to expand the original questions with the translated words from the other four languages, without considering the data sparseness and noise introduced by the machine translator.
Our Approach
To tackle the data sparseness of question representation with the translated words, we hope to find two or more lower-dimensional matrices whose product provides a good approximation to the original one via matrix factorization.
Our Approach
If we set a small value for λp, the objective function behaves like the traditional NMF and the importance of data sparseness is emphasized, while a big value of λp indicates that Vp should be very close to V1, and equation (3) aims to remove the noise introduced by statistical machine translation.
Our Approach
The objective function O defined in equation (4) performs data sparseness reduction and noise removal simultaneously.
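Read together, these excerpts suggest an objective that trades a standard non-negative factorization term against a penalty keeping the translated representation Vp close to the original V1. A hedged sketch in generic notation, consistent with the excerpts but not a reproduction of the paper's equations (3)-(4):

```latex
\min_{W,\,H_p,\,V_p \ge 0}\;
  \lVert V_p - W H_p \rVert_F^2
  \;+\;
  \lambda_p \lVert V_p - V_1 \rVert_F^2
```

A small λp lets the factorization term dominate (the sparseness-reducing behaviour of plain NMF); a large λp pulls Vp toward V1, which the excerpt describes as removing the noise introduced by machine translation.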
data sparsity is mentioned in 6 sentences in this paper.
Li, Jianguo and Brew, Chris
Conclusion and Future Work
Our experiments on verb classification have offered a class-based approach to alleviate the data sparsity problem in parsing.
Integration of Syntactic and Lexical Information
Dependency relation (DR): Our way to overcome data sparsity is to break lexicalized frames into lexicalized slots (a.k.a. dependency relations).
Integration of Syntactic and Lexical Information
to data sparsity.
Introduction
When the information about a verb type is not available or sufficient for us to draw firm conclusions about its usage, the information about the class to which the verb type belongs can compensate for it, addressing the pervasive problem of data sparsity in a wide range of NLP tasks, such as automatic extraction of subcategorization frames (Korhonen, 2002), semantic role labeling (Swier and Stevenson, 2004; Gildea and Jurafsky, 2002), natural language generation for machine translation (Habash et al., 2003), and deriving predominant verb senses from unlabeled data (Lapata and Brew, 2004).
Related Work
Trying to overcome the problem of data sparsity , Schulte im Walde (2000) explores the additional use of selectional preference features by augmenting each syntactic slot with the concept to which its head noun belongs in an ontology (e.g.
Related Work
Although the problem of data sparsity is alleviated to a certain extent (3), these features do not generally improve classification performance (Schulte im Walde, 2000; Joanis, 2002).
data sparsity is mentioned in 6 sentences in this paper.
Guinaudeau, Camille and Strube, Michael
Abstract
The performance is comparable to entity grid based approaches though these rely on a computationally expensive training phase and face data sparsity problems.
Conclusions
Second, as it relies only on graph centrality, our model does not suffer from the computational complexity and data sparsity problems mentioned by Barzilay and Lapata (2008).
Conclusions
This can be easily done by adding edges in the projection graphs when sentences contain entities related from a discourse point of view while Lin et al.’s approach suffers from complexity and data sparsity problems similar to the entity grid model.
Introduction
However, their approach has some disadvantages which they point out themselves: data sparsity, domain dependence and computational complexity, especially in terms of feature space issues while building their model (Barzilay and Lapata (2008, p.8, p.10, p.30), Elsner and Charniak (2011, p.126, p.127)).
Introduction
The graph can easily span the entire text without leading to computational complexity and data sparsity problems.
Introduction
From this we conclude that a graph is an alternative to the entity grid model: it is computationally more tractable for modeling local coherence and does not suffer from data sparsity problems (Section 5).
data sparsity is mentioned in 6 sentences in this paper.
Lazaridou, Angeliki and Marelli, Marco and Zamparelli, Roberto and Baroni, Marco
Abstract
This is a major cause of data sparseness for corpus-based approaches to lexical semantics, such as distributional semantic models of word meaning.
Experimental setup
This result is of practical importance for distributional semantics, as it paves the way to address one of the main causes of data sparseness, and it confirms the usefulness of the compositional approach in a new domain.
Introduction
Not surprisingly, there is a strong correlation between word frequency and vector quality (Bullinaria and Levy, 2007), and since most words occur only once even in very large corpora (Baroni, 2009), DSMs suffer from data sparseness.
Introduction
Compositional distributional semantic models (cDSMs) of word units aim at handling, compositionally, the high productivity of phrases and consequent data sparseness .
Introduction
Besides alleviating data sparseness problems, a system of this sort, that automatically induces the semantic contents of morphological processes, would also be of tremendous theoretical interest, given that the semantics of derivation is a central and challenging topic in linguistic morphology (Dowty, 1979; Lieber, 2004).
Related work
Morphological induction has recently received considerable attention since morphological analysis can mitigate data sparseness in domains such as parsing and machine translation (Goldberg and Tsarfaty, 2008; Lee, 2004).
data sparsity is mentioned in 6 sentences in this paper.
Zhao, Shiqi and Wang, Haifeng and Liu, Ting and Li, Sheng
Experiments
To alleviate the data sparseness problem, we only kept patterns appearing more than 10 times in the corpus for extracting paraphrase patterns.
Experiments
In other words, it seriously suffers from data sparseness.
Experiments
which is mainly because the data sparseness problem is more serious when extracting long patterns.
Proposed Method
However, we find that using only the MLE based probabilities can suffer from data sparseness .
data sparsity is mentioned in 4 sentences in this paper.
Sun, Xu and Okazaki, Naoaki and Tsujii, Jun'ichi
Results and Discussion
The curves suggest that the data sparseness problem could be the reason for the differences in performance.
Results and Discussion
For the latent variable approach, its curve demonstrates that it did not cause a severe data sparseness problem.
Results and Discussion
In addition, the training data of the English task is much smaller than for the Chinese task, which could make the models more sensitive to data sparseness.
data sparsity is mentioned in 4 sentences in this paper.
Parikh, Ankur P. and Cohen, Shay B. and Xing, Eric P.
Abstract
This leads to a severe data sparsity problem even for moderately long sentences.
Abstract
Assume for this section that |D(x)| is large (we address the data sparsity issue in §3.4).
Abstract
We now address the data sparsity problem, in particular that |D(x)| can be very small, and therefore estimating d for each POS sequence separately can be problematic.
data sparsity is mentioned in 4 sentences in this paper.
Paperno, Denis and Pham, Nghia The and Baroni, Marco
Compositional distributional semantics
Estimating tensors of this size runs into data sparseness issues already for less common transitive verbs.
Compositional distributional semantics
Besides losing the comparability of the semantic contribution of a word across syntactic contexts, we also worsen the data sparseness issues.
Evaluation
Evidently, the separately-trained subject and object matrices of plf, being less affected by data sparseness than the 3-way tensors of If, are better able to capture how verbs interact with their arguments.
The practical lexical function model
We expect a reasonably large corpus to feature many occurrences of a verb with a variety of subjects and a variety of objects (but not necessarily a variety of subjects with each of the objects as required by Grefenstette et al.’s training), allowing us to avoid the data sparseness issue.
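The contrast in these excerpts is essentially one of parameter counts. A hedged sketch for d-dimensional vectors, in generic notation rather than the paper's: the lexical function (lf) model needs a full 3-way tensor per transitive verb, while the practical lexical function (plf) model trains one matrix per argument slot.

```latex
\text{lf:}\quad \mathcal{T}_v \in \mathbb{R}^{d\times d\times d}
  \;(d^3 \text{ parameters per verb})
\qquad
\text{plf:}\quad A_v^{\mathrm{subj}},\, A_v^{\mathrm{obj}} \in \mathbb{R}^{d\times d}
  \;(2d^2 \text{ parameters per verb})
```

Every observed subject-verb or verb-object pair contributes evidence to one d × d matrix, so the plf matrices can be estimated from far fewer triples than the full tensor, which is the sparseness point made in the Evaluation excerpt.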
data sparsity is mentioned in 4 sentences in this paper.
Lassalle, Emmanuel and Denis, Pascal
Experiments
We observed that product-hierarchies did not perform well without cutting (especially when using longer sequences of indicators, because of data sparsity) and could obtain scores lower than the single model.
Hierarchizing feature spaces
The small number of outputs of an indicator is required for practical reasons: if a category of pairs is too refined, the associated feature space will suffer from data sparsity .
Introduction
In effect, we want to learn the “best” subspaces for our different models: that is, subspaces that are neither too coarse (i.e., unlikely to separate the data well) nor too specific (i.e., prone to data sparseness and noise).
Modeling pairs
In practice, we have to cope with data sparsity : there will not be enough data to properly train a linear model on such a space.
data sparsity is mentioned in 4 sentences in this paper.
Zhang, Dongdong and Li, Mu and Duan, Nan and Li, Chi-Ho and Zhou, Ming
Abstract
Conventional statistical machine translation (SMT) systems do not perform well on measure word generation due to data sparseness and the potential long distance dependency between measure words and their corresponding head words.
Experiments
Compared with the baseline, the Mo-ME method takes advantage of a large monolingual training corpus and reduces the data sparseness problem.
Experiments
One problem is data sparseness with respect to collocations between measure words and their head words.
Introduction
However, as we will show below, existing SMT systems do not deal well with the measure word generation in general due to data sparseness and long distance dependencies between measure words and their corresponding head words.
data sparsity is mentioned in 4 sentences in this paper.
Srivastava, Shashank and Hovy, Eduard
Abstract
Traditional models of distributional semantics suffer from computational issues such as data sparsity for individual lexemes and complexities of modeling semantic composition when dealing with structures larger than single lexical items.
Abstract
The framework subsumes issues such as differential compositional as well as non-compositional behavior of phrasal constituents, and circumvents some problems of data sparsity by design.
Introduction
Such a framework for distributional models avoids the issue of data sparsity in learning of representations for larger linguistic structures.
data sparsity is mentioned in 3 sentences in this paper.
Boros, Tiberiu and Ion, Radu and Tufis, Dan
Abstract
Standard methods for part-of-speech tagging suffer from data sparseness when used on highly inflectional languages (which require large lexical tagset inventories).
Abstract
The standard tagging methods, using such large tagsets, face serious data sparseness problems due to lack of statistical evidence, manifested by the non-robustness of the language models.
Abstract
The previously proposed methods still suffer from the same issue of data sparseness when applied to MSD tagging.
data sparsity is mentioned in 3 sentences in this paper.
Neubig, Graham and Watanabe, Taro and Mori, Shinsuke and Kawahara, Tatsuya
Related Work on Data Sparsity in SMT
As traditional SMT systems treat all words as single tokens without considering their internal structure, major problems of data sparsity occur for less frequent tokens.
Related Work on Data Sparsity in SMT
Another source of data sparsity that occurs in all languages is proper names, which have been handled by using cognates or transliteration to improve translation (Knight and Graehl, 1998; Kondrak et al., 2003; Finch and Sumita, 2007), and more sophisticated methods for named entity translation that combine translation and transliteration have also been proposed (Al-Onaizan and Knight, 2002).
Related Work on Data Sparsity in SMT
We have enumerated these related works to demonstrate the myriad of data sparsity problems and proposed solutions.
data sparsity is mentioned in 3 sentences in this paper.
Pantel, Patrick and Fuxman, Ariel
Abstract
Smoothing techniques are proposed to address the inherent data sparsity in such graphs, including interpolation using a query synonymy model.
Association Model
Smoothing techniques can be useful to alleviate data sparsity problems common in statistical models.
Experimental Results
We expect our smoothing models to have much more impact on MSE (i.e., the tail) than on MSEW since head queries do not suffer from data sparsity.
data sparsity is mentioned in 3 sentences in this paper.
Park, Keun Chan and Jeong, Yoonjae and Myaeng, Sung Hyon
Conclusion and Future Work
In order to increase the coverage even further and reduce the errors in lexicon construction, i.e., verb classification, caused by data sparseness, we need to devise a different method, perhaps using domain-specific resources.
Lexicon Construction
Other thematic roles did not perform well because of the data sparseness.
Lexicon Construction
Data sparseness affected the linguistic schemata as well.
data sparsity is mentioned in 3 sentences in this paper.
Boxwell, Stephen and Mehay, Dennis and Brew, Chris
Abstract
Most previously developed systems are CFG-based and make extensive use of a treepath feature, which suffers from data sparsity due to its use of explicit tree configurations.
Abstract
CCG affords ways to augment treepath-based features to overcome these data sparsity issues.
Potential Advantages to using CCG
Because there are a number of different treepaths that correspond to a single relation (figure 2), this approach can suffer from data sparsity .
data sparsity is mentioned in 3 sentences in this paper.
Uszkoreit, Jakob and Brants, Thorsten
Abstract
In statistical language modeling, one technique to reduce the problematic effects of data sparsity is to partition the vocabulary into equivalence classes.
Conclusion
We conclude that even despite the large amounts of data used to train the large word-based model in our second experiment, class-based language models are still an effective tool to ease the effects of data sparsity .
Introduction
Class-based n-gram models are intended to help overcome this data sparsity problem by grouping words into equivalence classes rather than treating them as distinct words and thus reducing the number of parameters of the model (Brown et al., 1990).
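The factorization the Introduction excerpt attributes to Brown et al. (1990) has a standard bigram form, with c(w) the class of word w:

```latex
P(w_i \mid w_{i-1}) \;\approx\; P\big(c(w_i) \mid c(w_{i-1})\big)\, P\big(w_i \mid c(w_i)\big)
```

With |C| classes instead of |V| word types, the class transition model has on the order of |C|^2 rather than |V|^2 parameters, which is the reduction in parameters the excerpt refers to.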
data sparsity is mentioned in 3 sentences in this paper.