Abstract | A plausible reason for such a performance improvement is the reduction in data sparsity.
Abstract | In this paper, the problem of data sparsity in sentiment analysis, both monolingual and cross-lingual, is addressed by means of clustering.
Abstract | Experiments show that cluster-based data sparsity reduction leads to better performance than sense-based classification for document-level sentiment analysis.
Introduction | Data sparsity is the bane of Natural Language Processing (NLP) (Xue et al., 2005; Minkov et al., 2007). |
Introduction | NLP applications innovatively handle data sparsity through various means. |
Introduction | A special but very common kind of data sparsity, viz. word sparsity, can be addressed in one of two obvious ways: 1) sparsity reduction through paradigmatically related words, or 2) sparsity reduction through syntagmatically related words.
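The cluster-based (paradigmatic) route can be illustrated with a minimal sketch. The cluster assignments and feature names below are invented for illustration; real systems would induce them, e.g., via Brown clustering or k-means over word embeddings:

```python
# Hypothetical sketch: reducing word sparsity by backing off to clusters.
# The cluster dictionary is invented for illustration only.
CLUSTERS = {
    "excellent": "POS_ADJ", "superb": "POS_ADJ", "great": "POS_ADJ",
    "awful": "NEG_ADJ", "terrible": "NEG_ADJ", "dreadful": "NEG_ADJ",
}

def featurize(tokens, use_clusters=False):
    """Map tokens to features; cluster backoff collapses paradigmatically
    related words (near-synonyms) onto a shared feature."""
    if not use_clusters:
        return [f"w={t}" for t in tokens]
    return [f"c={CLUSTERS.get(t, t)}" for t in tokens]

train = featurize(["excellent", "awful"], use_clusters=True)
test = featurize(["superb", "dreadful"], use_clusters=True)
# With clusters, test features overlap with training features even though
# the surface words never co-occurred in training.
print(set(train) == set(test))  # True
```

The point of the sketch is that two documents with disjoint vocabularies can still share feature statistics once words are mapped to clusters.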
Abstract | However, syntax-based methods can only use discrete contextual information, which may suffer from data sparsity.
Abstract | Lexical semantic clue verifies whether a candidate term is related to the target product, and contextual semantic clue serves as a soft pattern miner to find candidates, which exploits semantics of each word in context so as to alleviate the data sparsity problem. |
Experiments | As for SGW-TSVM, the features they used for the TSVM suffer from the data sparsity problem for infrequent terms. |
Introduction | Therefore, such a representation often suffers from the data sparsity problem (Turian et al., 2010). |
Introduction | This enables our method to be less sensitive to lexicon change, so that the data sparsity problem can be alleviated.
Related Work | As discussed in the first section, syntactic patterns often suffer from data sparsity.
Related Work | Thus, the data sparsity problem can be alleviated. |
The Proposed Method | To alleviate the data sparsity problem, EB is first trained on a very large corpus (denoted by C) and then fine-tuned on the target review corpus R. In particular, for phrasal product features, a statistic-based method (Zhu et al., 2009) is used to detect noun phrases in R. Then, an Unfolding Recursive Autoencoder (Socher et al., 2011) is trained on C to obtain embedding vectors for noun phrases.
Abstract | Supervised sequence-labeling systems in natural language processing often suffer from data sparsity because they use word types as features in their prediction tasks. |
Introduction | Data sparsity and high dimensionality are the twin curses of statistical natural language processing (NLP). |
Introduction | The negative effects of data sparsity have been well-documented in the NLP literature. |
Introduction | Our technique is particularly well-suited to handling data sparsity because it is possible to improve performance on rare words by supplementing the training data with additional unannotated text containing more examples of the rare words. |
Related Work | Sophisticated smoothing techniques like modified Kneser-Ney and Katz smoothing (Chen and Goodman, 1996) smooth together the predictions of unigram, bigram, trigram, and potentially higher n-gram sequences to obtain accurate probability estimates in the face of data sparsity.
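The interpolation idea behind such smoothing can be sketched with a simple Jelinek-Mercer-style mixture, far simpler than modified Kneser-Ney, but enough to show how unseen n-grams still receive probability mass. The corpus and interpolation weight are toy values:

```python
from collections import Counter

# Minimal Jelinek-Mercer-style interpolation sketch: blend bigram and
# unigram MLE estimates so unseen bigrams still get nonzero probability.
corpus = "the cat sat on the mat the cat ran".split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
N = len(corpus)

def p_interp(w, prev, lam=0.7):
    """P(w | prev) = lam * P_bigram + (1 - lam) * P_unigram."""
    p_uni = unigrams[w] / N
    p_bi = bigrams[(prev, w)] / unigrams[prev] if unigrams[prev] else 0.0
    return lam * p_bi + (1 - lam) * p_uni

# "cat ran" was seen once; "mat ran" never -- but both get probability mass.
print(p_interp("ran", "cat") > p_interp("ran", "mat") > 0)  # True
```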
Smoothing Natural Language Sequences | For supervised sequence-labeling problems in NLP, the most important “complicating factor” that we seek to avoid through smoothing is the data sparsity associated with word-based representations. |
Smoothing Natural Language Sequences | Importantly, we seek distributional representations that will provide features that are common in both training and test data, to avoid data sparsity.
Abstract | This paper presents an unsupervised random walk approach to alleviate data sparsity for selectional preferences. |
Introduction | However, this strategy is infeasible for many plausible triples due to data sparsity . |
Introduction | How, then, can a smoothing model be used to alleviate data sparsity for SP?
Introduction | Random walk models have been successfully applied to alleviate the data sparsity issue on collaborative filtering in recommender systems. |
RSP: A Random Walk Model for SP | The damping factor d ∈ (0, 1), and its value mainly depends on the data sparsity level.
RSP: A Random Walk Model for SP | Experiments show it is efficient and effective to address data sparsity for SP. |
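The random-walk intuition can be sketched as follows; the toy graph, two-hop scoring rule, and damping value are invented for illustration and are not the paper's actual model:

```python
# Hedged sketch of random-walk smoothing for selectional preferences (SP):
# an unseen (verb, noun) pair gains plausibility if the walk can reach the
# noun from the verb via shared arguments.
seen = {("drink", "water"), ("drink", "tea"), ("sip", "tea")}
d = 0.85  # damping factor, d in (0, 1); value invented for illustration

def two_step_score(verb, noun):
    """Direct evidence plus damped two-hop paths verb -> n1 -> v1 -> noun."""
    direct = 1.0 if (verb, noun) in seen else 0.0
    nouns_of = lambda v: {n for (v2, n) in seen if v2 == v}
    verbs_of = lambda n: {v for (v2, n2) in seen if n2 == n for v in [v2]}
    hops = sum(1 for n1 in nouns_of(verb) for v1 in verbs_of(n1)
               if (v1, noun) in seen and (verb, noun) != (v1, noun))
    return direct + d * hops

# "sip water" was never observed, but "sip" shares "tea" with "drink",
# which does take "water", so the walk assigns it nonzero plausibility.
print(two_step_score("sip", "water") > 0)  # True
```

This is the same mechanism used in collaborative filtering: unseen pairs are scored by mass flowing through neighbors, with d controlling how much indirect evidence counts.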
Abstract | Existing word similarity measures are not robust to data sparseness since they rely only on the point estimation of words’ context profiles obtained from a limited amount of data. |
Experiments | In this study, we combined two clustering results (denoted as “s1+s2” in the results), each of which (“s1” and “s2”) has 2,000 hidden classes. We included this method since clustering can be regarded as another way of treating data sparseness.
Introduction | In the NLP field, data sparseness has been recognized as a serious problem and tackled in the context of language modeling and supervised machine learning. |
Introduction | There has been no study that has seriously dealt with data sparseness in the context of semantic similarity calculation.
Introduction | The data sparseness problem is usually solved by smoothing, regularization, margin maximization, and so on (Chen and Goodman, 1998; Chen and Rosenfeld, 2000; Cortes and Vapnik, 1995).
Experiments | The reason is that the matrix factorization used in the paper can simultaneously address the data sparseness and the noise introduced by the machine translator.
Experiments | Our proposed method (SMT + MF) can effectively address the data sparseness and noise via matrix factorization.
Experiments | To further investigate the impact of the matrix factorization, one intuitive way is to expand the original questions with the translated words from the other four languages, without considering the data sparseness and noise introduced by the machine translator.
Our Approach | To tackle the data sparseness of question representation with the translated words, we hope to find two or more lower-dimensional matrices whose product provides a good approximation to the original one via matrix factorization.
Our Approach | If we set a small value for λp, the objective function behaves like the traditional NMF and the importance of data sparseness is emphasized; a big value of λp indicates that Vp should be very close to V1, and equation (3) aims to remove the noise introduced by statistical machine translation.
Our Approach | The objective function O defined in equation (4) performs data sparseness reduction and noise removal simultaneously.
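The underlying low-rank idea can be sketched with a toy rank-1 alternating-least-squares decomposition. The paper's actual model is an NMF variant with the λp penalty; this sketch shows only the basic approximation V ≈ w·hᵀ on an invented matrix:

```python
# Hedged sketch: approximate a sparse term-question matrix V by the product
# of two low-dimensional factors, so questions sharing translated terms get
# smoothed representations. Toy data; rank-1 for brevity.
V = [[2.0, 0.0, 1.0],
     [4.0, 0.0, 2.0]]  # rows: questions, cols: terms; zeros reflect sparsity

w = [1.0, 1.0]        # question factor
h = [1.0, 1.0, 1.0]   # term factor

for _ in range(50):   # alternating least-squares updates for V ~ w h^T
    h = [sum(V[i][j] * w[i] for i in range(2)) / sum(x * x for x in w)
         for j in range(3)]
    w = [sum(V[i][j] * h[j] for j in range(3)) / sum(x * x for x in h)
         for i in range(2)]

approx = [[w[i] * h[j] for j in range(3)] for i in range(2)]
err = sum((V[i][j] - approx[i][j]) ** 2 for i in range(2) for j in range(3))
print(err < 1e-6)  # True: V is exactly rank 1, so the factors recover it
```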
Conclusion and Future Work | Our experiments on verb classification have offered a class-based approach to alleviating the data sparsity problem in parsing.
Integration of Syntactic and Lexical Information | Dependency relation (DR): Our way to overcome data sparsity is to break lexicalized frames into lexicalized slots (a.k.a. |
Introduction | When the information about a verb type is not available or sufficient for us to draw firm conclusions about its usage, the information about the class to which the verb type belongs can compensate for it, addressing the pervasive problem of data sparsity in a wide range of NLP tasks, such as automatic extraction of subcategorization frames (Korhonen, 2002), semantic role labeling (Swier and Stevenson, 2004; Gildea and Jurafsky, 2002), natural language generation for machine translation (Habash et al., 2003), and deriving predominant verb senses from unlabeled data (Lapata and Brew, 2004).
Related Work | Trying to overcome the problem of data sparsity, Schulte im Walde (2000) explores the additional use of selectional preference features by augmenting each syntactic slot with the concept to which its head noun belongs in an ontology (e.g.
Related Work | Although the problem of data sparsity is alleviated to a certain extent (3), these features do not generally improve classification performance (Schulte im Walde, 2000; Joanis, 2002).
Abstract | The performance is comparable to entity grid based approaches though these rely on a computationally expensive training phase and face data sparsity problems. |
Conclusions | Second, as it relies only on graph centrality, our model does not suffer from the computational complexity and data sparsity problems mentioned by Barzilay and Lapata (2008). |
Conclusions | This can be easily done by adding edges in the projection graphs when sentences contain entities related from a discourse point of view while Lin et al.’s approach suffers from complexity and data sparsity problems similar to the entity grid model. |
Introduction | However, their approach has some disadvantages which they point out themselves: data sparsity , domain dependence and computational complexity, especially in terms of feature space issues while building their model (Barzilay and Lapata (2008, p.8, p.10, p.30), Elsner and Charniak (2011, p.126, p.127)). |
Introduction | The graph can easily span the entire text without leading to computational complexity and data sparsity problems. |
Introduction | From this we conclude that a graph is an alternative to the entity grid model: it is computationally more tractable for modeling local coherence and does not suffer from data sparsity problems (Section 5). |
Abstract | This is a major cause of data sparseness for corpus-based approaches to lexical semantics, such as distributional semantic models of word meaning. |
Experimental setup | This result is of practical importance for distributional semantics, as it paves the way to address one of the main causes of data sparseness, and it confirms the usefulness of the compositional approach in a new domain.
Introduction | Not surprisingly, there is a strong correlation between word frequency and vector quality (Bullinaria and Levy, 2007), and since most words occur only once even in very large corpora (Baroni, 2009), DSMs suffer from data sparseness.
Introduction | Compositional distributional semantic models (cDSMs) of word units aim at handling, compositionally, the high productivity of phrases and consequent data sparseness . |
Introduction | Besides alleviating data sparseness problems, a system of this sort, that automatically induces the semantic contents of morphological processes, would also be of tremendous theoretical interest, given that the semantics of derivation is a central and challenging topic in linguistic morphology (Dowty, 1979; Lieber, 2004). |
Related work | Morphological induction has recently received considerable attention since morphological analysis can mitigate data sparseness in domains such as parsing and machine translation (Goldberg and Tsarfaty, 2008; Lee, 2004). |
Experiments | To alleviate the data sparseness problem, we only kept patterns appearing more than 10 times in the corpus for extracting paraphrase patterns. |
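A minimal sketch of such a frequency cutoff, with the threshold lowered to 2 for the toy data (the text uses 10 on a full corpus):

```python
from collections import Counter

# Keep only patterns seen more than a threshold number of times; rare
# patterns carry unreliable statistics and aggravate sparseness.
patterns = ["X of Y"] * 3 + ["X for Y"] * 2 + ["X near Y"]
counts = Counter(patterns)
THRESHOLD = 2  # the paper uses 10; lowered here for the toy example

kept = {p for p, c in counts.items() if c > THRESHOLD}
print(kept == {"X of Y"})  # True
```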
Experiments | In other words, it seriously suffers from data sparseness.
Experiments | This is mainly because the data sparseness problem is more serious when extracting long patterns.
Proposed Method | However, we find that using only the MLE-based probabilities can suffer from data sparseness.
Results and Discussion | The curves suggest that the data sparseness problem could be the reason for the differences in performance. |
Results and Discussion | The curve for the latent variable approach demonstrates that it did not cause a severe data sparseness problem.
Results and Discussion | In addition, the training data of the English task is much smaller than for the Chinese task, which could make the models more sensitive to data sparseness.
Abstract | This leads to a severe data sparsity problem even for moderately long sentences. |
Abstract | Assume for this section that |D(x)| is large (we address the data sparsity issue in §3.4).
Abstract | We now address the data sparsity problem, in particular that |D(x)| can be very small, and therefore estimating d for each POS sequence separately can be problematic.
Compositional distributional semantics | Estimating tensors of this size runs into data sparseness issues already for less common transitive verbs. |
Compositional distributional semantics | Besides losing the comparability of the semantic contribution of a word across syntactic contexts, we also worsen the data sparseness issues. |
Evaluation | Evidently, the separately-trained subject and object matrices of plf, being less affected by data sparseness than the 3-way tensors of If, are better able to capture how verbs interact with their arguments. |
The practical lexical function model | We expect a reasonably large corpus to feature many occurrences of a verb with a variety of subjects and a variety of objects (but not necessarily a variety of subjects with each of the objects as required by Grefenstette et al.’s training), allowing us to avoid the data sparseness issue. |
Experiments | We observed that product-hierarchies did not perform well without cutting (especially when using longer sequences of indicators, because of data sparsity) and could obtain scores lower than the single model.
Hierarchizing feature spaces | The small number of outputs of an indicator is required for practical reasons: if a category of pairs is too refined, the associated feature space will suffer from data sparsity . |
Introduction | In effect, we want to learn the “best” subspaces for our different models: that is, subspaces that are neither too coarse (i.e., unlikely to separate the data well) nor too specific (i.e., prone to data sparseness and noise). |
Modeling pairs | In practice, we have to cope with data sparsity : there will not be enough data to properly train a linear model on such a space. |
Abstract | Conventional statistical machine translation (SMT) systems do not perform well on measure word generation due to data sparseness and the potential long distance dependency between measure words and their corresponding head words. |
Experiments | Compared with the baseline, the Mo-ME method takes advantage of a large monolingual training corpus and reduces the data sparseness problem.
Experiments | One problem is data sparseness with respect to collocations between measure words and their corresponding head words.
Introduction | However, as we will show below, existing SMT systems do not deal well with the measure word generation in general due to data sparseness and long distance dependencies between measure words and their corresponding head words. |
Abstract | Traditional models of distributional semantics suffer from computational issues such as data sparsity for individual lexemes and complexities of modeling semantic composition when dealing with structures larger than single lexical items.
Abstract | The framework subsumes issues such as differential compositional as well as non-compositional behavior of phrasal constituents, and circumvents some problems of data sparsity by design.
Introduction | Such a framework for distributional models avoids the issue of data sparsity in learning of representations for larger linguistic structures. |
Abstract | Standard methods for part-of-speech tagging suffer from data sparseness when used on highly inflectional languages (which require large lexical tagset inventories). |
Abstract | The standard tagging methods, using such large tagsets, face serious data sparseness problems due to lack of statistical evidence, manifested by the non-robustness of the language models. |
Abstract | The previously proposed methods still suffer from the same issue of data sparseness when applied to MSD tagging. |
Related Work on Data Sparsity in SMT | As traditional SMT systems treat all words as single tokens without considering their internal structure, major problems of data sparsity occur for less frequent tokens. |
Related Work on Data Sparsity in SMT | Another source of data sparsity that occurs in all languages is proper names, which have been handled by using cognates or transliteration to improve translation (Knight and Graehl, 1998; Kondrak et al., 2003; Finch and Sumita, 2007), and more sophisticated methods for named entity translation that combine translation and transliteration have also been proposed (Al-Onaizan and Knight, 2002). |
Related Work on Data Sparsity in SMT | We have enumerated these related works to demonstrate the myriad of data sparsity problems and proposed solutions. |
Abstract | Smoothing techniques are proposed to address the inherent data sparsity in such graphs, including interpolation using a query synonymy model. |
Association Model | Smoothing techniques can be useful to alleviate data sparsity problems common in statistical models. |
Experimental Results | We expect our smoothing models to have much more impact on MSE (i.e., the tail) than on MSEW since head queries do not suffer from data sparsity.
Conclusion and Future Work | In order to increase the coverage even further and reduce the errors in lexicon construction, i.e., verb classification, caused by data sparseness, we need to devise a different method, perhaps using domain-specific resources.
Lexicon Construction | Other thematic roles did not perform well because of data sparseness.
Lexicon Construction | Data sparseness affected the linguistic schemata as well. |
Abstract | Most previously developed systems are CFG-based and make extensive use of a treepath feature, which suffers from data sparsity due to its use of explicit tree configurations. |
Abstract | CCG affords ways to augment treepath-based features to overcome these data sparsity issues. |
Potential Advantages to using CCG | Because there are a number of different treepaths that correspond to a single relation (figure 2), this approach can suffer from data sparsity.
Abstract | In statistical language modeling, one technique to reduce the problematic effects of data sparsity is to partition the vocabulary into equivalence classes. |
Conclusion | We conclude that even despite the large amounts of data used to train the large word-based model in our second experiment, class-based language models are still an effective tool to ease the effects of data sparsity . |
Introduction | Class-based n-gram models are intended to help overcome this data sparsity problem by grouping words into equivalence classes rather than treating them as distinct words and thus reducing the number of parameters of the model (Brown et al., 1990).
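A minimal sketch of the class-based idea: replace each word by its class before counting n-grams, so parameters are shared across class members. The class map here is invented for illustration; real systems induce classes automatically, e.g., with Brown clustering:

```python
from collections import Counter

# Count bigrams over class IDs instead of word types, so an unseen word
# sequence can still match seen class sequences. Toy class map.
WORD2CLASS = {"monday": "DAY", "tuesday": "DAY", "friday": "DAY",
              "on": "on", "meet": "meet"}

def bigram_counts(tokens, use_classes=False):
    if use_classes:
        tokens = [WORD2CLASS.get(t, t) for t in tokens]
    return Counter(zip(tokens, tokens[1:]))

train = bigram_counts(["meet", "on", "monday"], use_classes=True)
# "meet on friday" was never seen, but its class bigrams were:
test = bigram_counts(["meet", "on", "friday"], use_classes=True)
print(all(b in train for b in test))  # True
```

Collapsing the days of the week to one class cuts the parameter count and lets rare members inherit the statistics of frequent ones, which is exactly the sparsity reduction the sentence above describes.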