Abstract | A plausible reason for such a performance improvement is the reduction in data sparsity.
Abstract | In this paper, the problem of data sparsity in sentiment analysis, both monolingual and cross-lingual, is addressed by means of clustering.
Abstract | Experiments show that cluster-based data sparsity reduction leads to better performance than sense-based classification for document-level sentiment analysis.
Introduction | Data sparsity is the bane of Natural Language Processing (NLP) (Xue et al., 2005; Minkov et al., 2007). |
Introduction | NLP applications innovatively handle data sparsity through various means. |
Introduction | A special but very common kind of data sparsity, viz. word sparsity, can be addressed in one of two obvious ways: 1) sparsity reduction through paradigmatically related words, or 2) sparsity reduction through syntagmatically related words.
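The cluster-based (paradigmatic) route can be illustrated with a minimal sketch. The cluster assignments and feature names below are invented for illustration; real systems would induce them, e.g., via Brown clustering or k-means over word embeddings:

```python
# Hypothetical sketch: reducing word sparsity by backing off to clusters.
# The cluster dictionary is invented for illustration only.
CLUSTERS = {
    "excellent": "POS_ADJ", "superb": "POS_ADJ", "great": "POS_ADJ",
    "awful": "NEG_ADJ", "terrible": "NEG_ADJ", "dreadful": "NEG_ADJ",
}

def featurize(tokens, use_clusters=False):
    """Map tokens to features; cluster backoff collapses paradigmatically
    related words (near-synonyms) onto a shared feature."""
    if not use_clusters:
        return [f"w={t}" for t in tokens]
    return [f"c={CLUSTERS.get(t, t)}" for t in tokens]

train = featurize(["excellent", "awful"], use_clusters=True)
test = featurize(["superb", "dreadful"], use_clusters=True)
# With clusters, test features overlap with training features even though
# the surface words never co-occurred in training.
print(set(train) == set(test))  # True
```

The point of the sketch is that two documents with disjoint vocabularies can still share feature statistics once words are mapped to clusters.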
Abstract | However, syntax-based methods can only use discrete contextual information, which may suffer from data sparsity.
Abstract | Lexical semantic clue verifies whether a candidate term is related to the target product, and contextual semantic clue serves as a soft pattern miner to find candidates, which exploits semantics of each word in context so as to alleviate the data sparsity problem. |
Experiments | As for SGW-TSVM, the features they used for the TSVM suffer from the data sparsity problem for infrequent terms. |
Introduction | Therefore, such a representation often suffers from the data sparsity problem (Turian et al., 2010). |
Introduction | This enables our method to be less sensitive to lexicon change, so that the data sparsity problem can be alleviated.
Related Work | As discussed in the first section, syntactic patterns often suffer from data sparsity.
Related Work | Thus, the data sparsity problem can be alleviated. |
The Proposed Method | To alleviate the data sparsity problem, EB is first trained on a very large corpus (denoted by C) and then fine-tuned on the target review corpus R. In particular, for phrasal product features, a statistic-based method (Zhu et al., 2009) is used to detect noun phrases in R. Then, an Unfolding Recursive Autoencoder (Socher et al., 2011) is trained on C to obtain embedding vectors for noun phrases.
Abstract | Supervised sequence-labeling systems in natural language processing often suffer from data sparsity because they use word types as features in their prediction tasks. |
Introduction | Data sparsity and high dimensionality are the twin curses of statistical natural language processing (NLP). |
Introduction | The negative effects of data sparsity have been well-documented in the NLP literature. |
Introduction | Our technique is particularly well-suited to handling data sparsity because it is possible to improve performance on rare words by supplementing the training data with additional unannotated text containing more examples of the rare words. |
Related Work | Sophisticated smoothing techniques like modified Kneser-Ney and Katz smoothing (Chen and Goodman, 1996) smooth together the predictions of unigram, bigram, trigram, and potentially higher n-gram sequences to obtain accurate probability estimates in the face of data sparsity.
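The interpolation idea behind such smoothing can be sketched with a simple Jelinek-Mercer-style mixture, far simpler than modified Kneser-Ney, but enough to show how unseen n-grams still receive probability mass. The corpus and interpolation weight are toy values:

```python
from collections import Counter

# Minimal Jelinek-Mercer-style interpolation sketch: blend bigram and
# unigram MLE estimates so unseen bigrams still get nonzero probability.
corpus = "the cat sat on the mat the cat ran".split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
N = len(corpus)

def p_interp(w, prev, lam=0.7):
    """P(w | prev) = lam * P_bigram + (1 - lam) * P_unigram."""
    p_uni = unigrams[w] / N
    p_bi = bigrams[(prev, w)] / unigrams[prev] if unigrams[prev] else 0.0
    return lam * p_bi + (1 - lam) * p_uni

# "cat ran" was seen once; "mat ran" never -- but both get probability mass.
print(p_interp("ran", "cat") > p_interp("ran", "mat") > 0)  # True
```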
Smoothing Natural Language Sequences | For supervised sequence-labeling problems in NLP, the most important “complicating factor” that we seek to avoid through smoothing is the data sparsity associated with word-based representations. |
Smoothing Natural Language Sequences | Importantly, we seek distributional representations that will provide features that are common in both training and test data, to avoid data sparsity.
Abstract | This paper presents an unsupervised random walk approach to alleviate data sparsity for selectional preferences. |
Introduction | However, this strategy is infeasible for many plausible triples due to data sparsity . |
Introduction | How, then, can a smoothing model be used to alleviate data sparsity for SP?
Introduction | Random walk models have been successfully applied to alleviate the data sparsity issue on collaborative filtering in recommender systems. |
RSP: A Random Walk Model for SP | The damping factor d ∈ (0, 1), and its value mainly depends on the data sparsity level.
RSP: A Random Walk Model for SP | Experiments show it is efficient and effective to address data sparsity for SP. |
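The random-walk intuition can be sketched as follows; the toy graph, two-hop scoring rule, and damping value are invented for illustration and are not the paper's actual model:

```python
# Hedged sketch of random-walk smoothing for selectional preferences (SP):
# an unseen (verb, noun) pair gains plausibility if the walk can reach the
# noun from the verb via shared arguments.
seen = {("drink", "water"), ("drink", "tea"), ("sip", "tea")}
d = 0.85  # damping factor, d in (0, 1); value invented for illustration

def two_step_score(verb, noun):
    """Direct evidence plus damped two-hop paths verb -> n1 -> v1 -> noun."""
    direct = 1.0 if (verb, noun) in seen else 0.0
    nouns_of = lambda v: {n for (v2, n) in seen if v2 == v}
    verbs_of = lambda n: {v for (v2, n2) in seen if n2 == n for v in [v2]}
    hops = sum(1 for n1 in nouns_of(verb) for v1 in verbs_of(n1)
               if (v1, noun) in seen and (verb, noun) != (v1, noun))
    return direct + d * hops

# "sip water" was never observed, but "sip" shares "tea" with "drink",
# which does take "water", so the walk assigns it nonzero plausibility.
print(two_step_score("sip", "water") > 0)  # True
```

This is the same mechanism used in collaborative filtering: unseen pairs are scored by mass flowing through neighbors, with d controlling how much indirect evidence counts.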
Abstract | Existing word similarity measures are not robust to data sparseness since they rely only on the point estimation of words’ context profiles obtained from a limited amount of data. |
Experiments | In this study, we combined two clustering results (denoted as “s1+s2” in the results), each of which (“s1” and “s2”) has 2,000 hidden classes. We included this method since clustering can be regarded as another way of treating data sparseness.
Introduction | In the NLP field, data sparseness has been recognized as a serious problem and tackled in the context of language modeling and supervised machine learning. |
Introduction | There has been no study that has seriously dealt with data sparseness in the context of semantic similarity calculation.
Introduction | The data sparseness problem is usually solved by smoothing, regularization, margin maximization, and so on (Chen and Goodman, 1998; Chen and Rosenfeld, 2000; Cortes and Vapnik, 1995).
Experiments | The reason is that the matrix factorization used in the paper can simultaneously address the data sparseness and the noise introduced by the machine translator.
Experiments | Our proposed method (SMT + MF) can effectively address the data sparseness and noise via matrix factorization.
Experiments | To further investigate the impact of the matrix factorization, one intuitive way is to expand the original questions with the translated words from the other four languages, without considering the data sparseness and noise introduced by the machine translator.
Our Approach | To tackle the data sparseness of question representation with the translated words, we hope to find two or more lower-dimensional matrices whose product provides a good approximation to the original one via matrix factorization.
Our Approach | If we set a small value for λp, the objective function behaves like the traditional NMF and the importance of data sparseness is emphasized; a big value of λp indicates that Vp should be very close to V1, and equation (3) aims to remove the noise introduced by statistical machine translation.
Our Approach | The objective function O defined in equation (4) performs data sparseness reduction and noise removal simultaneously.
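The underlying low-rank idea can be sketched with a toy rank-1 alternating-least-squares decomposition. The paper's actual model is an NMF variant with the λp penalty; this sketch shows only the basic approximation V ≈ w·hᵀ on an invented matrix:

```python
# Hedged sketch: approximate a sparse term-question matrix V by the product
# of two low-dimensional factors, so questions sharing translated terms get
# smoothed representations. Toy data; rank-1 for brevity.
V = [[2.0, 0.0, 1.0],
     [4.0, 0.0, 2.0]]  # rows: questions, cols: terms; zeros reflect sparsity

w = [1.0, 1.0]        # question factor
h = [1.0, 1.0, 1.0]   # term factor

for _ in range(50):   # alternating least-squares updates for V ~ w h^T
    h = [sum(V[i][j] * w[i] for i in range(2)) / sum(x * x for x in w)
         for j in range(3)]
    w = [sum(V[i][j] * h[j] for j in range(3)) / sum(x * x for x in h)
         for i in range(2)]

approx = [[w[i] * h[j] for j in range(3)] for i in range(2)]
err = sum((V[i][j] - approx[i][j]) ** 2 for i in range(2) for j in range(3))
print(err < 1e-6)  # True: V is exactly rank 1, so the factors recover it
```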
Conclusion and Future Work | Our experiments on verb classification have offered a class-based approach to alleviating the data sparsity problem in parsing.
Integration of Syntactic and Lexical Information | Dependency relation (DR): Our way to overcome data sparsity is to break lexicalized frames into lexicalized slots (a.k.a. |
Introduction | When the information about a verb type is not available or sufficient for us to draw firm conclusions about its usage, the information about the class to which the verb type belongs can compensate for it, addressing the pervasive problem of data sparsity in a wide range of NLP tasks, such as automatic extraction of subcategorization frames (Korhonen, 2002), semantic role labeling (Swier and Stevenson, 2004; Gildea and Jurafsky, 2002), natural language generation for machine translation (Habash et al., 2003), and deriving predominant verb senses from unlabeled data (Lapata and Brew, 2004).
Related Work | Trying to overcome the problem of data sparsity, Schulte im Walde (2000) explores the additional use of selectional preference features by augmenting each syntactic slot with the concept to which its head noun belongs in an ontology (e.g.
Related Work | Although the problem of data sparsity is alleviated to a certain extent (3), these features do not generally improve classification performance (Schulte im Walde, 2000; Joanis, 2002).
Abstract | The performance is comparable to entity grid based approaches though these rely on a computationally expensive training phase and face data sparsity problems. |
Conclusions | Second, as it relies only on graph centrality, our model does not suffer from the computational complexity and data sparsity problems mentioned by Barzilay and Lapata (2008). |
Conclusions | This can be easily done by adding edges in the projection graphs when sentences contain entities related from a discourse point of view while Lin et al.’s approach suffers from complexity and data sparsity problems similar to the entity grid model. |
Introduction | However, their approach has some disadvantages which they point out themselves: data sparsity , domain dependence and computational complexity, especially in terms of feature space issues while building their model (Barzilay and Lapata (2008, p.8, p.10, p.30), Elsner and Charniak (2011, p.126, p.127)). |
Introduction | The graph can easily span the entire text without leading to computational complexity and data sparsity problems. |
Introduction | From this we conclude that a graph is an alternative to the entity grid model: it is computationally more tractable for modeling local coherence and does not suffer from data sparsity problems (Section 5). |
Abstract | This is a major cause of data sparseness for corpus-based approaches to lexical semantics, such as distributional semantic models of word meaning. |
Experimental setup | This result is of practical importance for distributional semantics, as it paves the way to address one of the main causes of data sparseness, and it confirms the usefulness of the compositional approach in a new domain.
Introduction | Not surprisingly, there is a strong correlation between word frequency and vector quality (Bullinaria and Levy, 2007), and since most words occur only once even in very large corpora (Baroni, 2009), DSMs suffer from data sparseness.
Introduction | Compositional distributional semantic models (cDSMs) of word units aim at handling, compositionally, the high productivity of phrases and consequent data sparseness . |
Introduction | Besides alleviating data sparseness problems, a system of this sort, that automatically induces the semantic contents of morphological processes, would also be of tremendous theoretical interest, given that the semantics of derivation is a central and challenging topic in linguistic morphology (Dowty, 1979; Lieber, 2004). |
Related work | Morphological induction has recently received considerable attention since morphological analysis can mitigate data sparseness in domains such as parsing and machine translation (Goldberg and Tsarfaty, 2008; Lee, 2004). |
Experiments | To alleviate the data sparseness problem, we only kept patterns appearing more than 10 times in the corpus for extracting paraphrase patterns. |
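A minimal sketch of such a frequency cutoff, with the threshold lowered to 2 for the toy data (the text uses 10 on a full corpus):

```python
from collections import Counter

# Keep only patterns seen more than a threshold number of times; rare
# patterns carry unreliable statistics and aggravate sparseness.
patterns = ["X of Y"] * 3 + ["X for Y"] * 2 + ["X near Y"]
counts = Counter(patterns)
THRESHOLD = 2  # the paper uses 10; lowered here for the toy example

kept = {p for p, c in counts.items() if c > THRESHOLD}
print(kept == {"X of Y"})  # True
```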
Experiments | In other words, it seriously suffers from data sparseness.
Experiments | This is mainly because the data sparseness problem is more serious when extracting long patterns.
Proposed Method | However, we find that using only the MLE-based probabilities can suffer from data sparseness.
Results and Discussion | The curves suggest that the data sparseness problem could be the reason for the differences in performance. |
Results and Discussion | The curve for the latent variable approach demonstrates that it did not cause a severe data sparseness problem.
Results and Discussion | In addition, the training data of the English task is much smaller than for the Chinese task, which could make the models more sensitive to data sparseness.
Abstract | This leads to a severe data sparsity problem even for moderately long sentences. |
Abstract | Assume for this section that |D(x)| is large (we address the data sparsity issue in §3.4).
Abstract | We now address the data sparsity problem, in particular that |D(x)| can be very small, and therefore estimating d for each POS sequence separately can be problematic.
Compositional distributional semantics | Estimating tensors of this size runs into data sparseness issues already for less common transitive verbs. |
Compositional distributional semantics | Besides losing the comparability of the semantic contribution of a word across syntactic contexts, we also worsen the data sparseness issues. |
Evaluation | Evidently, the separately-trained subject and object matrices of plf, being less affected by data sparseness than the 3-way tensors of If, are better able to capture how verbs interact with their arguments. |
The practical lexical function model | We expect a reasonably large corpus to feature many occurrences of a verb with a variety of subjects and a variety of objects (but not necessarily a variety of subjects with each of the objects as required by Grefenstette et al.’s training), allowing us to avoid the data sparseness issue. |
Experiments | We observed that product-hierarchies did not perform well without cutting (especially when using longer sequences of indicators, because of data sparsity) and could obtain scores lower than the single model.
Hierarchizing feature spaces | The small number of outputs of an indicator is required for practical reasons: if a category of pairs is too refined, the associated feature space will suffer from data sparsity . |
Introduction | In effect, we want to learn the “best” subspaces for our different models: that is, subspaces that are neither too coarse (i.e., unlikely to separate the data well) nor too specific (i.e., prone to data sparseness and noise). |
Modeling pairs | In practice, we have to cope with data sparsity : there will not be enough data to properly train a linear model on such a space. |
Abstract | Conventional statistical machine translation (SMT) systems do not perform well on measure word generation due to data sparseness and the potential long distance dependency between measure words and their corresponding head words. |
Experiments | Compared with the baseline, the Mo-ME method takes advantage of a large monolingual training corpus and reduces the data sparseness problem.
Experiments | One problem is data sparseness with respect to collocations between measure words and their corresponding head words.
Introduction | However, as we will show below, existing SMT systems do not deal well with the measure word generation in general due to data sparseness and long distance dependencies between measure words and their corresponding head words. |
Abstract | Traditional models of distributional semantics suffer from computational issues such as data sparsity for individual lexemes and complexities of modeling semantic composition when dealing with structures larger than single lexical items.
Abstract | The framework subsumes issues such as differential compositional as well as non-compositional behavior of phrasal constituents, and circumvents some problems of data sparsity by design.
Introduction | Such a framework for distributional models avoids the issue of data sparsity in learning of representations for larger linguistic structures. |
Abstract | Standard methods for part-of-speech tagging suffer from data sparseness when used on highly inflectional languages (which require large lexical tagset inventories). |
Abstract | The standard tagging methods, using such large tagsets, face serious data sparseness problems due to lack of statistical evidence, manifested by the non-robustness of the language models. |
Abstract | The previously proposed methods still suffer from the same issue of data sparseness when applied to MSD tagging. |
Related Work on Data Sparsity in SMT | As traditional SMT systems treat all words as single tokens without considering their internal structure, major problems of data sparsity occur for less frequent tokens. |
Related Work on Data Sparsity in SMT | Another source of data sparsity that occurs in all languages is proper names, which have been handled by using cognates or transliteration to improve translation (Knight and Graehl, 1998; Kondrak et al., 2003; Finch and Sumita, 2007), and more sophisticated methods for named entity translation that combine translation and transliteration have also been proposed (Al-Onaizan and Knight, 2002). |
Related Work on Data Sparsity in SMT | We have enumerated these related works to demonstrate the myriad of data sparsity problems and proposed solutions. |
Abstract | Smoothing techniques are proposed to address the inherent data sparsity in such graphs, including interpolation using a query synonymy model. |
Association Model | Smoothing techniques can be useful to alleviate data sparsity problems common in statistical models. |
Experimental Results | We expect our smoothing models to have much more impact on MSE (i.e., the tail) than on MSEW since head queries do not suffer from data sparsity.
Conclusion and Future Work | In order to increase the coverage even further and reduce the errors in lexicon construction, i.e., verb classification, caused by data sparseness, we need to devise a different method, perhaps using domain-specific resources.
Lexicon Construction | Other thematic roles did not perform well because of data sparseness.
Lexicon Construction | Data sparseness affected the linguistic schemata as well. |
Abstract | Most previously developed systems are CFG-based and make extensive use of a treepath feature, which suffers from data sparsity due to its use of explicit tree configurations. |
Abstract | CCG affords ways to augment treepath-based features to overcome these data sparsity issues. |
Potential Advantages to using CCG | Because there are a number of different treepaths that correspond to a single relation (figure 2), this approach can suffer from data sparsity.
Abstract | In statistical language modeling, one technique to reduce the problematic effects of data sparsity is to partition the vocabulary into equivalence classes. |
Conclusion | We conclude that even despite the large amounts of data used to train the large word-based model in our second experiment, class-based language models are still an effective tool to ease the effects of data sparsity . |
Introduction | Class-based n-gram models are intended to help overcome this data sparsity problem by grouping words into equivalence classes rather than treating them as distinct words and thus reducing the number of parameters of the model (Brown et al., 1990).
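A minimal sketch of the class-based idea: replace each word by its class before counting n-grams, so parameters are shared across class members. The class map here is invented for illustration; real systems induce classes automatically, e.g., with Brown clustering:

```python
from collections import Counter

# Count bigrams over class IDs instead of word types, so an unseen word
# sequence can still match seen class sequences. Toy class map.
WORD2CLASS = {"monday": "DAY", "tuesday": "DAY", "friday": "DAY",
              "on": "on", "meet": "meet"}

def bigram_counts(tokens, use_classes=False):
    if use_classes:
        tokens = [WORD2CLASS.get(t, t) for t in tokens]
    return Counter(zip(tokens, tokens[1:]))

train = bigram_counts(["meet", "on", "monday"], use_classes=True)
# "meet on friday" was never seen, but its class bigrams were:
test = bigram_counts(["meet", "on", "friday"], use_classes=True)
print(all(b in train for b in test))  # True
```

Collapsing the days of the week to one class cuts the parameter count and lets rare members inherit the statistics of frequent ones, which is exactly the sparsity reduction the sentence above describes.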