Abstract | Existing word similarity measures are not robust to data sparseness since they rely only on point estimates of words' context profiles obtained from a limited amount of data.
Experiments | In this study, we combined two clustering results (denoted as "s1+s2" in the results), each of which ("s1" and "s2") has 2,000 hidden classes. We included this method since clustering can be regarded as another way of treating data sparseness.
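Experiments | The following is a minimal sketch of how two independent class-based clusterings might be combined to smooth a word's context profile; the soft back-off to class-level counts and the simple averaging rule are illustrative assumptions, not the paper's exact procedure.

from collections import Counter

def class_smoothed_profile(word, cluster_of, class_profiles):
    """Back off from a word to the aggregate context profile of its hidden class."""
    cls = cluster_of[word]
    total = sum(class_profiles[cls].values())
    return {ctx: cnt / total for ctx, cnt in class_profiles[cls].items()}

def combined_profile(word, clusterings):
    """Average the class-smoothed profiles obtained from several independent clusterings."""
    combined = Counter()
    for cluster_of, class_profiles in clusterings:
        for ctx, p in class_smoothed_profile(word, cluster_of, class_profiles).items():
            combined[ctx] += p / len(clusterings)
    return dict(combined)

# Toy usage with two tiny clusterings standing in for "s1" and "s2"
# (each real clustering would have 2,000 hidden classes).
s1 = ({"ship": 0, "boat": 0}, {0: Counter({"sail": 3, "dock": 1})})
s2 = ({"ship": 7, "boat": 7}, {7: Counter({"sail": 1, "cargo": 1})})
print(combined_profile("ship", [s1, s2]))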
Introduction | In the NLP field, data sparseness has been recognized as a serious problem and tackled in the context of language modeling and supervised machine learning. |
Introduction | has been no study that seriously dealt with data sparseness in the context of semantic similarity calculation. |
Introduction | The data sparseness problem is usually addressed by smoothing, regularization, margin maximization, and so on (Chen and Goodman, 1998; Chen and Rosenfeld, 2000; Cortes and Vapnik, 1995).
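Introduction | As a rough illustration of the smoothing family of remedies, the sketch below applies additive (add-alpha) smoothing to a word's raw context counts before any similarity computation; the toy vocabulary and the alpha value are assumptions for illustration, not settings taken from the cited work.

def add_alpha_smooth(counts, context_vocab, alpha=0.5):
    """Turn sparse raw context counts into a smoothed probability distribution."""
    total = sum(counts.values()) + alpha * len(context_vocab)
    return {ctx: (counts.get(ctx, 0) + alpha) / total for ctx in context_vocab}

vocab = ["sail", "dock", "cargo", "anchor"]
sparse_counts = {"sail": 2}  # the word was observed in only one context
print(add_alpha_smooth(sparse_counts, vocab))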
Conclusion and Future Work | In order to increase the coverage even further and reduce the errors in lexicon construction, i.e., verb classification, caused by data sparseness, we need to devise a different method, perhaps using domain-specific resources.
Lexicon Construction | Other thematic roles did not perform well because of data sparseness.
Lexicon Construction | Data sparseness affected the linguistic schemata as well. |