Abstract | Supervised text classification algorithms require a large number of human-labeled documents, and labeling them is a labor-intensive and time-consuming process. |
Abstract | We evaluate this approach to improving text classification performance on three real-world datasets. |
Introduction | In supervised text classification, the learner (a program) takes human-labeled documents as input and learns a decision function that can assign a previously unseen document to one of the predefined classes. |
Introduction | In this paper, we propose a text classification algorithm based on Latent Dirichlet Allocation (LDA) (Blei et al., 2003) which does not need labeled documents. |
Introduction | Blei et al. (2003) used LDA topics as features in text classification, but they used labeled documents while learning a classifier. |
Related Work | Several researchers have proposed semi-supervised text classification algorithms with the aim of reducing the time, effort and cost involved in labeling documents. |
Related Work | Semi-supervised text classification algorithms proposed in (Nigam et al., 2000), (Joachims, 1999), (Zhu and Ghahramani, 2002) and (Blum and Mitchell, 1998) are a few examples of this type. |
Related Work | Also, a human annotator may discard or mislabel a polysemous word, which may affect the performance of a text classifier. |
Closing Remarks | The H-groups shown in Table 1 provide richer semantic descriptions of the domain than keywords do, and we noted potential applications for high-level summarization of a whole corpus, the creation of information extraction templates, and finer-grained text classification and retrieval. |
Implementation | For broad topics, it is desirable to perform finer-grained text classification and retrieval. |
Implementation | The alternation in V-groups contained by H-groups may reflect different beliefs and opinions which could be used for text classification and opinion mining. |