Text Classification from Positive and Unlabeled Data using Misclassified Data Correction
Fukumoto, Fumiyo and Suzuki, Yoshimi and Matsuyoshi, Suguru

Article Structure

Abstract

This paper addresses the problem of dealing with a collection of labeled training documents, especially annotating negative training documents, and presents a method of text classification from positive and unlabeled data.

Introduction

Text classification using machine learning (ML) techniques with a small number of labeled data has become more important with the rapid increase in volume of online documents.

Framework of the System

The MCDC method involves category error correction, i.e., detection and correction of misclassified training documents.

Experiments

3.1 Experimental setup

Conclusion

The research described in this paper involved text classification using positive and unlabeled data.

Topics

SVM

Appears in 23 sentences as: SVM (25)
In Text Classification from Positive and Unlabeled Data using Misclassified Data Correction
  1. We applied an error detection and correction technique to the results of positive and negative documents classified by the Support Vector Machines ( SVM ).
    Page 1, “Abstract”
  2. uses soft-margin SVM as the underlying classifiers (Liu et al., 2003).
    Page 1, “Introduction”
  3. They reported that the results were comparable to the current state-of-the-art biased SVM method.
    Page 1, “Introduction”
  4. Like much previous work on semi-supervised ML, we apply SVM to the positive and unlabeled data, and add the classification results to the training data.
    Page 1, “Introduction”
  5. The difference is that before adding the classification results, we applied the MisClassified data Detection and Correction (MCDC) technique to the results of SVM learning in order to improve classification accuracy obtained by the final classifiers.
    Page 1, “Introduction”
  6. As error candidates, we focus on support vectors (SVs) extracted from the training documents by SVM .
    Page 2, “Framework of the System”
  7. Training by SVM is performed to find the optimal hyperplane consisting of SVs, and only the SVs affect the performance.
    Page 2, “Framework of the System”
  8. We use these selected documents as negative training documents (N1), and apply SVM to learn classifiers.
    Page 2, “Framework of the System”
  9. Next, we apply the MCDC technique to the results of SVM learning.
    Page 2, “Framework of the System”
  10. For the result of correction (R001), we train SVM classifiers, and classify the remaining unlabeled data (U \ N1).
    Page 2, “Framework of the System”
  11. For the result of classification, we randomly select positive (0P1) and negative (0N1) documents classified by SVM and add them to the SVM training data (R01).
    Page 2, “Framework of the System”
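
Items 4 through 11 above outline the procedure as an iterative positive-and-unlabeled (PU) learning loop: sample pseudo-negatives from the unlabeled pool, train an SVM, detect and correct suspected misclassifications among the training documents, then classify the remaining unlabeled documents and fold the results back into the training data. The following Python sketch shows the shape of that loop; scikit-learn's LinearSVC, the placeholder mcdc_correct function, and all variable names are illustrative assumptions, not the authors' implementation.

    # Illustrative sketch of the loop described above; not the authors' code.
    # X: sparse document-term matrix; pos_idx / unl_idx: indices of the labeled
    # positive (P1) and unlabeled (U) documents.
    import numpy as np
    from sklearn.svm import LinearSVC

    def mcdc_correct(X_train, y_train, clf):
        """Placeholder for the MisClassified data Detection and Correction step.
        Here it simply flips training labels that the trained model disagrees
        with by a wide margin; the paper's actual criterion (based on support
        vectors) is more involved."""
        pred = clf.predict(X_train)
        margin = np.abs(clf.decision_function(X_train))
        flip = (pred != y_train) & (margin > 1.0)
        y_corr = y_train.copy()
        y_corr[flip] = pred[flip]
        return y_corr

    def pu_svm(X, pos_idx, unl_idx, rng=np.random.default_rng(0)):
        # 1. Sample pseudo-negatives N1 from U, the same size as P1.
        N1 = rng.choice(unl_idx, size=len(pos_idx), replace=False)
        train_idx = np.concatenate([pos_idx, N1])
        y = np.concatenate([np.ones(len(pos_idx), dtype=int),
                            -np.ones(len(N1), dtype=int)])
        clf = LinearSVC().fit(X[train_idx], y)
        # 2. Detect and correct suspected misclassifications, then retrain.
        y = mcdc_correct(X[train_idx], y, clf)
        clf = LinearSVC().fit(X[train_idx], y)
        # 3. Classify the remaining unlabeled data U \ N1 and add the results
        #    to the training data (the paper samples a subset of the classified
        #    documents; all of them are added here for brevity).
        rest = np.setdiff1d(unl_idx, N1)
        y = np.concatenate([y, clf.predict(X[rest])])
        train_idx = np.concatenate([train_idx, rest])
        return LinearSVC().fit(X[train_idx], y)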


unlabeled data

Appears in 10 sentences as: unlabeled data (10)
In Text Classification from Positive and Unlabeled Data using Misclassified Data Correction
  1. This paper addresses the problem of dealing with a collection of labeled training documents, especially annotating negative training documents, and presents a method of text classification from positive and unlabeled data .
    Page 1, “Abstract”
  2. Several authors have attempted to improve classification accuracy using only positive and unlabeled data (Yu et al., 2002; Ho et al., 2011).
    Page 1, “Introduction”
  3. Our goal is to eliminate the need for manually collecting training documents, and hopefully achieve classification accuracy from positive and unlabeled data as high as that from labeled positive and labeled negative data.
    Page 1, “Introduction”
  4. Like much previous work on semi-supervised ML, we apply SVM to the positive and unlabeled data , and add the classification results to the training data.
    Page 1, “Introduction”
  5. First, we randomly select documents from unlabeled data (U) where the number of documents is equal to that of the initial positive training documents (P1).
    Page 2, “Framework of the System”
  6. For the result of correction (R001), we train SVM classifiers, and classify the remaining unlabeled data (U \ N1).
    Page 2, “Framework of the System”
  7. The rest of the positive and negative documents are used as unlabeled data .
    Page 3, “Experiments”
  8. The number of positive training data in the other three methods depends on the value of δ, and the rest of the positive and negative documents were used as unlabeled data .
    Page 3, “Experiments”
  9. Our goal is to achieve classification accuracy from only positive documents and unlabeled data as high as that from labeled positive and negative data.
    Page 4, “Experiments”
  10. The research described in this paper involved text classification using positive and unlabeled data .
    Page 5, “Conclusion”
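
Items 7 and 8 above describe how the evaluation setting is built from a fully labeled corpus: a fraction of the positive documents is kept as labeled training data, and the remaining positives plus all negatives are treated as unlabeled. A minimal sketch of that split follows; the delta parameter, the 0/1 label convention, and the function name are illustrative assumptions rather than details taken from the paper.

    import numpy as np

    def make_pu_split(labels, delta=0.3, rng=np.random.default_rng(0)):
        """Hide labels to simulate the positive-and-unlabeled setting.
        labels: array of 0/1 category labels for the whole corpus."""
        pos = np.flatnonzero(labels == 1)
        neg = np.flatnonzero(labels == 0)
        labeled_pos = rng.choice(pos, size=int(delta * len(pos)), replace=False)
        # Everything else, positive or negative, goes into the unlabeled pool U.
        unlabeled = np.concatenate([np.setdiff1d(pos, labeled_pos), neg])
        return labeled_pos, unlabeled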


F-score

Appears in 6 sentences as: F-score (6)
In Text Classification from Positive and Unlabeled Data using Misclassified Data Correction
  1. The results using Reuters documents showed that the method was comparable to the current state-of-the-art biased-SVM method, as the F-score obtained by our method was 0.627 and that of biased-SVM was 0.614.
    Page 1, “Abstract”
  2. We empirically selected values of two parameters, "c" (tradeoff between training error and margin) and "j" (cost-factor by which training errors on positive examples outweigh errors on negative examples), that optimized the F-score obtained by classification of test documents.
    Page 3, “Experiments”
  3. Figure 3 shows micro-averaged F-score against the δ value.
    Page 3, “Experiments”
  4. F-score
    Page 4, “Experiments”
  5. Figure 3: F-score against the value of δ
    Page 4, “Experiments”
  6. The results using the 1996 Reuters corpora showed that the method was comparable to the current state-of-the-art biased-SVM method, as the F-score obtained by our method was 0.627 and that of biased-SVM was 0.614.
    Page 5, “Conclusion”
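
The results above are reported as micro-averaged F-score, i.e., precision and recall computed from counts pooled over all categories rather than averaged per category. A minimal sketch of that computation is given below; the function and argument names are illustrative.

    def micro_f_score(per_category_counts):
        """per_category_counts: iterable of (tp, fp, fn) triples, one per category."""
        tp = sum(c[0] for c in per_category_counts)
        fp = sum(c[1] for c in per_category_counts)
        fn = sum(c[2] for c in per_category_counts)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall == 0:
            return 0.0
        return 2 * precision * recall / (precision + recall)

The "c" and "j" parameters mentioned in item 2 correspond to the trade-off and cost-factor options of an SVM learner such as SVMlight (-c and -j); tuning them amounts to a small grid search that maximizes this score on held-out data.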


text classification

Appears in 6 sentences as: Text classification (2) text classification (4)
In Text Classification from Positive and Unlabeled Data using Misclassified Data Correction
  1. This paper addresses the problem of dealing with a collection of labeled training documents, especially annotating negative training documents, and presents a method of text classification from positive and unlabeled data.
    Page 1, “Abstract”
  2. Text classification using machine learning (ML) techniques with a small number of labeled data has become more important with the rapid increase in volume of online documents.
    Page 1, “Introduction”
  3. Thus, if some training document reduces the overall performance of text classification because it is an outlier, we can assume that the document is an SV.
    Page 2, “Framework of the System”
  4. The remaining data, consisting of 607,259 documents from 20 Nov 1996 to 19 Aug 1997, is used as test data for text classification .
    Page 3, “Experiments”
  5. 3.2 Text classification
    Page 3, “Experiments”
  6. The research described in this paper involved text classification using positive and unlabeled data.
    Page 5, “Conclusion”


semi-supervised

Appears in 3 sentences as: semi-supervised (3)
In Text Classification from Positive and Unlabeled Data using Misclassified Data Correction
  1. Quite a lot of learning techniques, e.g., semi-supervised learning, self-training, and active learning, have been proposed.
    Page 1, “Introduction”
  2. proposed a semi-supervised learning approach called the Graph Mincut algorithm which uses a small number of positive and negative examples and assigns values to unlabeled examples in a way that optimizes consistency in a nearest-neighbor sense (Blum et al., 2001).
    Page 1, “Introduction”
  3. Like much previous work on semi-supervised ML, we apply SVM to the positive and unlabeled data, and add the classification results to the training data.
    Page 1, “Introduction”


Support Vector

Appears in 3 sentences as: Support Vector (1) Support vectors (1) support vectors (1)
In Text Classification from Positive and Unlabeled Data using Misclassified Data Correction
  1. We applied an error detection and correction technique to the results of positive and negative documents classified by the Support Vector Machines (SVM).
    Page 1, “Abstract”
  2. As error candidates, we focus on support vectors (SVs) extracted from the training documents by SVM.
    Page 2, “Framework of the System”
  3. - D \ SV (Support vectors) Extraction
    Page 2, “Framework of the System”
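
Since only the support vectors determine the separating hyperplane, the method restricts its error candidates to the SVs and leaves the remaining training documents (D \ SV) untouched. A minimal sketch of splitting the training documents this way, using scikit-learn's SVC (an illustrative choice of toolkit, not necessarily the one used in the paper):

    import numpy as np
    from sklearn.svm import SVC

    def split_by_support_vectors(X_train, y_train):
        """Return indices of training documents that are support vectors (error
        candidates) and of the rest, which do not affect the hyperplane."""
        clf = SVC(kernel="linear").fit(X_train, y_train)
        sv_idx = clf.support_
        non_sv_idx = np.setdiff1d(np.arange(X_train.shape[0]), sv_idx)
        return clf, sv_idx, non_sv_idx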
