Scaling Semi-supervised Naive Bayes with Feature Marginals
Lucas, Michael and Downey, Doug

Article Structure

Abstract

Semi-supervised learning (SSL) methods augment standard machine learning (ML) techniques to leverage unlabeled data.

Introduction

Semi-supervised Learning (SSL) is a Machine Learning (ML) approach that utilizes large amounts of unlabeled data, combined with a smaller amount of labeled data, to learn a target function (Zhu, 2006; Chapelle et al., 2006).

Problem Definition

We consider a semi-supervised classification task, in which the goal is to produce a mapping from an instance space X consisting of T-tuples of nonnegative integer-valued features w = (w1, . . . , wT).

Topics

unlabeled data

Appears in 25 sentences as: unlabeled data (28)
In Scaling Semi-supervised Naive Bayes with Feature Marginals
  1. Semi-supervised learning (SSL) methods augment standard machine learning (ML) techniques to leverage unlabeled data.
    Page 1, “Abstract”
  2. However, existing SSL techniques typically require multiple passes over the entirety of the unlabeled data, meaning the techniques are not applicable to large corpora being produced today.
    Page 1, “Abstract”
  3. In this paper, we show that improving marginal word frequency estimates using unlabeled data can enable semi-supervised text classification that scales to massive unlabeled data sets.
    Page 1, “Abstract”
  4. Semi-supervised Learning (SSL) is a Machine Learning (ML) approach that utilizes large amounts of unlabeled data, combined with a smaller amount of labeled data, to learn a target function (Zhu, 2006; Chapelle et al., 2006).
    Page 1, “Introduction”
  5. Typically, for each target concept to be learned, a semi-supervised classifier is trained using iterative techniques that execute multiple passes over the unlabeled data (e.g., Expectation-Maximization (Nigam et al., 2000) or Label Propagation (Zhu and Ghahramani, 2002)).
    Page 1, “Introduction”
  6. Instead of utilizing unlabeled examples directly for each given target concept, our approach is to pre-compute a small set of statistics over the unlabeled data in advance.
    Page 1, “Introduction”
  7. Specifically, we introduce a method that extends Multinomial Naive Bayes (MNB) to leverage marginal probability statistics P(w) of each word w, computed over the unlabeled data.
    Page 1, “Introduction”
  8. In experiments with large unlabeled data sets and sparse labeled data, we find that MNB-FM is both faster and more accurate on average than standard SSL methods from previous work.
    Page 1, “Introduction”
  9. MNB-FM attempts to improve MNB’s estimates of θw+ and θw−, using statistics computed over the unlabeled data.
    Page 2, “Problem Definition”
  10. Equation 2 can be estimated in advance, without knowledge of the target class, simply by counting the number of tokens of each word in the unlabeled data.
    Page 3, “Problem Definition”
  11. The above example illustrates how MNB-FM can leverage frequency marginal statistics computed over unlabeled data to improve MNB’s conditional probability estimates.
    Page 4, “Problem Definition”
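
Items 7 and 10 above capture the key precomputation: the marginal frequency P(w) of every word can be obtained in a single pass over the unlabeled corpus, before any target class is known, simply by counting tokens. A minimal Python sketch of that counting step (the toy corpus and the whitespace tokenizer are illustrative assumptions):

```python
# Sketch: precompute marginal word frequencies P(w) over an unlabeled corpus.
# `unlabeled_docs` and the whitespace tokenizer are stand-ins; the method
# only requires counting the tokens of each word in the unlabeled data.
from collections import Counter

unlabeled_docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
]

counts = Counter()
total_tokens = 0
for doc in unlabeled_docs:
    tokens = doc.split()              # stand-in for a real tokenizer
    counts.update(tokens)
    total_tokens += len(tokens)

# P(w): the fraction of all tokens in the unlabeled data that are word w
p_w = {w: c / total_tokens for w, c in counts.items()}
print(p_w["the"])                     # 4/11 for this toy corpus
```

Because these marginals are class-independent, they are computed once and then reused for every new target concept, which is what lets the approach avoid repeated passes over the corpus.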


semi-supervised

Appears in 15 sentences as: Semi-Supervised (1) Semi-supervised (4) semi-supervised (10)
In Scaling Semi-supervised Naive Bayes with Feature Marginals
  1. Semi-supervised learning (SSL) methods augment standard machine learning (ML) techniques to leverage unlabeled data.
    Page 1, “Abstract”
  2. In this paper, we show that improving marginal word frequency estimates using unlabeled data can enable semi-supervised text classification that scales to massive unlabeled data sets.
    Page 1, “Abstract”
  3. Semi-supervised Learning (SSL) is a Machine Learning (ML) approach that utilizes large amounts of unlabeled data, combined with a smaller amount of labeled data, to learn a target function (Zhu, 2006; Chapelle et al., 2006).
    Page 1, “Introduction”
  4. Typically, for each target concept to be learned, a semi-supervised classifier is trained using iterative techniques that execute multiple passes over the unlabeled data (e.g., Expectation-Maximization (Nigam et al., 2000) or Label Propagation (Zhu and Ghahramani, 2002)).
    Page 1, “Introduction”
  5. We consider a semi-supervised classification task, in which the goal is to produce a mapping from an instance space X consisting of T-tuples of nonnegative integer-valued features w = (w1, . . . , wT).
    Page 2, “Problem Definition”
  6. Our semi-supervised technique utilizes statistics computed over the labeled corpus, denoted as follows.
    Page 2, “Problem Definition”
  7. In addition to Multinomial Naive Bayes (discussed in Section 3), we evaluate against a variety of supervised and semi-supervised techniques from previous work, which provide a representation of the state of the art.
    Page 5, “Problem Definition”
  8. We implemented a semi-supervised version of Naive Bayes with Expectation Maximization, based on Nigam et al. (2000).
    Page 5, “Problem Definition”
  9. We also re-implemented a version of the recent Semi-supervised Frequency Estimate approach (Su et al., 2011).
    Page 5, “Problem Definition”
  10. a large unlabeled data set as constraints to improve a semi-supervised classifier.
    Page 8, “Problem Definition”
  11. Further, the MNB-FM approach offers scalability advantages over most existing semi-supervised approaches.
    Page 8, “Problem Definition”
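
Item 8 refers to a semi-supervised Naive Bayes baseline trained with Expectation-Maximization, following Nigam et al. (2000). A minimal sketch of how such a baseline can be assembled with scikit-learn (the toy documents, the fixed iteration count, and the soft-label weighting scheme are illustrative assumptions, not the authors' exact configuration):

```python
# Sketch: semi-supervised Multinomial Naive Bayes via EM, Nigam et al. style.
# Toy data and a fixed number of EM iterations, for brevity.
import numpy as np
from scipy.sparse import vstack
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

labeled_docs = ["great product", "terrible quality"]          # hypothetical
y_labeled = np.array([1, 0])
unlabeled_docs = ["really great product", "poor quality item", "great value"]

vec = CountVectorizer()
X_all = vec.fit_transform(labeled_docs + unlabeled_docs)
X_l, X_u = X_all[:len(labeled_docs)], X_all[len(labeled_docs):]

clf = MultinomialNB()
clf.fit(X_l, y_labeled)               # initialize on the labeled data only

n_u = X_u.shape[0]
for _ in range(10):
    # E-step: soft class posteriors for each unlabeled document
    post = clf.predict_proba(X_u)     # columns follow clf.classes_
    # M-step: refit on labeled + unlabeled, entering each unlabeled
    # document once per class, weighted by its posterior for that class
    X = vstack([X_l, X_u, X_u])
    y = np.concatenate([y_labeled,
                        np.full(n_u, clf.classes_[0]),
                        np.full(n_u, clf.classes_[1])])
    w = np.concatenate([np.ones(len(y_labeled)), post[:, 0], post[:, 1]])
    clf.fit(X, y, sample_weight=w)

print(clf.predict(vec.transform(["great quality"])))
```

Each EM iteration re-scores the entire unlabeled set, which is exactly the repeated-pass cost over large corpora that item 4 above identifies as the bottleneck.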


labeled data

Appears in 10 sentences as: labeled data (10)
In Scaling Semi-supervised Naive Bayes with Feature Marginals
  1. SSL techniques are often effective in text classification, where labeled data is scarce but large unlabeled corpora are readily available.
    Page 1, “Abstract”
  2. Semi-supervised Learning (SSL) is a Machine Learning (ML) approach that utilizes large amounts of unlabeled data, combined with a smaller amount of labeled data, to learn a target function (Zhu, 2006; Chapelle et al., 2006).
    Page 1, “Introduction”
  3. Then, for a given target class and labeled data set, we utilize the statistics to improve a classifier.
    Page 1, “Introduction”
  4. The marginal statistics are used as a constraint to improve the class-conditional probability estimates P(w|+) and P(w|−) for the positive and negative classes, which are often noisy when estimated over sparse labeled data sets.
    Page 1, “Introduction”
  5. In experiments with large unlabeled data sets and sparse labeled data, we find that MNB-FM is both faster and more accurate on average than standard SSL methods from previous work.
    Page 1, “Introduction”
  6. In particular, SFE uses the equality P(+|w) = P(+,w)/P(w), and estimates the rhs using P(w) computed over all the unlabeled data, rather than using only labeled data as in standard MNB.
    Page 5, “Problem Definition”
  7. Further, it can be shown that as P(w) of a word w in the unlabeled data becomes larger than that in the labeled data, SFE’s estimate of the ratio P(w|+)/P(w|−) approaches one.
    Page 5, “Problem Definition”
  8. Depending on the labeled data, such an estimate can be arbitrarily inaccurate.
    Page 5, “Problem Definition”
  9. MNB-FM is the best performing method over all data sets when the labeled data is limited to 10 and 100 documents, except for training sets of size 10 in Aptemod, where MNB has a slight edge.
    Page 6, “Problem Definition”
  10. Instead, the complexity of MNB-FM is proportional only to the number of unique words in the labeled data set.
    Page 8, “Problem Definition”
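
Item 4 of this list notes that the class-conditional estimates P(w|+) and P(w|−) are often noisy when estimated over sparse labeled data. A small worked example makes this concrete: with Laplace smoothing, one chance occurrence of a word can double the estimated ratio (all counts and the vocabulary size below are hypothetical):

```python
# Sketch: why class-conditional estimates are noisy on sparse labeled data.
# Every number here is hypothetical, chosen only to illustrate the effect.

def theta(word_count, class_tokens, vocab_size, alpha=1.0):
    """Laplace-smoothed estimate of P(w | class)."""
    return (word_count + alpha) / (class_tokens + alpha * vocab_size)

V = 50_000                   # hypothetical vocabulary size
# A word seen once in 500 positive-class tokens, never in 500 negative ones:
ratio = theta(1, 500, V) / theta(0, 500, V)
print(ratio)                 # 2.0: a single occurrence doubles the ratio
```

Constraining these per-class estimates with the corpus-wide marginal P(w), as MNB-FM does, is the correction the paper proposes for exactly this kind of noise.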


text classification

Appears in 7 sentences as: text classification (7)
In Scaling Semi-supervised Naive Bayes with Feature Marginals
  1. SSL techniques are often effective in text classification, where labeled data is scarce but large unlabeled corpora are readily available.
    Page 1, “Abstract”
  2. In this paper, we show that improving marginal word frequency estimates using unlabeled data can enable semi-supervised text classification that scales to massive unlabeled data sets.
    Page 1, “Abstract”
  3. This is problematic for text classification over large unlabeled corpora like the Web: new target concepts (new tasks and new topics of interest) arise frequently, and performing even a single pass over a large corpus for each new target concept is intractable.
    Page 1, “Introduction”
  4. In this paper, we present a new SSL text classification approach that scales to large corpora.
    Page 1, “Introduction”
  5. In the text classification setting, each feature value wd represents the count of observations of word w in document d. MNB makes the simplifying assumption that word occurrences are conditionally independent of each other given the class (+ or −) of the example.
    Page 2, “Problem Definition”
  6. We evaluate on two text classification tasks: topic classification, and sentiment detection.
    Page 4, “Problem Definition”
  7. Our experiments demonstrate that MNB-FM outperforms previous approaches across multiple text classification techniques including topic classification and sentiment analysis.
    Page 8, “Problem Definition”
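
Item 5 states MNB's conditional-independence assumption. Written out in the notation of the problem definition, where w = (w1, . . . , wT) holds word counts, it yields the textbook MNB scoring rule (this is the standard form, not a formula quoted from the paper):

```latex
P(+ \mid \mathbf{w}) \;\propto\; P(+) \prod_{t=1}^{T} P(t \mid +)^{w_t}
```

where P(t | +) is the probability that a token drawn from a positive-class document is word t, and wt is that word's count in the document. MNB-FM's contribution is to sharpen the P(t | +) and P(t | −) factors using the unlabeled marginals.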


Logistic Regression

Appears in 5 sentences as: Logistic Regression (4) logistic regression (1)
In Scaling Semi-supervised Naive Bayes with Feature Marginals
  1. 4.2.2 Logistic Regression
    Page 5, “Problem Definition”
  2. We implemented Logistic Regression using L2-Normalization, finding this to outperform L1-Normalized and non-normalized versions.
    Page 5, “Problem Definition”
  3. The strength of the normalization in the logistic regression required cross-validation, which we limited to 20 values logarithmically spaced between 10⁻⁴ and 10⁴.
    Page 5, “Problem Definition”
  4. Logistic Regression performed particularly well on the R-Precision Metric, as can be seen in Tables
    Page 6, “Problem Definition”
  5. MNB-FM performs essentially equivalently well, on average, to the best competing method (Logistic Regression) on the large RCV1 data set.
    Page 7, “Problem Definition”
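
Items 2 and 3 pin down this baseline's configuration: L2 regularization, with its strength chosen by cross-validation over 20 values logarithmically spaced between 10⁻⁴ and 10⁴. A minimal scikit-learn sketch of that setup (the toy data and fold count are illustrative assumptions; note that scikit-learn's C parameter is the inverse regularization strength, so the grid covers the same range):

```python
# Sketch: L2-regularized Logistic Regression with the regularization
# setting cross-validated over 20 log-spaced values, as described above.
# The toy documents and cv=2 are illustrative assumptions.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegressionCV

docs = ["good", "bad", "very good", "very bad", "good film", "bad film"]
y = [1, 0, 1, 0, 1, 0]
X = CountVectorizer().fit_transform(docs)

Cs = np.logspace(-4, 4, 20)           # 20 values between 10^-4 and 10^4
clf = LogisticRegressionCV(Cs=Cs, penalty="l2", cv=2, max_iter=1000)
clf.fit(X, y)
print(clf.C_)                         # the cross-validated choice
```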


classification tasks

Appears in 3 sentences as: classification task (1) classification tasks (2)
In Scaling Semi-supervised Naive Bayes with Feature Marginals
  1. We consider a semi-supervised classification task, in which the goal is to produce a mapping from an instance space X consisting of T-tuples of nonnegative integer-valued features w = (w1, . . . , wT).
    Page 2, “Problem Definition”
  2. We evaluate on two text classification tasks : topic classification, and sentiment detection.
    Page 4, “Problem Definition”
  3. However, LR underperforms in classification tasks (in terms of F1, Tables 4-6).
    Page 7, “Problem Definition”


conditional probability

Appears in 3 sentences as: conditional probability (3)
In Scaling Semi-supervised Naive Bayes with Feature Marginals
  1. The above example illustrates how MNB-FM can leverage frequency marginal statistics computed over unlabeled data to improve MNB’s conditional probability estimates.
    Page 4, “Problem Definition”
  2. MNB-FM improves the conditional probability estimates in MNB and, surprisingly, we found that it can often improve these estimates for words that do not even occur in the training set.
    Page 6, “Problem Definition”
  3. For each word in the vocabulary, we compared each classifier’s conditional probability ratios, i.e., P(w|+)/P(w|−).
    Page 6, “Problem Definition”
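
Item 3 describes comparing each classifier's per-word ratios P(w|+)/P(w|−). With a fitted scikit-learn MultinomialNB, those ratios can be read directly off the model's log probabilities; a minimal sketch (the toy corpus is an illustrative assumption):

```python
# Sketch: per-word conditional probability ratios P(w|+)/P(w|-) from a
# fitted Multinomial Naive Bayes model. Toy data; with classes [0, 1],
# row 1 of feature_log_prob_ is the positive class.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

docs = ["great fun", "boring plot", "great acting", "dull and boring"]
y = [1, 0, 1, 0]
vec = CountVectorizer()
clf = MultinomialNB().fit(vec.fit_transform(docs), y)

ratios = np.exp(clf.feature_log_prob_[1] - clf.feature_log_prob_[0])
for word, r in zip(vec.get_feature_names_out(), ratios):
    print(f"{word}: {r:.2f}")
```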


sentiment analysis

Appears in 3 sentences as: sentiment analysis (3)
In Scaling Semi-supervised Naive Bayes with Feature Marginals
  1. In experiments with text topic classification and sentiment analysis, we show that our method is both more scalable and more accurate than SSL techniques from previous work.
    Page 1, “Abstract”
  2. Our experiments demonstrate that MNB-FM outperforms previous approaches across multiple text classification tasks including topic classification and sentiment analysis.
    Page 8, “Problem Definition”
  3. In experiments across topic classification and sentiment analysis, MNB-FM was found to be more accurate and more scalable than several supervised and semi-supervised baselines from previous work.
    Page 8, “Problem Definition”


Sentiment Classification

Appears in 3 sentences as: Sentiment Classification (3)
In Scaling Semi-supervised Naive Bayes with Feature Marginals
  1. 4.1.3 Sentiment Classification Data
    Page 4, “Problem Definition”
  2. In the domain of Sentiment Classification, we tested on the Amazon dataset from Blitzer et al. (2007).
    Page 4, “Problem Definition”
  3. In the Amazon Sentiment Classification data set, the task is to determine whether a review is positive or negative based solely on the reviewer’s submitted text.
    Page 4, “Problem Definition”
