Improved Bayesian Logistic Supervised Topic Models with Data Augmentation
Zhu, Jun and Zheng, Xun and Zhang, Bo

Article Structure

Abstract

Supervised topic models with a logistic likelihood have two issues that potentially limit their practical use: 1) response variables are usually over-weighted by document word counts; and 2) existing variational inference methods make strict mean-field assumptions.

Introduction

As widely adopted in supervised latent Dirichlet allocation (sLDA) models (Blei and McAuliffe, 2010; Wang et al., 2009), one way to improve the predictive power of LDA is to define a likelihood model for the widely available document-level response variables, in addition to the likelihood model for document words.

Logistic Supervised Topic Models

We now present the generalized Bayesian logistic supervised topic models.

A Gibbs Sampling Algorithm

Now, we present a simple and efficient Gibbs sampling algorithm for the generalized Bayesian logistic supervised topic models.

Experiments

We present empirical results and sensitivity analysis to demonstrate the efficiency and prediction performance of the generalized logistic supervised topic models on the 20Newsgroups (20NG) data set, which contains about 20,000 postings in 20 newsgroups.

Conclusions and Discussions

We present two improvements to Bayesian logistic supervised topic models: a general formulation that introduces a regularization parameter to avoid model imbalance, and a highly efficient Gibbs sampling algorithm that exploits data augmentation and places no restrictive assumptions on the posterior distribution.

Topics

Gibbs sampling

Appears in 21 sentences as: Gibbs sampler (1) Gibbs Sampling (1) Gibbs sampling (21)
In Improved Bayesian Logistic Supervised Topic Models with Data Augmentation
  1. We address these issues by: 1) introducing a regularization constant to better balance the two parts based on an optimization formulation of Bayesian inference; and 2) developing a simple Gibbs sampling algorithm by introducing auxiliary Polya-Gamma variables and collapsing out Dirichlet variables.
    Page 1, “Abstract”
  2. Second, to solve the intractable posterior inference problem of the generalized Bayesian logistic supervised topic models, we present a simple Gibbs sampling algorithm by exploring the ideas of data augmentation (Tanner and Wong, 1987; van Dyk and Meng, 2001; Holmes and Held, 2006).
    Page 1, “Introduction”
  3. Then, we develop simple and efficient Gibbs sampling algorithms with analytic conditional distributions and no Metropolis-Hastings accept/reject steps.
    Page 2, “Introduction”
  4. For Bayesian LDA models, we can also explore the conjugacy of the Dirichlet-Multinomial prior-likelihood pairs to collapse out the Dirichlet variables (i.e., topics and mixing proportions) to do collapsed Gibbs sampling, which can have better mixing rates (Griffiths and Steyvers, 2004).
    Page 2, “Introduction”
  5. Section 3 presents Gibbs sampling algorithms with data augmentation.
    Page 2, “Introduction”
  6. Now, we present a simple and efficient Gibbs sampling algorithm for the generalized Bayesian logistic supervised topic models.
    Page 3, “A Gibbs Sampling Algorithm”
  7. 3.2 Inference with Collapsed Gibbs Sampling
    Page 4, “A Gibbs Sampling Algorithm”
  8. Although we can do Gibbs sampling to infer the complete posterior distribution $q(\eta, \lambda, \Theta, \mathbf{Z}, \Phi)$ and thus $q(\eta, \Theta, \mathbf{Z}, \Phi)$ by ignoring $\lambda$, the mixing rate would be slow due to the large sample space.
    Page 4, “A Gibbs Sampling Algorithm”
  9. Then, the conditional distributions used in collapsed Gibbs sampling are as follows.
    Page 4, “A Gibbs Sampling Algorithm”
  10. Algorithm 1 (collapsed Gibbs sampling), line 1, Initialization: set $\lambda = 1$ and randomly draw $z_{dn}$ from a uniform distribution.
    Page 5, “A Gibbs Sampling Algorithm”
  11. For the Gibbs sampler, an estimate of $\Phi$ using the samples is $\hat{\phi}_{kt} \propto C_k^t + \beta_t$.
    Page 5, “A Gibbs Sampling Algorithm”
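
As background for the excerpts above (this is a summary of Polson et al. (2012), not text from the paper itself), the Pólya-Gamma augmentation rests on the identity

$$\frac{(e^{\psi})^{a}}{(1 + e^{\psi})^{b}} \;=\; 2^{-b}\, e^{\kappa\psi} \int_{0}^{\infty} e^{-\lambda\psi^{2}/2}\, p(\lambda \mid b, 0)\, \mathrm{d}\lambda, \qquad \kappa = a - b/2,$$

where $p(\lambda \mid b, 0)$ is the Pólya-Gamma $\mathrm{PG}(b, 0)$ density. Applied per document with $a = y_d$, $b = 1$, and $\psi$ the discriminant value, conditioning on the augmented variable $\lambda_d$ turns each logistic factor into a Gaussian-form term in $\psi$, which is why the conditional distributions become analytic and no Metropolis-Hastings accept/reject step is needed.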

topic models

Appears in 21 sentences as: topic model (7) topic models (14)
In Improved Bayesian Logistic Supervised Topic Models with Data Augmentation
  1. Supervised topic models with a logistic likelihood have two issues that potentially limit their practical use: 1) response variables are usually over-weighted by document word counts; and 2) existing variational inference methods make strict mean-field assumptions.
    Page 1, “Abstract”
  2. First, we present a general framework of Bayesian logistic supervised topic models with a regularization parameter to better balance response variables and words.
    Page 1, “Introduction”
  3. Second, to solve the intractable posterior inference problem of the generalized Bayesian logistic supervised topic models, we present a simple Gibbs sampling algorithm by exploring the ideas of data augmentation (Tanner and Wong, 1987; van Dyk and Meng, 2001; Holmes and Held, 2006).
    Page 1, “Introduction”
  4. More specifically, we extend Polson’s method for Bayesian logistic regression (Polson et al., 2012) to the generalized logistic supervised topic models, which are much more challenging due to the presence of nontrivial latent variables.
    Page 1, “Introduction”
  5. Section 2 introduces logistic supervised topic models as a general optimization problem.
    Page 2, “Introduction”
  6. We now present the generalized Bayesian logistic supervised topic models.
    Page 2, “Logistic Supervised Topic Models”
  7. A logistic supervised topic model consists of two parts: an LDA model (Blei et al., 2003) for describing the words $\mathbf{W} = \{\mathbf{w}_d\}_{d=1}^{D}$, where $\mathbf{w}_d = \{w_{dn}\}_{n=1}^{N_d}$ denote the words within document $d$, and a logistic classifier for considering the supervising signal $\mathbf{y} = \{y_d\}_{d=1}^{D}$.
    Page 2, “Logistic Supervised Topic Models”
  8. Logistic classifier: To consider binary supervising information, a logistic supervised topic model (e.g., sLDA) builds a logistic classifier using the topic representations as input features
    Page 2, “Logistic Supervised Topic Models”
  9. Regularized Bayesian Inference: To integrate the above two components for hybrid learning, a logistic supervised topic model solves the joint Bayesian inference problem
    Page 3, “Logistic Supervised Topic Models”
  10. Then, the generalized inference problem (5) of logistic supervised topic models can be written in the “standard” Bayesian inference form (1)
    Page 3, “Logistic Supervised Topic Models”
  11. Comparison with MedLDA: The above formulation of logistic supervised topic models as an instance of regularized Bayesian inference provides a direct comparison with the max-margin supervised topic model, MedLDA (Jiang et al., 2012).
    Page 3, “Logistic Supervised Topic Models”
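
To make the classifier in excerpt 8 concrete, the following is a sketch in assumed notation ($\bar{\mathbf{z}}_d$ denotes the average topic-assignment vector of document $d$), not a verbatim equation from the paper:

$$p(y_d = 1 \mid \mathbf{z}_d, \boldsymbol{\eta}) \;=\; \frac{\exp(\boldsymbol{\eta}^{\top}\bar{\mathbf{z}}_d)}{1 + \exp(\boldsymbol{\eta}^{\top}\bar{\mathbf{z}}_d)}, \qquad \bar{z}_{dk} = \frac{1}{N_d}\sum_{n=1}^{N_d}\mathbb{I}(z_{dn} = k),$$

so the classifier and the LDA word likelihood interact only through the shared topic assignments, which is the coupling noted again in the LDA excerpts below.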

LDA

Appears in 14 sentences as: LDA (15)
In Improved Bayesian Logistic Supervised Topic Models with Data Augmentation
  1. As widely adopted in supervised latent Dirichlet allocation (sLDA) models (Blei and McAuliffe, 2010; Wang et al., 2009), one way to improve the predictive power of LDA is to define a likelihood model for the widely available document-level response variables, in addition to the likelihood model for document words.
    Page 1, “Introduction”
  2. Though powerful, one issue that could limit the use of existing logistic supervised LDA models is that they treat the document-level response variable as one additional word via a normalized likelihood model.
    Page 1, “Introduction”
  3. For Bayesian LDA models, we can also explore the conjugacy of the Dirichlet-Multinomial prior-likelihood pairs to collapse out the Dirichlet variables (i.e., topics and mixing proportions) to do collapsed Gibbs sampling, which can have better mixing rates (Griffiths and Steyvers, 2004).
    Page 2, “Introduction”
  4. A logistic supervised topic model consists of two parts: an LDA model (Blei et al., 2003) for describing the words $\mathbf{W} = \{\mathbf{w}_d\}_{d=1}^{D}$, where $\mathbf{w}_d = \{w_{dn}\}_{n=1}^{N_d}$ denote the words within document $d$, and a logistic classifier for considering the supervising signal $\mathbf{y} = \{y_d\}_{d=1}^{D}$.
    Page 2, “Logistic Supervised Topic Models”
  5. LDA: LDA is a hierarchical Bayesian model that posits each document as an admixture of K topics, where each topic $\Phi_k$ is a multinomial distribution over a $V$-word vocabulary.
    Page 2, “Logistic Supervised Topic Models”
  6. For fully-Bayesian LDA, the topics are random samples from a Dirichlet prior, $\Phi_k \sim \mathrm{Dir}(\beta)$.
    Page 2, “Logistic Supervised Topic Models”
  7. LDA infers the posterior distribution $p(\Theta, \mathbf{Z}, \Phi \mid \mathbf{W}) \propto p_0(\Theta, \mathbf{Z}, \Phi)\, p(\mathbf{W} \mid \mathbf{Z}, \Phi)$, where $p_0(\Theta, \mathbf{Z}, \Phi) = \big(\prod_d p(\theta_d \mid \alpha) \prod_n p(z_{dn} \mid \theta_d)\big) \prod_k p(\Phi_k \mid \beta)$ is the joint distribution defined by the model.
    Page 2, “Logistic Supervised Topic Models”
  8. Also, note that the logistic classifier and the LDA likelihood are coupled by sharing the latent topic assignments z.
    Page 3, “Logistic Supervised Topic Models”
  9. LDA (Griffiths and Steyvers, 2004).
    Page 4, “A Gibbs Sampling Algorithm”
  10. where $C^{\neg n}$ indicates that term $n$ is excluded from the corresponding document or topic; $\gamma = \frac{1}{N_d}$; and $\Lambda_d^{\neg n} = \frac{1}{N_d}\sum_{k'} \eta_{k'} C_{d,\neg n}^{k'}$ is the discriminant function value without word $n$. We can see that the first term is from the LDA model for observed word counts and the second term is from
    Page 4, “A Gibbs Sampling Algorithm”
  11. We compare the generalized logistic supervised LDA using Gibbs sampling (denoted by gSLDA) with various competitors, including the standard sLDA using variational mean-field methods (denoted by vSLDA) (Wang et al., 2009), the MedLDA model using variational mean-field methods (denoted by vMedLDA) (Zhu et al., 2012), and the MedLDA model using collapsed Gibbs sampling algorithms (denoted by gMedLDA) (Jiang et al., 2012).
    Page 5, “Experiments”
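
The collapsed conditional quoted in excerpt 10 is hard to read from the fragment alone. Below is a minimal, self-contained Python sketch of the standard collapsed Gibbs update for unsupervised LDA (Griffiths and Steyvers, 2004), i.e., only the word-count part of the conditional; the supervised factor involving $\eta$ and the Pólya-Gamma variables is deliberately omitted, and every name is illustrative rather than taken from the authors' code.

import numpy as np

def collapsed_gibbs_lda(docs, K, V, alpha=0.1, beta=0.01, n_iters=200, seed=0):
    # Collapsed Gibbs sampler for unsupervised LDA. `docs` is a list of
    # lists of word ids in [0, V). The supervised model would multiply the
    # conditional below by a per-document discriminant factor.
    rng = np.random.default_rng(seed)
    D = len(docs)
    ndk = np.zeros((D, K))   # document-topic counts C_d^k
    nkw = np.zeros((K, V))   # topic-word counts     C_k^t
    nk = np.zeros(K)         # topic totals
    z = [rng.integers(K, size=len(doc)) for doc in docs]

    for d, doc in enumerate(docs):        # accumulate initial counts
        for n, w in enumerate(doc):
            k = z[d][n]
            ndk[d, k] += 1
            nkw[k, w] += 1
            nk[k] += 1

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                k = z[d][n]
                # exclude word n from the counts (the C^{-n} terms)
                ndk[d, k] -= 1
                nkw[k, w] -= 1
                nk[k] -= 1
                # collapsed conditional: (C_d^k + alpha)(C_k^w + beta)/(C_k + V*beta)
                p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + V * beta)
                k = rng.choice(K, p=p / p.sum())
                z[d][n] = k
                ndk[d, k] += 1
                nkw[k, w] += 1
                nk[k] += 1

    # point estimate of Phi from the final counts, in the spirit of the
    # estimate quoted in the Gibbs sampling excerpts (phi_kt proportional to C_k^t + beta_t)
    phi = (nkw + beta) / (nkw + beta).sum(axis=1, keepdims=True)
    return z, phi

A toy call such as collapsed_gibbs_lda([[0, 2, 2, 1], [3, 3, 0]], K=2, V=4) is enough to exercise the update.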

binary classification

Appears in 8 sentences as: Binary classification (1) binary classification (7)
In Improved Bayesian Logistic Supervised Topic Models with Data Augmentation
  1. We consider binary classification with a training set $\mathcal{D} = \{(\mathbf{w}_d, y_d)\}_{d=1}^{D}$, where the response variable $Y$ takes values from the output space $\mathcal{Y} = \{0, 1\}$.
    Page 2, “Logistic Supervised Topic Models”
  2. loss (Rosasco et al., 2004) in the task of fully observed binary classification.
    Page 3, “Logistic Supervised Topic Models”
  3. 4.1 Binary classification
    Page 5, “Experiments”
  4. Figure 3 shows the performance of gSLDA+ with different burn-in steps for binary classification.
    Page 7, “Experiments”
  5. burn-in steps for binary classification.
    Page 8, “Experiments”
  6. Figure 5 shows the performance of gSLDA in the binary classification task with different c values.
    Page 8, “Experiments”
  7. Figure 5: Performance of gSLDA for binary classification with different c values.
    Page 8, “Experiments”
  8. Figure 6: Accuracy of gSLDA for binary classification with different α values in two settings with c = 1 and c = 9.
    Page 8, “Experiments”

latent variables

Appears in 5 sentences as: latent variable (1) latent variables (4)
In Improved Bayesian Logistic Supervised Topic Models with Data Augmentation
  1. More specifically, we extend Polson’s method for Bayesian logistic regression (Polson et al., 2012) to the generalized logistic supervised topic models, which are much more challenging due to the presence of nontrivial latent variables.
    Page 2, “Introduction”
  2. But the presence of latent variables poses additional challenges in carrying out a formal theoretical analysis of these surrogate losses (Lin, 2001) in the topic model setting.
    Page 3, “Logistic Supervised Topic Models”
  3. Moreover, the latent variables Z make the inference problem harder than that of Bayesian logistic regression models (Chen et al., 1999; Meyer and Laud, 2002; Polson et al., 2012).
    Page 3, “Logistic Supervised Topic Models”
  4. Our algorithm represents a first attempt to extend Polson’s approach (Polson et al., 2012) to deal with highly nontrivial Bayesian latent variable models.
    Page 4, “A Gibbs Sampling Algorithm”
  5. It is nontrivial to develop a Gibbs sampling algorithm using the similar data augmentation idea, due to the presence of latent variables and the nonlinearity of the soft-max function.
    Page 6, “Experiments”

logistic regression

Appears in 4 sentences as: logistic regression (4)
In Improved Bayesian Logistic Supervised Topic Models with Data Augmentation
  1. More specifically, we extend Polson’s method for Bayesian logistic regression (Polson et al., 2012) to the generalized logistic supervised topic models, which are much more challenging due to the presence of nontrivial latent variables.
    Page 1, “Introduction”
  2. Moreover, the latent variables Z make the inference problem harder than that of Bayesian logistic regression models (Chen et al., 1999; Meyer and Laud, 2002; Polson et al., 2012).
    Page 3, “Logistic Supervised Topic Models”
  3. For multi-class classification, one possible extension is to use a multinomial logistic regression model for categorical variables Y by using topic representations Z as input features.
    Page 6, “Experiments”
  4. In fact, this is harder than the multinomial Bayesian logistic regression, which can be done via a coordinate strategy (Polson et al., 2012).
    Page 6, “Experiments”
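
Because these excerpts repeatedly point back to Polson's augmentation for Bayesian logistic regression, a compact sketch of that base algorithm (without any latent topics) may help fix ideas. This is a rough illustration under stated assumptions: the Pólya-Gamma draw uses a truncated sum-of-gammas representation and is therefore approximate, the prior on the weights is a simple isotropic Gaussian, and the function names are hypothetical rather than from the paper or any particular library.

import numpy as np

def draw_pg_approx(b, c, rng, trunc=200):
    # Approximate draw from PG(b, c) via its sum-of-gammas representation
    # (Polson et al., 2012), truncated at `trunc` terms:
    #   lam = (1 / (2*pi^2)) * sum_k g_k / ((k - 1/2)^2 + c^2 / (4*pi^2)),
    #   g_k ~ Gamma(b, 1).
    k = np.arange(1, trunc + 1)
    g = rng.gamma(shape=b, scale=1.0, size=trunc)
    return (g / ((k - 0.5) ** 2 + c ** 2 / (4.0 * np.pi ** 2))).sum() / (2.0 * np.pi ** 2)

def gibbs_bayesian_logistic(X, y, n_iters=500, prior_var=100.0, seed=0):
    # Gibbs sampler for Bayesian logistic regression with Polya-Gamma
    # augmentation: given lam, eta is Gaussian; given eta, each lam_i is
    # PG(1, x_i^T eta). Labels y are in {0, 1}.
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    kappa = y - 0.5                      # kappa_i = y_i - 1/2
    prior_prec = np.eye(p) / prior_var   # isotropic Gaussian prior on eta
    eta = np.zeros(p)
    samples = []
    for _ in range(n_iters):
        # 1) resample the augmentation variables lam_i | eta
        psi = X @ eta
        lam = np.array([draw_pg_approx(1.0, c, rng) for c in psi])
        # 2) resample eta | lam from its Gaussian conditional:
        #    V = (X^T diag(lam) X + B0^{-1})^{-1},  m = V X^T kappa
        V = np.linalg.inv(X.T @ (X * lam[:, None]) + prior_prec)
        m = V @ (X.T @ kappa)
        eta = rng.multivariate_normal(m, V)
        samples.append(eta)
    return np.array(samples)

The supervised topic model layers the latent topics on top of this loop: the feature vector for document d becomes its average topic-assignment vector, and an additional collapsed step resamples each z_dn, which is exactly where the latent variables make the problem harder, as excerpt 2 above notes.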

optimization problem

Appears in 4 sentences as: optimization problem (3) optimization problems (1)
In Improved Bayesian Logistic Supervised Topic Models with Data Augmentation
  1. Technically, instead of doing standard Bayesian inference via Bayes’ rule, which requires a normalized likelihood model, we propose to do regularized Bayesian inference (Zhu et al., 2011; Zhu et al., 2013b) via solving an optimization problem, where the posterior regularization is defined as an expectation of a logistic loss, a surrogate loss of the expected misclassification error; and a regularization parameter is introduced to balance the surrogate classification loss (i.e., the response log-likelihood) and the word likelihood.
    Page 1, “Introduction”
  2. Section 2 introduces logistic supervised topic models as a general optimization problem.
    Page 2, “Introduction”
  3. As noticed in (Jiang et al., 2012), the posterior distribution by Bayes’ rule is equivalent to the solution of an information theoretical optimization problem
    Page 2, “Logistic Supervised Topic Models”
  4. supervised topic model (MedLDA) (Jiang et al., 2012), which has the same form of the optimization problems.
    Page 3, “Logistic Supervised Topic Models”
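
To spell out the optimization view in excerpts 1 and 3, here is a hedged reconstruction of the objective (the notation is assumed, and the equation references (1) and (5) in the excerpts refer to the paper and are not reproduced). Bayes’ rule corresponds to

$$\min_{q(\mathbf{M})} \; \mathrm{KL}\big(q(\mathbf{M}) \,\|\, p_0(\mathbf{M})\big) - \mathbb{E}_{q}\big[\log p(\mathcal{D} \mid \mathbf{M})\big],$$

and the generalized logistic supervised topic model replaces the normalized response likelihood with a $c$-weighted expected logistic loss, roughly

$$\min_{q(\boldsymbol{\eta}, \Theta, \mathbf{Z}, \Phi)} \; \mathrm{KL}(q \,\|\, p_0) - \mathbb{E}_{q}\big[\log p(\mathbf{W} \mid \mathbf{Z}, \Phi)\big] + c \sum_{d=1}^{D} \mathbb{E}_{q}\big[\log\big(1 + e^{\omega_d}\big) - y_d\,\omega_d\big],$$

where $\omega_d$ is the discriminant value for document $d$. With $c = 1$ this reduces to the standard sLDA-style weighting, while larger $c$ counteracts the over-weighting of the response by document word counts that the abstract points out.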

significant improvements

Appears in 4 sentences as: significant improvements (3) significantly improved (1)
In Improved Bayesian Logistic Supervised Topic Models with Data Augmentation
  1. Empirical results demonstrate significant improvements on prediction performance and time efficiency.
    Page 1, “Abstract”
  2. Finally, our empirical results on real data sets demonstrate significant improvements on time efficiency.
    Page 2, “Introduction”
  3. The classification performance is also significantly improved by using appropriate regularization parameters.
    Page 2, “Introduction”
  4. Empirical results for both binary and multi-class classification demonstrate significant improvements over the existing logistic supervised topic models.
    Page 8, “Conclusions and Discussions”

SVM

Appears in 3 sentences as: SVM (3)
In Improved Bayesian Logistic Supervised Topic Models with Data Augmentation
  1. For gLDA, we learn a binary linear SVM on its topic representations using SVMLight (Joachims, 1999).
    Page 5, “Experiments”
  2. The results of DiscLDA (Lacoste-Jullien et al., 2009) and linear SVM on raw bag-of-words features were reported in (Zhu et al., 2012).
    Page 5, “Experiments”
  3. The fact that gLDA+SVM performs better than the standard gSLDA is due to the same reason, since the SVM part of gLDA+SVM can well capture the supervision information to learn a classifier for good prediction, while standard sLDA can’t well-balance the influence of supervision.
    Page 6, “Experiments”
