Anchors Regularized: Adding Robustness and Extensibility to Scalable Topic-Modeling Algorithms
Nguyen, Thang and Hu, Yuening and Boyd-Graber, Jordan

Article Structure

Abstract

Spectral methods offer scalable alternatives to Markov chain Monte Carlo and expectation maximization.

Introduction

Topic models are of practical and theoretical interest.

Anchor Words: Scalable Topic Models

In this section, we briefly review the anchor method and place it in the context of topic model inference.

Adding Regularization

In this section, we add regularizers to the anchor objective (Equation 3).

Regularization Improves Topic Models

In this section, we measure the performance of our proposed regularized anchor word algorithms.

Discussion

Having shown that regularization can improve the anchor topic modeling algorithm, in this section we discuss why these regularizations can improve the model and the implications for practitioners.

Conclusion

A topic model is a popular tool for quickly getting the gist of large corpora.

Topics

topic models

Appears in 19 sentences as: topic model (3) topic modeling (5) Topic models (1) topic models (11)
In Anchors Regularized: Adding Robustness and Extensibility to Scalable Topic-Modeling Algorithms
  1. We examine Arora et al.’s anchor words algorithm for topic modeling and develop new, regularized algorithms that not only mathematically resemble Gaussian and Dirichlet priors but also improve the interpretability of topic models.
    Page 1, “Abstract”
  2. Topic models are of practical and theoretical interest.
    Page 1, “Introduction”
  3. Modern topic models are formulated as latent variable models.
    Page 1, “Introduction”
  4. Unlike an HMM, topic models assume that each document is an admixture of these hidden components called topics.
    Page 1, “Introduction”
  5. Arora et al. (2012b)’s approach for inference in topic models assumes that each topic has a unique “anchor” word (thus, we call this approach anchor).
    Page 1, “Introduction”
  6. In this section, we briefly review the anchor method and place it in the context of topic model inference.
    Page 2, “Anchor Words: Scalable Topic Models”
  7. Rethinking Data: Word Co-occurrence. Inference in topic models can be viewed as a black box: given a set of documents, discover the topics that best explain the data.
    Page 2, “Anchor Words: Scalable Topic Models”
  8. Like other topic modeling algorithms, the output of the anchor method is the topic word distributions A with size V × K, where K is the total number of topics desired, a parameter of the algorithm.
    Page 2, “Anchor Words: Scalable Topic Models”
  9. The coefficient matrix is not the usual output of a topic modeling algorithm.
    Page 2, “Anchor Words: Scalable Topic Models”
  10. However, it does not support rich priors for topic models, while MCMC (Griffiths and Steyvers, 2004) and variational EM (Blei et al., 2003) methods can.
    Page 3, “Anchor Words: Scalable Topic Models”
  11. While the distribution over topics is typically Dirichlet, Dirichlet distributions have been replaced by logistic normals in topic modeling applications (Blei and Lafferty, 2005) and for probabilistic grammars of language (Cohen and Smith, 2009).
    Page 3, “Adding Regularization”
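
Item 8 above describes the anchor method's output: a topic-word matrix A of size V × K whose kth column is a distribution over the vocabulary. As a minimal sketch of how such a matrix is read (the array A, the vocabulary list vocab, and the helper name are illustrative, not code from the paper), listing each topic's most probable words amounts to sorting each column:

    import numpy as np

    def top_words_per_topic(A, vocab, n=10):
        """A: V x K array, column k is p(word | topic k); vocab: list of V strings.
        Returns the n highest-probability words for each of the K topics."""
        V, K = A.shape
        return [[vocab[w] for w in np.argsort(A[:, k])[::-1][:n]]
                for k in range(K)]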

latent variable

Appears in 6 sentences as: latent variable (4) latent variables (3)
In Anchors Regularized: Adding Robustness and Extensibility to Scalable Topic-Modeling Algorithms
  1. Theoretically, their latent variable formulation has served as a foundation for more robust models of other linguistic phenomena (Brody and Lapata, 2009).
    Page 1, “Introduction”
  2. Modern topic models are formulated as latent variable models.
    Page 1, “Introduction”
  3. Typical solutions use MCMC (Griffiths and Steyvers, 2004) or variational EM (Blei et al., 2003), which can be viewed as local optimization: searching for the latent variables that maximize the data likelihood.
    Page 1, “Introduction”
  4. These approaches provide solutions to hidden Markov models (Anandkumar et al., 2012), mixture models (Kannan et al., 2005), and latent variable grammars (Cohen et al., 2013).
    Page 1, “Introduction”
  5. The key insight is not to directly optimize observation likelihood but to instead discover latent variables that can reconstruct statistics of the assumed generative model.
    Page 1, “Introduction”
  6. These regularizations could improve spectral algorithms for latent variable models, improving the performance for other NLP tasks such as latent variable PCFGs (Cohen et al., 2013) and HMMs (Anandkumar et al., 2012), combining the flexibility and robustness offered by priors with the speed and accuracy of new, scalable algorithms.
    Page 8, “Conclusion”

development set

Appears in 5 sentences as: Development Set (1) development set (4)
In Anchors Regularized: Adding Robustness and Extensibility to Scalable Topic-Modeling Algorithms
  1. We split each dataset into a training fold (70%), a development fold (15%), and a test fold (15%): the training data are used to fit models; the development set is used to select parameters (anchor threshold M, document prior α, regularization weight λ); and final results are reported on the test fold.
    Page 4, “Regularization Improves Topic Models”
  2. We select α using grid search on the development set.
    Page 4, “Regularization Improves Topic Models”
  3. 4.1 Grid Search for Parameters on Development Set
    Page 5, “Regularization Improves Topic Models”
  4. Regularization Weight. Once we select a cutoff M for each combination of dataset, number of topics K, and evaluation measure, we select a regularization weight λ on the development set.
    Page 5, “Regularization Improves Topic Models”
  5. With document frequency M and regularization weight λ selected from the development set, we
    Page 5, “Regularization Improves Topic Models”
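
Items 1-4 above describe a grid search over the anchor cutoff M and the regularization weight λ on the development fold. A hedged sketch of that selection loop is below; fit_anchor_model and evaluate_on are stand-ins for the paper's training and evaluation routines (held-out likelihood or topic interpretability), not its actual code:

    import itertools

    def select_parameters(train, dev, M_grid, lambda_grid,
                          fit_anchor_model, evaluate_on):
        """Pick the (M, lambda) pair that scores best on the development fold."""
        best_score, best_params = float("-inf"), None
        for M, lam in itertools.product(M_grid, lambda_grid):
            model = fit_anchor_model(train, anchor_cutoff=M, reg_weight=lam)
            score = evaluate_on(model, dev)
            if score > best_score:
                best_score, best_params = score, (M, lam)
        return best_params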

co-occurrence

Appears in 4 sentences as: Co-occurrence (1) co-occurrence (3)
In Anchors Regularized: Adding Robustness and Extensibility to Scalable Topic-Modeling Algorithms
  1. This approach is fast and effective; because it only uses word co-occurrence information, it can scale to much larger datasets than MCMC or EM alternatives.
    Page 1, “Introduction”
  2. Rethinking Data: Word Co-occurrence. Inference in topic models can be viewed as a black box: given a set of documents, discover the topics that best explain the data.
    Page 2, “Anchor Words: Scalable Topic Models”
  3. The difference between anchor and conventional inference is that while conventional methods take a collection of documents as input, anchor takes word co-occurrence statistics.
    Page 2, “Anchor Words: Scalable Topic Models”
  4. (3) The anchor method is fast, as it only depends on the size of the vocabulary once the co-occurrence statistics Q are obtained.
    Page 3, “Anchor Words: Scalable Topic Models”
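
Items 3 and 4 above note that anchor consumes word co-occurrence statistics Q rather than the documents themselves. The rough sketch below accumulates raw co-occurrence counts from bag-of-words documents; the paper's actual estimator of Q (following Arora et al., 2012b) normalizes these counts per document, so treat the function name and details as illustrative only:

    import numpy as np

    def cooccurrence_counts(docs, V):
        """docs: iterable of lists of word ids in [0, V); returns a V x V
        matrix of within-document co-occurrence counts (self-pairs removed)."""
        Q = np.zeros((V, V))
        for doc in docs:
            counts = np.bincount(doc, minlength=V).astype(float)
            Q += np.outer(counts, counts) - np.diag(counts)
        return Q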

objective function

Appears in 4 sentences as: objective function (5)
In Anchors Regularized: Adding Robustness and Extensibility to Scalable Topic-Modeling Algorithms
  1. Once we have established the anchor objective function, in the next section we regularize the objective function.
    Page 2, “Anchor Words: Scalable Topic Models”
  2. In this section, we briefly review regularizers and then add two regularizers, inspired by Gaussian (L2, Section 3.1) and Dirichlet priors (Beta, Section 3.2), to the anchor objective function (Equation 3).
    Page 3, “Adding Regularization”
  3. Instead of optimizing a function just of the data x and parameters θ, f(x, θ), one optimizes an objective function that includes a regularizer that is only a function of parameters: f(x, θ) + r(θ).
    Page 3, “Adding Regularization”
  4. This requires including the topic matrix as part of the objective function.
    Page 4, “Adding Regularization”
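
Item 3 above gives the general pattern: optimize f(x, θ) + r(θ) instead of f(x, θ) alone. A minimal sketch of the L2 variant (Section 3.1) follows; objective stands in for the anchor recovery objective for a single word, and the names are illustrative rather than the paper's implementation:

    import numpy as np

    def regularized_objective(theta, x, objective, lam):
        """Generic pattern f(x, theta) + r(theta) with an L2 regularizer
        r(theta) = lam * sum(theta**2)."""
        return objective(x, theta) + lam * np.sum(theta ** 2)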

probabilistic models

Appears in 3 sentences as: probabilistic model (1) probabilistic models (2)
In Anchors Regularized: Adding Robustness and Extensibility to Scalable Topic-Modeling Algorithms
  1. However, these new methods lack the rich priors associated with probabilistic models.
    Page 1, “Abstract”
  2. For example, if we are seeking the MLE of a probabilistic model parameterized by θ, p(x | θ), adding a regularization term r(θ) = Σ_k θ_k²
    Page 3, “Adding Regularization”
  3. This is the typical evaluation for probabilistic models.
    Page 4, “Regularization Improves Topic Models”
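
Item 2 above appeals to the standard equivalence between L2 regularization and a zero-mean Gaussian prior; the excerpt is cut off, so the following worked step (standard textbook material, not quoted from the paper) fills in the reasoning:

    \arg\max_{\theta} \Big[ \log p(x \mid \theta) - \lambda \sum_{k} \theta_k^2 \Big]
        = \arg\max_{\theta} \Big[ \log p(x \mid \theta) + \sum_{k} \log \mathcal{N}\big(\theta_k ;\, 0,\, \tfrac{1}{2\lambda}\big) \Big]

since log N(θ_k; 0, σ²) = -θ_k² / (2σ²) plus a constant; choosing σ² = 1/(2λ) makes the penalty match, and the constant does not affect the argmax, so the penalized MLE coincides with the MAP estimate under the Gaussian prior.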

topic distribution

Appears in 3 sentences as: topic distribution (2) topic distributions (1)
In Anchors Regularized: Adding Robustness and Extensibility to Scalable Topic-Modeling Algorithms
  1. The kth column of A will be the topic distribution over all words for topic k, and A_{w,k} is the probability of observing type w given topic k.
    Page 2, “Anchor Words: Scalable Topic Models”
  2. We use Bayes’ rule to recover the topic distribution p(w | z = k)
    Page 2, “Anchor Words: Scalable Topic Models”
  3. Held-out likelihood cannot be computed with existing anchor algorithms, so we use the topic distributions learned from anchor as input to a reference variational inference implementation (Blei et al., 2003) to compute HL.
    Page 4, “Regularization Improves Topic Models”
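
Item 2 above refers to the Bayes' rule step: the anchor recovery yields, for each word w, coefficients that can be read as p(z = k | w), and Bayes' rule converts them into the topic-word distribution p(w | z = k) using the word marginals p(w). A sketch under those assumptions (the names C, p_w, and the helper are illustrative, not the paper's code):

    import numpy as np

    def topics_from_coefficients(C, p_w):
        """C: V x K array with C[w, k] = p(z = k | w); p_w: length-V marginal p(w).
        Bayes' rule: p(w | z = k) is proportional to p(z = k | w) * p(w)."""
        A = C * p_w[:, None]               # unnormalized columns p(w | z = k)
        A /= A.sum(axis=0, keepdims=True)  # normalize each topic column to sum to 1
        return A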
