Interactive Topic Modeling
Yuening Hu, Jordan Boyd-Graber, and Brianna Satinoff

Article Structure

Abstract

Topic models have been used extensively as a tool for corpus exploration, and a cottage industry has developed to tweak topic models to better encode human intuitions or to better model data.

Introduction

Probabilistic topic models, as exemplified by probabilistic latent semantic indexing (Hofmann, 1999) and latent Dirichlet allocation (LDA) (Blei et al., 2003), are unsupervised statistical techniques to discover the thematic topics that permeate a large corpus of text documents.

Putting Knowledge in Topic Models

At a high level, topic models such as LDA take as input a number of topics K and a corpus.

Constraints Shape Topics

As discussed above, LDA views topics as distributions over words, and each document expresses an admixture of these topics.

Interactively adding constraints

For a static model, inference in ITM is the same as in previous models (Andrzejewski et al., 2009).

Motivating Example

To examine the viability of ITM, we begin with a qualitative demonstration that shows the potential usefulness of ITM.

Simulation Experiment

Next, we consider a process for evaluating our ITM using automatically derived constraints.

Getting Humans in the Loop

To move beyond using simulated users adding the same words regardless of what topics were discovered by the model, we needed to expose the model to human users.

Discussion

In this work, we introduced a means for end-users to refine and improve the topics discovered by topic models.

Topics

topic models

Appears in 24 sentences as: topic model (2), Topic modeling (1), topic modeling (6), Topic Models (1), Topic models (2), topic models (15)
In Interactive Topic Modeling
  1. Topic models have been used extensively as a tool for corpus exploration, and a cottage industry has developed to tweak topic models to better encode human intuitions or to better model data.
    Page 1, “Abstract”
  2. However, creating such extensions requires expertise in machine learning unavailable to potential end-users of topic modeling software.
    Page 1, “Abstract”
  3. Probabilistic topic models, as exemplified by probabilistic latent semantic indexing (Hofmann, 1999) and latent Dirichlet allocation (LDA) (Blei et al., 2003), are unsupervised statistical techniques to discover the thematic topics that permeate a large corpus of text documents.
    Page 1, “Introduction”
  4. Topic models have had considerable application beyond natural language processing in computer vision (Rob et al., 2005), biology (Shringarpure and Xing, 2008), and psychology (Landauer et al., 2006) in addition to their canonical application to text.
    Page 1, “Introduction”
  5. For text, one of the few real-world applications of topic models is corpus exploration.
    Page 1, “Introduction”
  6. Unannotated, noisy, and ever-growing corpora are the norm rather than the exception, and topic models offer a way to quickly get the gist of a large corpus.
    Page 1, “Introduction”
  7. Contrary to the impression given by the tables shown in topic modeling papers, topics discovered by topic modeling don’t always make sense to ostensible end users.
    Page 1, “Introduction”
  8. Part of the problem is that the objective function of topic models doesn’t always correlate with human judgements (Chang et al., 2009).
    Page 1, “Introduction”
  9. Another issue is that topic models — with their bag-of-words vision of the world — simply lack the necessary information to create the topics as end-users expect.
    Page 1, “Introduction”
  10. There has been a thriving cottage industry adding more and more information to topic models to correct these shortcomings, either by modeling perspective (Paul and Girju, 2010; Lin et al., 2006), syntax (Wallach, 2006; Gruber et al., 2007), or authorship (Rosen-Zvi et al., 2004; Dietz et al., 2007).
    Page 1, “Introduction”
  11. Similarly, there has been an effort to inject human knowledge into topic models (Boyd-Graber et al., 2007; Andrzejewski et al., 2009; Petterson et al., 2010).
    Page 1, “Introduction”


LDA

Appears in 21 sentences as: LDA (22)
In Interactive Topic Modeling
  1. In this work, we develop a framework for allowing users to iteratively refine the topics discovered by models such as latent Dirichlet allocation (LDA) by adding constraints that enforce that sets of words must appear together in the same topic.
    Page 1, “Abstract”
  2. Probabilistic topic models, as exemplified by probabilistic latent semantic indexing (Hofmann, 1999) and latent Dirichlet allocation (LDA) (Blei et al., 2003), are unsupervised statistical techniques to discover the thematic topics that permeate a large corpus of text documents.
    Page 1, “Introduction”
  3. At a high level, topic models such as LDA take as input a number of topics K and a corpus.
    Page 2, “Putting Knowledge in Topic Models”
  4. In LDA both of these outputs are multinomial distributions; typically they are presented to users in summary form by listing the elements with highest probability.
    Page 2, “Putting Knowledge in Topic Models”
  5. As discussed above, LDA views topics as distributions over words, and each document expresses an admixture of these topics.
    Page 2, “Constraints Shape Topics”
  6. For “vanilla” LDA (no constraints), these are symmetric Dirichlet distributions.
    Page 2, “Constraints Shape Topics”
  7. Because LDA assumes a document’s tokens are interchangeable, it treats the document as a bag-of-words, ignoring potential relations between words.
    Page 2, “Constraints Shape Topics”
  8. This problem with vanilla LDA can be solved by encoding constraints, which will “guide” different words into the same topic.
    Page 2, “Constraints Shape Topics”
  9. Constraints can be added to vanilla LDA by replacing the multinomial distribution over words for each topic with a collection of
    Page 2, “Constraints Shape Topics”
  10. In LDA, a document’s token is produced in the generative process by choosing a topic z and sampling a word from the multinomial distribution φ_z of topic z.
    Page 3, “Constraints Shape Topics”
  11. Then the generative process for constrained LDA is:
    Page 3, “Constraints Shape Topics”
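
Items 3 and 4 in the list above describe LDA’s inputs (a number of topics K and a corpus) and its outputs (multinomial distributions, presented to users by listing their highest-probability elements). As a minimal illustrative sketch in Python, not the authors’ code, the following shows how such a summary could be produced from a hypothetical topic-word matrix phi whose rows are per-topic multinomials over the vocabulary:

    import numpy as np

    def top_words(phi, vocab, n=10):
        # phi: (K, V) array, one multinomial over the vocabulary per topic (row).
        # vocab: list of V word types aligned with phi's columns.
        summaries = []
        for k, row in enumerate(phi):
            top = np.argsort(row)[::-1][:n]  # indices of the n highest-probability words
            summaries.append((k, [vocab[i] for i in top]))
        return summaries

    # Hypothetical example: K = 2 topics over a 4-word vocabulary.
    vocab = ["space", "shuttle", "game", "team"]
    phi = np.array([[0.50, 0.40, 0.05, 0.05],
                    [0.05, 0.05, 0.50, 0.40]])
    for k, words in top_words(phi, vocab, n=2):
        print("Topic %d: %s" % (k, ", ".join(words)))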


Gibbs sampling

Appears in 9 sentences as: Gibbs sampler (2), Gibbs Sampling (1), Gibbs sampling (6)
In Interactive Topic Modeling
  1. 3.1 Gibbs Sampling for Topic Models
    Page 3, “Constraints Shape Topics”
  2. In topic modeling, collapsed Gibbs sampling (Griffiths and Steyvers, 2004) is a standard procedure for obtaining a Markov chain over the latent variables in the model.
    Page 3, “Constraints Shape Topics”
  3. Given M documents, the state of a Gibbs sampler for LDA consists of topic assignments for each token in the corpus and is represented as Z = {z_{1,1}, ..., z_{1,N_1}, z_{2,1}, ..., z_{M,N_M}}.
    Page 3, “Constraints Shape Topics”
  4. In the implementation of a Gibbs sampler, unassignment is done by setting a token’s topic assignment to an invalid topic (e.g., -1, as we use here) and decrementing any counts associated with that word.
    Page 4, “Interactively adding constraints”
  5. Next, we perform one of the strategies for state ablation, add additional iterations of Gibbs sampling, use the newly obtained topic distribution of each document as the feature vector, and perform classification on the test/train split.
    Page 6, “Simulation Experiment”
  6. Each is averaged over five different chains using 10 additional iterations of Gibbs sampling per round (other numbers of iterations are discussed in Section 6.4).
    Page 7, “Simulation Experiment”
  7. Figure 4 shows the effect of using different numbers of Gibbs sampling iterations after changing a constraint.
    Page 7, “Simulation Experiment”
  8. For each of the ablation strategies, we run {10, 20, 30, 50, 100} additional Gibbs sampling iterations.
    Page 7, “Simulation Experiment”
  9. As presented here, the technique for incorporating constraints is closely tied to inference with Gibbs sampling.
    Page 9, “Discussion”
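
Items 2 through 4 above outline the collapsed Gibbs sampler: a Markov chain over topic assignments in which each token is first unassigned (its counts decremented) and then given a freshly sampled topic. The sketch below is an illustrative single-token update for vanilla LDA, not the authors’ implementation; T holds document-topic counts, P holds topic-word counts, and alpha and beta are the Dirichlet hyperparameters discussed in the hyperparameters entry further down:

    import numpy as np

    def resample_token(d, w, z_old, T, P, Psum, alpha, beta, rng):
        # T: (D, K) document-topic counts; P: (K, V) topic-word counts;
        # Psum: (K,) row sums of P; rng: a numpy random Generator.
        K, V = P.shape
        # Unassign: remove the token's current topic from all counts.
        T[d, z_old] -= 1
        P[z_old, w] -= 1
        Psum[z_old] -= 1
        # p(z = k | rest) is proportional to (T[d,k] + alpha) * (P[k,w] + beta) / (Psum[k] + V*beta).
        weights = (T[d] + alpha) * (P[:, w] + beta) / (Psum + V * beta)
        z_new = rng.choice(K, p=weights / weights.sum())
        # Reassign: record the newly sampled topic in the counts.
        T[d, z_new] += 1
        P[z_new, w] += 1
        Psum[z_new] += 1
        return z_new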


error rate

Appears in 8 sentences as: Error rate (1), error rate (8)
In Interactive Topic Modeling
  1. The lower the classification error rate, the better the model has captured the structure of the corpus.
    Page 6, “Simulation Experiment”
  2. While Null sees no constraints, it serves as an upper baseline for the error rate (lower error being better) but shows the effect of additional inference.
    Page 7, “Simulation Experiment”
  3. All Full is a lower baseline for the error rate since it both sees the constraints at the beginning and also runs for the maximum number of total iterations.
    Page 7, “Simulation Experiment”
  4. Figure 3: Error rate (y-axis, lower is better) using different ablation strategies as additional constraints are added (x-axis).
    Page 7, “Simulation Experiment”
  5. The results of None, Term, Doc are more stable (as denoted by the error bars), and the error rate is reduced gradually as more constraint words are added.
    Page 7, “Simulation Experiment”
  6. The error rate of each interactive ablation strategy is (as expected) between the lower and upper baselines.
    Page 7, “Simulation Experiment”
  7. For all numbers of additional iterations, while the Null serves as the upper baseline on the error rate in all cases, the Doc ablation clearly outperforms the other ablation schemes, consistently yielding a lower error rate.
    Page 7, “Simulation Experiment”
  8. Figure 6: The relative error rate (using round 0 as a baseline) of the best Mechanical Turk user session for each of the four numbers of topics.
    Page 9, “Getting Humans in the Loop”
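
The classification-based evaluation described in these sentences reduces to a few lines: each document’s topic distribution becomes a feature vector, a classifier is trained on one split and tested on the other, and the error rate is one minus test accuracy. The sketch below uses scikit-learn’s LinearSVC purely as a stand-in, since the excerpts above do not name the classifier the authors used:

    from sklearn.svm import LinearSVC  # stand-in classifier, not necessarily the paper's choice

    def classification_error(theta_train, y_train, theta_test, y_test):
        # theta_*: (num_docs, K) per-document topic proportions used as features.
        # y_*: document labels drawn from the corpus's M classes.
        clf = LinearSVC().fit(theta_train, y_train)
        accuracy = clf.score(theta_test, y_test)  # fraction of test documents classified correctly
        return 1.0 - accuracy  # lower error suggests the topics better capture the corpus structure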


hyperparameters

Appears in 6 sentences as: hyperparameter (2), hyperparameters (6)
In Interactive Topic Modeling
  1. In this model, α, β, and η are Dirichlet hyperparameters set by the user; their role is explained below.
    Page 3, “Constraints Shape Topics”
  2. where T_{d,k} is the number of times topic k is used in document d, P_{k,w_{d,n}} is the number of times the type w_{d,n} is assigned to topic k, and α, β are the hyperparameters of the two Dirichlet distributions, and B is the number of top-level branches (this is the vocabulary size for vanilla LDA).
    Page 3, “Constraints Shape Topics”
  3. In order to make the constraints effective, we set the constraint word-distribution hyperparameter η to be much larger than the hyperparameter for the distribution over constraints and vocabulary β.
    Page 4, “Constraints Shape Topics”
  4. Normally, estimating hyperparameters is important for topic modeling (Wallach et al., 2009).
    Page 4, “Constraints Shape Topics”
  5. However, in ITM, sampling hyperparameters often (but not always) undoes the constraints (by making η comparable to β), so we keep the hyperparameters fixed.
    Page 4, “Constraints Shape Topics”
  6. The hyperparameters for all experiments are α = 0.1, β = 0.01, and η = 100.
    Page 6, “Simulation Experiment”
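
Sentence 6 gives the fixed settings α = 0.1, β = 0.01, and η = 100, and sentence 3 explains that η must be much larger than β. One way to see the effect, using made-up counts that are not from the paper, is through the posterior mean of a multinomial cell under a symmetric Dirichlet prior: a large pseudo-count η keeps the words inside a constraint close to equiprobable within it, so they gain or lose topic probability together, while the tiny β leaves ordinary words governed almost entirely by their observed counts.

    # Fixed hyperparameters reported for the simulation experiments.
    ALPHA, BETA, ETA = 0.1, 0.01, 100.0

    def posterior_mean(count, total, pseudo, dims):
        # Posterior mean of one cell of a multinomial under a symmetric Dirichlet(pseudo) prior.
        return (count + pseudo) / (total + dims * pseudo)

    # Made-up counts: a two-word constraint in which one word has 9 of the 10 assignments.
    print(posterior_mean(9, 10, BETA, dims=2))  # ~0.899: with beta, the observed counts dominate
    print(posterior_mean(9, 10, ETA, dims=2))   # ~0.519: with eta, the prior keeps the two words comparable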


Mechanical Turk

Appears in 5 sentences as: Mechanical Turk (5)
In Interactive Topic Modeling
  1. We solicited approximately 200 judgments from Mechanical Turk, a popular crowdsourcing platform that has been used to gather linguistic annotations (Snow et al., 2008), measure topic quality (Chang et al., 2009), and supplement traditional inference techniques for topic models (Chang, 2010).
    Page 8, “Getting Humans in the Loop”
  2. Figure 5 shows the interface used in the Mechanical Turk tests.
    Page 8, “Getting Humans in the Loop”
  3. Figure 5: Interface for Mechanical Turk experiments.
    Page 8, “Getting Humans in the Loop”
  4. Figure 6: The relative error rate (using round 0 as a baseline) of the best Mechanical Turk user session for each of the four numbers of topics.
    Page 9, “Getting Humans in the Loop”
  5. We hope to engage these algorithms with more sophisticated users than those on Mechanical Turk to measure how these models can help them better explore and understand large, uncurated data sets.
    Page 9, “Discussion”


generative process

Appears in 4 sentences as: generative process (4)
In Interactive Topic Modeling
  1. introduce ambiguity over the path associated with an observed token in the generative process.
    Page 2, “Putting Knowledge in Topic Models”
  2. In LDA, a document’s token is produced in the generative process by choosing a topic z and sampling a word from the multinomial distribution φ_z of topic z.
    Page 3, “Constraints Shape Topics”
  3. If that is an unconstrained word, the word is emitted and the generative process for that token is done.
    Page 3, “Constraints Shape Topics”
  4. Then the generative process for constrained LDA is:
    Page 3, “Constraints Shape Topics”
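
Putting sentences 2 through 4 together: each token is generated by first choosing a topic z, then drawing from that topic’s distribution over top-level branches (unconstrained words and constraints); an unconstrained word is emitted directly, while a constraint requires one more draw from its own word distribution. The sketch below is only a schematic rendering of that process; the argument names (theta_d, phi, psi, constraint_words) are hypothetical and not the paper’s notation:

    import numpy as np
    rng = np.random.default_rng(0)

    def generate_token(theta_d, phi, constraint_words, psi):
        # theta_d: (K,) topic proportions for the document.
        # phi: (K, B) per-topic distribution over the B top-level branches.
        # constraint_words: dict from a constrained branch index to that constraint's word list.
        # psi: dict from a constrained branch index to its distribution over those words.
        z = rng.choice(len(theta_d), p=theta_d)      # choose a topic z
        branch = rng.choice(phi.shape[1], p=phi[z])  # choose a top-level branch from topic z
        if branch not in constraint_words:
            return branch                            # unconstrained branch: the branch id is the emitted word
        words = constraint_words[branch]             # constrained branch: emit one of the constraint's words
        return words[rng.choice(len(words), p=psi[branch])]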


topic distribution

Appears in 4 sentences as: topic distribution (4)
In Interactive Topic Modeling
  1. Specifically, for a corpus with M classes, we use the per-document topic distribution as a feature vector in a supervised classifier.
    Page 5, “Simulation Experiment”
  2. Next, we perform one of the strategies for state ablation, add additional iterations of Gibbs sampling, use the newly obtained topic distribution of each document as the feature vector, and perform classification on the test/train split.
    Page 6, “Simulation Experiment”
  3. Doc ablation gives more freedom for the constraints to overcome the inertia of the old topic distribution and move towards a new one influenced by the constraints.
    Page 7, “Simulation Experiment”
  4. In general, the more words in the constraint, the more likely it was to noticeably affect the topic distribution.
    Page 9, “Getting Humans in the Loop”
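
A common way to obtain the per-document topic distribution used in these sentences is the smoothed posterior mean computed from the sampler’s document-topic counts. The excerpts do not show which estimator the authors used, so the sketch below is one standard choice rather than the paper’s:

    import numpy as np

    def doc_topic_features(T, alpha):
        # T: (D, K) counts of tokens in document d assigned to topic k.
        # Returns a (D, K) matrix of smoothed topic proportions, usable as classifier features.
        K = T.shape[1]
        return (T + alpha) / (T.sum(axis=1, keepdims=True) + K * alpha)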


latent variables

Appears in 3 sentences as: latent variable (1), latent variables (2)
In Interactive Topic Modeling
  1. In topic modeling, collapsed Gibbs sampling (Griffiths and Steyvers, 2004) is a standard procedure for obtaining a Markov chain over the latent variables in the model.
    Page 3, “Constraints Shape Topics”
  2. Typically, these only change based on assignments of latent variables in the sampler; in Section 4 we describe how changes in the model’s structure (in addition to the latent state) can be reflected in these count statistics.
    Page 3, “Constraints Shape Topics”
  3. In the more general case, when words lack a unique path in the constraint tree, an additional latent variable specifies which possible paths in the constraint tree produced the word; this would have to be sampled.
    Page 5, “Interactively adding constraints”
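
Sentence 3 notes that when a word can be emitted through more than one path in the constraint tree, the path itself becomes a latent variable that must be sampled along with the topic assignment. The fragment below is only a schematic of such a step; path_weight is a hypothetical callable standing in for however the unnormalized probability of each path would actually be computed:

    import numpy as np
    rng = np.random.default_rng(0)

    def sample_path(word, topic, paths, path_weight):
        # paths: candidate constraint-tree paths that can emit `word`.
        # path_weight(topic, path, word): hypothetical unnormalized probability of
        # emitting `word` through `path` under `topic`.
        weights = np.array([path_weight(topic, p, word) for p in paths], dtype=float)
        return paths[rng.choice(len(paths), p=weights / weights.sum())]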
