Finding Bursty Topics from Microblogs
Diao, Qiming and Jiang, Jing and Zhu, Feida and Lim, Ee-Peng

Article Structure

Abstract

Microblogs such as Twitter reflect the general public’s reactions to major events.

Introduction

With the fast growth of Web 2.0, a vast amount of user-generated content has accumulated on the social Web.

Related Work

To find bursty patterns from data streams, Kleinberg (2002) proposed a state machine to model the arrival times of documents in a stream.
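Kleinberg's state machine can be sketched, in simplified form, as a Viterbi decode over a stream of per-window counts: a base state emits at the average rate, a burst state at an elevated rate, and entering the burst state pays a transition cost. The sketch below (Poisson negative log-likelihood costs, scaling factor `s`, transition cost `gamma`) is an illustrative simplification, not Kleinberg's exact cost function:

```python
import math

def burst_states(counts, s=2.0, gamma=1.0):
    """Viterbi decoding of a simplified two-state burst automaton.

    State 0 emits counts at the base rate (the mean of the stream); state 1
    at s times that rate. Moving from state 0 into state 1 costs gamma.
    Returns a 0/1 label (normal/bursty) for every time step.
    """
    base = sum(counts) / len(counts)
    rate = [base, s * base]

    def cost(state, x):
        # Poisson negative log-likelihood of count x under the state's rate
        return rate[state] - x * math.log(rate[state]) + math.lgamma(x + 1)

    best = [cost(0, counts[0]), cost(1, counts[0]) + gamma]
    back = []
    for x in counts[1:]:
        prev, step = best[:], []
        for to in (0, 1):
            # pay gamma only when entering the burst state from the base state
            trans = [prev[frm] + (gamma if to == 1 and frm == 0 else 0.0)
                     for frm in (0, 1)]
            frm = 0 if trans[0] <= trans[1] else 1
            best[to] = trans[frm] + cost(to, x)
            step.append(frm)
        back.append(step)

    state = 0 if best[0] <= best[1] else 1
    path = [state]
    for step in reversed(back):
        state = step[state]
        path.append(state)
    return path[::-1]
```

With a stream like `[2, 3, 2, 20, 22, 3]`, the two elevated windows come out labeled as bursty.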

Method

3.1 Preliminaries

Experiments

4.1 Data Set

Conclusions

In this paper, we studied the problem of finding bursty topics from the text streams on microblogs.

Topics

LDA

Appears in 13 sentences as: LDA (14)
In Finding Bursty Topics from Microblogs
  1. Our experiments on a large Twitter dataset show that there are more meaningful and unique bursty topics in the top-ranked results returned by our model than an LDA baseline and two degenerate variations of our model.
    Page 1, “Abstract”
  2. To discover topics, we can certainly apply standard topic models such as LDA (Blei et al., 2003), but with standard LDA temporal information is lost during topic discovery.
    Page 1, “Introduction”
  3. We find that compared with bursty topics discovered by standard LDA and by two degenerate variations of our model, bursty topics discovered by our model are more accurate and less redundant within the top-ranked results.
    Page 2, “Introduction”
  4. In standard LDA, a document contains a mixture of topics, represented by a topic distribution, and each word has a hidden topic label.
    Page 3, “Method”
  5. We also consider a standard LDA model in our experiments, where each word is associated with a hidden topic.
    Page 4, “Method”
  6. Just like standard LDA, our topic model itself finds a set of topics represented by φ^c but does not directly generate bursty topics.
    Page 5, “Method”
  7. For LDA, these are the numbers of words assigned to topic c.
    Page 5, “Method”
  8. of tweets (or words in the case of the LDA model) assigned to the topics and take the top-30 bursty topics from each model.
    Page 6, “Experiments”
  9. In the case of the LDA model, only 23 bursty topics were detected.
    Page 6, “Experiments”
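The excerpts above rank topics by the number of tweets (or, for the LDA baseline, words) assigned to them per time window. A minimal sketch of building that per-topic count series, which burst detection then runs on (function and variable names are my own, not the paper's):

```python
def topic_time_series(assignments, timestamps, K, T):
    """Build, for each topic c, the count series over T time windows that
    burst detection runs on. assignments[i] is the topic of item i (a word
    for the LDA baseline, a tweet for the full model); timestamps[i] is its
    window index in [0, T). All names here are illustrative.
    """
    series = [[0] * T for _ in range(K)]
    for c, t in zip(assignments, timestamps):
        series[c][t] += 1
    return series
```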

See all papers in Proc. ACL 2012 that mention LDA.

See all papers in Proc. ACL that mention LDA.


topic model

Appears in 13 sentences as: Topic Model (1) topic model (6) topic modeling (1) Topic models (1) topic models (4)
In Finding Bursty Topics from Microblogs
  1. To find topics that have bursty patterns on microblogs, we propose a topic model that simultaneously captures two observations: (1) posts published around the same time are more likely to have the same topic, and (2) posts published by the same user are more likely to have the same topic.
    Page 1, “Abstract”
  2. To discover topics, we can certainly apply standard topic models such as LDA (Blei et al., 2003), but with standard LDA temporal information is lost during topic discovery.
    Page 1, “Introduction”
  3. (2007) proposed a PLSA-based topic model that exploits this idea to find correlated bursty patterns across multiple text streams.
    Page 2, “Introduction”
  4. In this paper, we propose a topic model designed for finding bursty topics from microblogs.
    Page 2, “Introduction”
  5. Topic models provide a principled and elegant way to discover hidden topics from large document collections.
    Page 2, “Related Work”
  6. Standard topic models do not consider temporal information.
    Page 2, “Related Work”
  7. A number of temporal topic models have been proposed to consider topic changes over time.
    Page 2, “Related Work”
  8. At the topic discovery step, we propose a topic model that considers both users’ topical interests and the global topic trends.
    Page 3, “Method”
  9. 3.2 Our Topic Model
    Page 3, “Method”
  10. Just like standard LDA, our topic model itself finds a set of topics represented by φ^c but does not directly generate bursty topics.
    Page 5, “Method”
  11. We assume that after topic modeling, for each discovered topic c, we can obtain a series of counts (m_1^c, m_2^c, …, m_T^c).
    Page 5, “Method”
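The modeling assumptions collected above can be sketched as a generative routine: a post either follows the global, time-dependent topic distribution or its author's stable personal distribution, with one topic per post. The symbols (theta_t, eta_u, phi) mirror the excerpts, but the code and the switch parameter `pi` are an illustrative reconstruction, not the authors' implementation:

```python
import random

def generate_post(theta_t, eta_u, phi, pi=0.3, length=10):
    """Sketch of the generative story for a single post.

    theta_t : global topic distribution for the post's time window
    eta_u   : the author's personal, time-independent topic distribution
    phi     : per-topic word distributions; phi[c] maps word -> probability
    pi      : probability that the post follows the global trend (y = 1)
    """
    y = 1 if random.random() < pi else 0        # global event or personal topic?
    dist = theta_t if y == 1 else eta_u         # matching topic distribution
    c = random.choices(range(len(dist)), weights=dist)[0]  # one topic per post
    words = random.choices(list(phi[c]), weights=list(phi[c].values()), k=length)
    return y, c, words
```

With `pi=1.0` every post follows the global distribution, which is the degenerate TimeLDA-style behavior the paper compares against.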


topic distribution

Appears in 11 sentences as: topic distribution (8) topic distributions (3)
In Finding Bursty Topics from Microblogs
  1. To capture this intuition, one solution is to assume that posts published within the same short time window follow the same topic distribution.
    Page 2, “Introduction”
  2. Our model is based on the following two assumptions: (1) If a post is about a global event, it is likely to follow a global topic distribution that is time-dependent.
    Page 2, “Introduction”
  3. (2) If a post is about a personal topic, it is likely to follow a personal topic distribution that is more or less stable over time.
    Page 2, “Introduction”
  4. (2004) expand topic distributions from document-level to user-level in order to capture users’ specific interests.
    Page 3, “Related Work”
  5. In standard LDA, a document contains a mixture of topics, represented by a topic distribution, and each word has a hidden topic label.
    Page 3, “Method”
  6. As we will see very soon, this treatment also allows us to model topic distributions at time window level and user level.
    Page 3, “Method”
  7. To model this observation, we assume that there is a global topic distribution θ^t for each time point t. Presumably θ^t has a high probability for a topic that is popular in the microblogosphere at time t.
    Page 3, “Method”
  8. We therefore introduce a time-independent topic distribution η^u for each user to capture her long-term topical interests.
    Page 3, “Method”
  9. Otherwise, she selects a topic according to her own topic distribution η^u.
    Page 3, “Method”
  10. In this model, we only consider the time-dependent topic distributions that capture the global topical trends.
    Page 4, “Method”
  11. For TimeUserLDA, these are the numbers of posts which are in topic c and generated by the global topic distribution θ^t, i.e., whose hidden variable y is 1.
    Page 5, “Method”


ground truth

Appears in 4 sentences as: Ground Truth (1) ground truth (3)
In Finding Bursty Topics from Microblogs
  1. 4.2 Ground Truth Generation
    Page 6, “Experiments”
  2. For ground truth, we consider a bursty topic to be correct if both human judges have scored it 1.
    Page 6, “Experiments”
  3. topics from the ground truth bursty topics.
    Page 7, “Experiments”
  4. The bursts around day 20, day 44 and day 85 are all real events based on our ground truth.
    Page 7, “Experiments”


generation process

Appears in 3 sentences as: generation process (3)
In Finding Bursty Topics from Microblogs
  1. We assume the following generation process for all the posts in the stream.
    Page 3, “Method”
  2. Figure 2: The generation process for all posts.
    Page 4, “Method”
  3. Formally, the generation process is summarized in Figure 2.
    Page 4, “Method”


Gibbs sampling

Appears in 3 sentences as: Gibbs sampling (3)
In Finding Bursty Topics from Microblogs
  1. We use collapsed Gibbs sampling to obtain samples of the hidden variable assignment and to estimate the model parameters from these samples.
    Page 4, “Method”
  2. Due to the space limit, we only show the derived Gibbs sampling formulas as follows.
    Page 4, “Method”
  3. Each model was run for 500 iterations of Gibbs sampling.
    Page 7, “Experiments”
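The excerpts mention collapsed Gibbs sampling without reproducing the derived formulas. As a point of reference, here is a minimal collapsed Gibbs sampler for the standard LDA baseline only; the paper's own sampler additionally handles the time- and user-level distributions and is not reconstructed here:

```python
import random
from collections import defaultdict

def lda_gibbs(docs, K, alpha=0.1, beta=0.01, iters=500, seed=0):
    """Minimal collapsed Gibbs sampler for the standard LDA baseline.

    docs is a list of token lists; K is the number of topics. Returns the
    final topic assignment z (same shape as docs). This illustrates the
    general shape of the sampler, not the paper's derived formulas.
    """
    rng = random.Random(seed)
    V = len({w for d in docs for w in d})
    # count tables maintained incrementally by the sampler
    n_dk = [defaultdict(int) for _ in docs]       # topic counts per document
    n_kw = [defaultdict(int) for _ in range(K)]   # word counts per topic
    n_k = [0] * K                                 # total words per topic
    z = [[rng.randrange(K) for _ in d] for d in docs]
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            n_dk[d][k] += 1; n_kw[k][w] += 1; n_k[k] += 1

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]  # remove the current assignment from the counts
                n_dk[d][k] -= 1; n_kw[k][w] -= 1; n_k[k] -= 1
                # full conditional: P(z=k|rest) ∝ (n_dk+alpha)(n_kw+beta)/(n_k+V*beta)
                weights = [(n_dk[d][j] + alpha) * (n_kw[j][w] + beta) / (n_k[j] + V * beta)
                           for j in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1; n_kw[k][w] += 1; n_k[k] += 1
    return z
```

The collapsed form integrates out the topic and word distributions analytically, so only the discrete assignments z are resampled.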


human judges

Appears in 3 sentences as: human judges (2) human judge’s (1)
In Finding Bursty Topics from Microblogs
  1. We merged these topics and asked two human judges to judge their quality by assigning a score of either 0 or 1. The judges are graduate students living in Singapore and not involved in this project.
    Page 6, “Experiments”
  2. based on the human judge’s understanding.
    Page 6, “Experiments”
  3. For ground truth, we consider a bursty topic to be correct if both human judges have scored it 1.
    Page 6, “Experiments”


news articles

Appears in 3 sentences as: news articles (3)
In Finding Bursty Topics from Microblogs
  1. Although bursty event detection from text streams has been studied before, previous work may not be suitable for microblogs because compared with other text streams such as news articles and scientific publications, microblog posts are particularly diverse and noisy.
    Page 1, “Abstract”
  2. More importantly, their model was applied to news articles and scientific publications, where most documents follow the global topical trends.
    Page 2, “Introduction”
  3. Unlike news articles from traditional media, which are mostly about current affairs, an important property of microblog posts is that many posts are about users’ personal encounters and interests rather than global events.
    Page 3, “Method”
