Topical Keyphrase Extraction from Twitter
Zhao, Xin and Jiang, Jing and He, Jing and Song, Yang and Achanauparp, Palakorn and Lim, Ee-Peng and Li, Xiaoming

Article Structure

Abstract

Summarizing and analyzing Twitter content is an important and challenging task.

Introduction

Twitter, a new microblogging website, has attracted hundreds of millions of users who publish short messages (a.k.a. tweets).

Related Work

Our work is related to unsupervised keyphrase extraction.

Method

3.1 Preliminaries

Experiments

4.1 Data Set and Preprocessing

Conclusion

In this paper, we studied the novel problem of topical keyphrase extraction for summarizing and analyzing Twitter content.

Topics

PageRank

Appears in 15 sentences as: PageRank (15)
In Topical Keyphrase Extraction from Twitter
  1. We propose a context-sensitive topical PageRank method for keyword ranking and a probabilistic scoring function that considers both relevance and interestingness of keyphrases for keyphrase ranking.
    Page 1, “Abstract”
  2. For keyword ranking, we modify the Topical PageRank method proposed by Liu et al.
    Page 2, “Introduction”
  3. Mihalcea and Tarau (2004) proposed to use TextRank, a modified PageRank algorithm to extract keyphrases.
    Page 2, “Related Work”
  4. Next for each topic, we run a topical PageRank algorithm to rank keywords and then generate candidate keyphrases using the top ranked keywords (Section 3.3).
    Page 2, “Method”
  5. 3.3 Topical PageRank for Keyword Ranking
    Page 3, “Method”
  6. Topical PageRank was introduced by Liu et al.
    Page 3, “Method”
  7. It runs topic-biased PageRank for each topic separately and boosts those words with high relevance to the corresponding topic.
    Page 3, “Method”
  8. Formally, the topic-specific PageRank scores can be defined as follows:
    Page 3, “Method”
  9. where R_t(w) is the topic-specific PageRank score of word w in topic t, e(w_j, w_i) is the weight for the edge (w_j → w_i), O(w_j) = Σ_{w'} e(w_j, w'), and λ is a damping factor ranging from 0 to 1.
    Page 3, “Method”
  10. We denote this original version of the Topical PageRank as TPR.
    Page 3, “Method”
  11. So in this paper, we propose to use a topic context sensitive PageRank method.
    Page 3, “Method”
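The topic-biased PageRank recursion described in the snippets above (R_t(w_i) = λ Σ_j e(w_j, w_i)/O(w_j) · R_t(w_j) + (1 − λ) P_t(w_i)) can be sketched in a few lines. This is a minimal illustration, not the paper's implementation; the function name, the whole-graph power iteration, and the fixed iteration count are assumptions.

```python
from collections import defaultdict

def topical_pagerank(edges, topic_prior, damping=0.85, iters=100):
    """Topic-biased PageRank in the spirit of TPR (a sketch).

    edges: dict mapping (w_j, w_i) -> edge weight e(w_j, w_i)
    topic_prior: dict mapping word -> unnormalised topic relevance P_t(w)
    Returns a dict of R_t(w) scores for every word in the graph.
    """
    words = set(topic_prior)
    for wj, wi in edges:
        words.update((wj, wi))
    # Out-weight O(w_j) = sum over w' of e(w_j, w')
    out = defaultdict(float)
    for (wj, wi), e in edges.items():
        out[wj] += e
    # Normalise the topic prior so it is a distribution over words
    z = sum(topic_prior.values()) or 1.0
    prior = {w: topic_prior.get(w, 0.0) / z for w in words}
    r = {w: 1.0 / len(words) for w in words}
    for _ in range(iters):
        # R_t(w_i) = damping * sum_j e(w_j, w_i)/O(w_j) * R_t(w_j)
        #            + (1 - damping) * P_t(w_i)
        nxt = {w: (1.0 - damping) * prior[w] for w in words}
        for (wj, wi), e in edges.items():
            nxt[wi] += damping * (e / out[wj]) * r[wj]
        r = nxt
    return r
```

Words with a higher topic prior P_t(w) receive a larger share of the random-jump mass, which is how the ranking is biased toward topic-relevant keywords.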

See all papers in Proc. ACL 2011 that mention PageRank.

scoring function

Appears in 6 sentences as: scoring function (5) scoring function: (1)
In Topical Keyphrase Extraction from Twitter
  1. We propose a context-sensitive topical PageRank method for keyword ranking and a probabilistic scoring function that considers both relevance and interestingness of keyphrases for keyphrase ranking.
    Page 1, “Abstract”
  2. While a standard method is to simply aggregate the scores of keywords inside a candidate keyphrase as the score for the keyphrase, here we propose a different probabilistic scoring function .
    Page 3, “Method”
  3. Substituting Equation (8) into Equation (3), we obtain the following scoring function for ranking:
    Page 5, “Method”
  4. Our preliminary experiments with Equation (9) show that this scoring function usually ranks longer keyphrases higher than shorter ones.
    Page 5, “Method”
  5. So here we heuristically incorporate our length preference by removing |k|n from Equation (9), resulting in the following final scoring function:
    Page 5, “Method”
  6. We have proposed a context-sensitive topical PageRank method (cTPR) for the first step of keyword ranking, and a probabilistic scoring function for the third step of keyphrase ranking.
    Page 6, “Experiments”
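The baseline that the proposed probabilistic scoring function improves on, as described in snippet 2 above, simply aggregates the keyword scores inside a candidate keyphrase. A hypothetical helper (not the paper's Equation (9) or (10), whose exact forms are not reproduced on this page) makes the baseline and its length bias concrete:

```python
def score_keyphrase_baseline(keyphrase_words, keyword_scores):
    """Baseline keyphrase scoring: sum the topical PageRank scores of the
    keywords inside the candidate. Note the sum grows with candidate
    length, the kind of length bias the paper's length-preference
    adjustment (dropping the |k|^n term) is designed to counter."""
    return sum(keyword_scores.get(w, 0.0) for w in keyphrase_words)
```

For example, with scores {"apple": 0.5, "iphone": 0.3}, the candidate "apple iphone" scores 0.8 while "apple" alone scores 0.5, so longer candidates tend to dominate under plain aggregation.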

topic model

Appears in 4 sentences as: topic model (2) topic modeling (1) topic models (1)
In Topical Keyphrase Extraction from Twitter
  1. To extract keyphrases, we first identify topics from the Twitter collection using topic models (Section 3.2).
    Page 2, “Method”
  2. Author-topic models have been shown to be effective for topic modeling of microblogs (Weng et al., 2010; Hong and Davison, 2010).
    Page 2, “Method”
  3. Given the topic model φ_t previously learned for topic t, we can set P(w|t, R = 1) to φ_{t,w}, i.e.
    Page 4, “Method”
  4. We also tried the standard LDA model and the author-topic model on our data set and found that our proposed topic model was better or at least comparable in terms of finding meaningful topics.
    Page 6, “Experiments”

edge weight

Appears in 3 sentences as: edge weight (4) edge weights (1)
In Topical Keyphrase Extraction from Twitter
  1. However, the original TPR ignores the topic context when setting the edge weights; the edge weight is set by counting the number of co-occurrences of the two words within a certain window size.
    Page 3, “Method”
  2. Taking the topic of “electronic products” as an example, the word “juice” may co-occur frequently with a good keyword “apple” for this topic because of Apple electronic products, so “juice” may be ranked high by this context-free co-occurrence edge weight although it is not related to electronic products.
    Page 3, “Method”
  3. Here we compute the propagation from w_j to w_i in the context of topic t; namely, the edge weight from w_j to w_i is parameterized by t. In this paper, we compute the edge weight e_t(w_j, w_i) between two words by counting the number of co-occurrences of these two words in tweets assigned to topic t.
    Page 3, “Method”
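The topic-sensitive edge weights described above can be sketched as co-occurrence counting restricted to the tweets assigned to each topic. This is a simplified illustration: the function name is hypothetical, and it treats a whole tweet as the co-occurrence window (the paper counts co-occurrences within a certain window size, which for short tweets is a reasonable approximation).

```python
from collections import defaultdict
from itertools import combinations

def topic_edge_weights(tweets_by_topic):
    """Context-sensitive co-occurrence edge weights: count co-occurrences
    only within tweets assigned to topic t, so e_t(w_j, w_i) differs
    across topics ("apple"–"juice" links appear in a food topic but not
    in an electronics topic).

    tweets_by_topic: dict mapping topic id -> list of tokenised tweets
    Returns dict: topic -> {(w_j, w_i): count}, with symmetric counts.
    """
    weights = {}
    for t, tweets in tweets_by_topic.items():
        e = defaultdict(int)
        for tokens in tweets:
            for wj, wi in combinations(set(tokens), 2):
                e[(wj, wi)] += 1
                e[(wi, wj)] += 1
        weights[t] = dict(e)
    return weights
```

Feeding the per-topic weights into the topic-biased PageRank keeps spurious context-free co-occurrences (like "apple"–"juice" in an electronics topic) from inflating irrelevant keywords.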
