Abstract | Although bursty event detection from text streams has been studied before, previous work may not be suitable for microblogs because, compared with other text streams such as news articles and scientific publications, microblog posts are particularly diverse and noisy. |
Introduction | More importantly, their model was applied to news articles and scientific publications, where most documents follow the global topical trends. |
Method | Unlike news articles from traditional media, which are mostly about current affairs, an important property of microblog posts is that many posts are about users’ personal encounters and interests rather than global events. |
Related Work | Important events are those reported in a large number of news articles, and each event is constructed according to a single query and represented by a set of sentences. |
Temporal and Linguistic Processing | In news articles, this is the DCT. |
Temporal and Linguistic Processing | Figure 4 shows an example of an analyzed excerpt of a news article. |
Evaluation | Since our main focus is to detect events from news articles, we only keep the web pages with the keyword “news” in the URL field. |
Introduction | One standard approach is to cluster news articles into events in two steps (Yang et al., 1998): 1) represent documents as vectors and calculate similarities between documents; 2) run a clustering algorithm to obtain document clusters as events. The underlying text representation often plays a critical role in this approach, especially for long text streams. |
Introduction | D1 and D2 are news articles about the U.S. presidential elections in 2004 and 2008, respectively. |
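
The two-step approach quoted above (represent documents as vectors, compute similarities, then cluster) can be illustrated with a minimal sketch. It assumes bag-of-words term-frequency vectors, cosine similarity, and a simple single-pass threshold clustering chosen here for brevity; this is not necessarily the representation or algorithm used in the cited work. The function names (`to_vector`, `cosine`, `cluster_events`) and the threshold value are hypothetical.

```python
import math
import re
from collections import Counter


def to_vector(text):
    """Step 1a: represent a document as a bag-of-words term-frequency vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(u, v):
    """Step 1b: cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * v[term] for term, count in u.items() if term in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def cluster_events(docs, threshold=0.3):
    """Step 2: single-pass clustering; each cluster is treated as one event.

    A document joins the most similar existing cluster if its similarity to
    that cluster's centroid (summed term counts) reaches `threshold`;
    otherwise it starts a new cluster. The threshold is an illustrative
    assumption, not a value taken from the source.
    """
    centroids = []  # one Counter of summed term counts per cluster
    clusters = []   # one list of document indices per cluster
    for idx, text in enumerate(docs):
        vec = to_vector(text)
        best, best_sim = None, 0.0
        for cid, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = cid, sim
        if best is not None and best_sim >= threshold:
            clusters[best].append(idx)
            centroids[best].update(vec)  # add this document's counts
        else:
            clusters.append([idx])
            centroids.append(Counter(vec))
    return clusters


if __name__ == "__main__":
    docs = [
        "election vote president campaign",
        "president election campaign vote results",
        "earthquake tsunami japan coast",
        "japan earthquake coast damage",
    ]
    print(cluster_events(docs))
```

Because the vectors are built from raw term counts, two articles about different elections (such as D1 and D2 above) would likely share many terms and could fall into the same cluster, which is one reason the choice of text representation matters in this approach.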