A Simple Bayesian Modelling Approach to Event Extraction from Twitter
Zhou, Deyu and Chen, Liangyu and He, Yulan

Article Structure

Abstract

With the proliferation of social media sites, social streams have proven to contain the most up-to-date information on current events.

Introduction

Event extraction aims to automatically identify events from text, with information about what happened, when, where, to whom, and why.

Methodology

Events extracted in our proposed framework are represented as a 4-tuple (y, d, l, k), where y stands for a non-location named entity, d for a date, l for a location, and k for an event-related keyword.
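The 4-tuple representation described above can be sketched as a small data structure (a minimal illustration; the field names and example values are ours, not taken from the paper):

```python
from typing import NamedTuple

class Event(NamedTuple):
    """Structured event representation (y, d, l, k)."""
    entity: str    # y: non-location named entity
    date: str      # d: date the event occurred
    location: str  # l: location of the event
    keyword: str   # k: event-related keyword

# Hypothetical example, not drawn from the paper's corpus
e = Event(entity="Barack Obama", date="2012-10-16",
          location="New York", keyword="debate")
print(e.keyword)  # → debate
```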

Experiments

In this section, we first describe the Twitter corpus used in our experiments and then present how we build a baseline based on the previously proposed TwiCal system (Ritter et al., 2012), the state-of-the-art open event extraction system on tweets.

Conclusions and Future Work

In this paper we have proposed an unsupervised Bayesian model, called the Latent Event Model (LEM), to extract the structured representation of events from social media data.

Topics

named entities

Appears in 25 sentences as: Named entities (1) named entities (14) Named Entity (3) Named entity (1) named entity (13)
In A Simple Bayesian Modelling Approach to Event Extraction from Twitter
  1. (2012) presented a system called TwiCal which extracts an open-domain calendar of significant events represented by a 4-tuple set including a named entity , event phrase, calendar date, and event type from Twitter.
    Page 1, “Introduction”
  2. Here, event elements include named entities such as person, company, organization, date/time, location, and the relations among them.
    Page 1, “Introduction”
  3. Events extracted in our proposed framework are represented as a 4-tuple (y, d, l, k), where y stands for a non-location named entity , d for a date, l for a location, and k for an event-related keyword.
    Page 2, “Methodology”
  4. Tweets are preprocessed by time expression recognition, named entity recognition, POS tagging and stemming.
    Page 2, “Methodology”
  5. Named Entity Recognition.
    Page 2, “Methodology”
  6. Named entity recognition (NER) is a crucial step since the results would directly impact the final extracted 4-tuple (y, d, l, k). It is not easy to accurately identify named entities in the Twitter data since tweets contain a lot of misspellings and abbreviations.
    Page 2, “Methodology”
  7. Therefore, named entities mentioned in tweets are likely to appear in news articles as well.
    Page 2, “Methodology”
  8. We thus perform named entity recognition in the following way.
    Page 2, “Methodology”
  9. First, a traditional NER tool such as the Stanford Named Entity Recognizer2 is used to identify named entities from the news articles crawled from BBC and CNN during the same period that the tweets were published.
    Page 2, “Methodology”
  10. The recognised named entities from news are then used to build a dictionary.
    Page 2, “Methodology”
  11. Named entities from tweets are extracted by looking up the dictionary through fuzzy matching.
    Page 2, “Methodology”
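The dictionary lookup with fuzzy matching described in steps 9-11 can be sketched as follows (a minimal illustration using Python's standard difflib; the dictionary entries and similarity cutoff are assumptions, not values from the paper):

```python
import difflib

# Dictionary of named entities recognised in news articles (hypothetical entries)
news_entities = ["Barack Obama", "Boston", "North Korea", "Manchester United"]

def match_entity(token, dictionary, cutoff=0.8):
    """Fuzzy-match a candidate string from a tweet against the news dictionary.

    Returns the best-matching dictionary entry, or None if nothing is close
    enough. The fuzzy comparison tolerates the misspellings and abbreviations
    common in tweets.
    """
    matches = difflib.get_close_matches(token, dictionary, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(match_entity("Barak Obama", news_entities))  # → Barack Obama
print(match_entity("zzzz", news_entities))         # → None
```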

See all papers in Proc. ACL 2014 that mention named entities.


F-measure

Appears in 9 sentences as: F-measure (9)
  1. Experimental results show that the proposed model achieves 83% in F-measure , and outperforms the state-of-the-art baseline by over 7%.
    Page 1, “Abstract”
  2. We have conducted experiments on a Twitter corpus and the results show that our proposed approach outperforms TwiCal, the state-of-the-art open event extraction system, by 7.7% in F-measure .
    Page 2, “Introduction”
  3. It is worth noting that the F-measure reported for the event phrase extraction is only 64% in the baseline approach (Ritter et al., 2012).
    Page 4, “Experiments”
  4. Method Tuple Evaluated Precision Recall F-measure
    Page 5, “Experiments”
  5. Method Tuple Evaluated Precision Recall F-measure
    Page 5, “Experiments”
  6. The overall improvement on F-measure is around 7.76%.
    Page 5, “Experiments”
  7. It can be observed from Table 2 that by using NW-NER, the performance of the event extraction system is improved significantly, by 7.5% and 3% respectively on F-measure when evaluated on 3-tuples (without keywords) or 4-tuples (with keywords).
    Page 5, “Experiments”
  8. If further increasing E, we notice more balanced precision/recall values and a relatively stable F-measure .
    Page 5, “Experiments”
  9. Experimental results show our proposed framework outperforms the state-of-the-art baseline by over 7% in F-measure .
    Page 5, “Conclusions and Future Work”
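The F-measure cited throughout these results is the standard harmonic mean of precision and recall; a minimal computation sketch (the precision and recall values below are illustrative, not the paper's reported numbers):

```python
def f_measure(precision, recall):
    """F1 score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Illustrative values chosen to land near the ~83% F-measure the paper reports
print(round(f_measure(0.85, 0.81), 2))  # → 0.83
```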

social media

Appears in 8 sentences as: Social media (1) social media (7)
  1. With the proliferation of social media sites, social streams have proven to contain the most up-to-date information on current events.
    Page 1, “Abstract”
  2. However, it is not straightforward to adapt the existing event extraction systems since texts in social media are fragmented and noisy.
    Page 1, “Abstract”
  3. In this paper we propose a simple and yet effective Bayesian model, called Latent Event Model (LEM), to extract structured representation of events from social media .
    Page 1, “Abstract”
  4. With the increasing popularity of social media , social networking sites such as Twitter have become an important source of event information.
    Page 1, “Introduction”
  5. Social media messages are often short and evolve rapidly over time.
    Page 1, “Introduction”
  6. In our work here, we notice a very important property in social media data that the same event could be referenced by high volume messages.
    Page 1, “Introduction”
  7. We thus propose a Latent Event Model (LEM) which can automatically detect events from social media without the use of labeled data.
    Page 1, “Introduction”
  8. In this paper we have proposed an unsupervised Bayesian model, called the Latent Event Model (LEM), to extract the structured representation of events from social media data.
    Page 5, “Conclusions and Future Work”

NER

Appears in 6 sentences as: NER (6)
  1. Named entity recognition ( NER ) is a crucial step since the results would directly impact the final extracted 4-tuple (y, d, l, k). It is not easy to accurately identify named entities in the Twitter data since tweets contain a lot of misspellings and abbreviations.
    Page 2, “Methodology”
  2. First, a traditional NER tool such as the Stanford Named Entity Recognizer2 is used to identify named entities from the news articles crawled from BBC and CNN during the same period that the tweets were published.
    Page 2, “Methodology”
  3. Table 2: Comparison of the performance of event extraction using different NER method.
    Page 5, “Experiments”
  4. We experimented with two approaches for named entity recognition ( NER ) in preprocessing.
    Page 5, “Experiments”
  5. One is to use the NER tool trained specifically on the Twitter data (Ritter et al., 2011), denoted as “TW-NER” in Table 2.
    Page 5, “Experiments”
  6. The other uses the traditional Stanford NER to extract named entities from news articles published in the same period and then perform fuzzy matching to identify named entities from tweets.
    Page 5, “Experiments”

extraction system

Appears in 5 sentences as: extraction system (3) extraction systems (2)
  1. However, it is not straightforward to adapt the existing event extraction systems since texts in social media are fragmented and noisy.
    Page 1, “Abstract”
  2. We have conducted experiments on a Twitter corpus and the results show that our proposed approach outperforms TwiCal, the state-of-the-art open event extraction system , by 7.7% in F-measure.
    Page 2, “Introduction”
  3. In this section, we first describe the Twitter corpus used in our experiments and then present how we build a baseline based on the previously proposed TwiCal system (Ritter et al., 2012), the state-of-the-art open event extraction system on tweets.
    Page 4, “Experiments”
  4. To evaluate the performance of the proposed approach, we use precision, recall, and F-measure as in general information extraction systems (Makhoul et al., 1999).
    Page 4, “Experiments”
  5. It can be observed from Table 2 that by using NW-NER, the performance of the event extraction system is improved significantly, by 7.5% and 3% respectively on F-measure when evaluated on 3-tuples (without keywords) or 4-tuples (with keywords).
    Page 5, “Experiments”

news articles

Appears in 5 sentences as: news articles (5)
  1. Previous work in event extraction has focused largely on news articles , as the newswire texts have been the best source of information on current events (Hogenboom et al., 2011).
    Page 1, “Introduction”
  2. However, it is often observed that events mentioned in tweets are also reported in news articles in the same period (Petrovic et al., 2013).
    Page 2, “Methodology”
  3. Therefore, named entities mentioned in tweets are likely to appear in news articles as well.
    Page 2, “Methodology”
  4. First, a traditional NER tool such as the Stanford Named Entity Recognizer2 is used to identify named entities from the news articles crawled from BBC and CNN during the same period that the tweets were published.
    Page 2, “Methodology”
  5. The other uses the traditional Stanford NER to extract named entities from news articles published in the same period and then perform fuzzy matching to identify named entities from tweets.
    Page 5, “Experiments”

proposed model

Appears in 3 sentences as: proposed model (3)
  1. Experimental results show that the proposed model achieves 83% in F-measure, and outperforms the state-of-the-art baseline by over 7%.
    Page 1, “Abstract”
  2. Instead of employing labeled corpora for training, the proposed model only requires the identification of named entities, locations and time expressions.
    Page 5, “Conclusions and Future Work”
  3. Our proposed model has been evaluated on the FSD corpus.
    Page 5, “Conclusions and Future Work”
