Exploring Sentiment in Social Media: Bootstrapping Subjectivity Clues from Multilingual Twitter Streams
Volkova, Svitlana and Wilson, Theresa and Yarowsky, David

Article Structure

Abstract

We study subjective language in social media and create Twitter-specific lexicons via bootstrapping sentiment-bearing terms from multilingual Twitter streams.

Introduction

The language that people use to express opinions and sentiment is extremely diverse.

Related Work

Mihalcea et al. (2012) classify methods for bootstrapping subjectivity lexicons into two types: corpus-based and dictionary-based.

Data

For the experiments in this paper, we use three sets of data for each language: 1M unlabeled tweets (BOOT) for bootstrapping Twitter-specific lexicons, 2K labeled tweets for development data (DEV), and 2K labeled tweets for evaluation (TEST).

Lexicon Bootstrapping

To create a Twitter-specific sentiment lexicon for a given language, we start with a general-purpose, high-precision sentiment lexicon² and bootstrap from the unlabeled data (BOOT) using the labeled development data (DEV) to guide the process.
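The bootstrapping loop can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation: the majority-vote labeling rule, the term-selection criterion, and the thresholds `min_count` and `min_ratio` are assumptions standing in for the parameters that the paper tunes on DEV. Each iteration labels the unlabeled tweets with the current lexicon, then adds terms that occur predominantly in tweets of one polarity, and stops once no new term qualifies.

```python
from collections import Counter


def label_tweet(tokens, lexicon):
    """Assumed rule: a tweet takes the majority polarity of its lexicon hits."""
    pos = sum(1 for t in tokens if lexicon.get(t) == "pos")
    neg = sum(1 for t in tokens if lexicon.get(t) == "neg")
    if pos > neg:
        return "pos"
    if neg > pos:
        return "neg"
    return None  # no hits or a tie: leave the tweet unlabeled


def bootstrap(seed_lexicon, tweets, min_count=2, min_ratio=0.8, max_iter=5):
    """Grow a term -> polarity lexicon from tokenized, unlabeled tweets.

    min_count and min_ratio are hypothetical thresholds; in the paper the
    bootstrapping parameters are tuned on labeled development data (DEV).
    """
    lexicon = dict(seed_lexicon)
    for _ in range(max_iter):
        # Step 1: label every unlabeled tweet with the current lexicon.
        labels = [label_tweet(tokens, lexicon) for tokens in tweets]
        # Step 2: count how often each term occurs overall and per polarity.
        total = Counter()
        by_polarity = {"pos": Counter(), "neg": Counter()}
        for tokens, label in zip(tweets, labels):
            for term in set(tokens):
                total[term] += 1
                if label is not None:
                    by_polarity[label][term] += 1
        # Step 3: keep frequent terms that lean strongly toward one polarity.
        new_terms = {}
        for term, n in total.items():
            if term in lexicon or n < min_count:
                continue
            for polarity, counts in by_polarity.items():
                if counts[term] / n >= min_ratio:
                    new_terms[term] = polarity
                    break
        if not new_terms:
            break  # converged: nothing new passed the thresholds
        lexicon.update(new_terms)
    return lexicon
```

On toy data, a token that only ever appears next to the seed "good" is added as positive, while a token split evenly between positive and negative tweets stays out of the lexicon.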

Lexicon Evaluations

We evaluate our bootstrapped sentiment lexicons, English $L^E_B$, Spanish $L^S_B$ and Russian $L^R_B$, by comparing them with existing dictionary-expanded lexicons that have been previously shown to be effective for subjectivity and polarity classification (Esuli and Sebastiani, 2006; Perez-Rosas et al., 2012; Chetviorkin and Loukachevitch, 2012).

Conclusions

We propose a scalable and language independent bootstrapping approach for learning subjectivity clues from Twitter streams.

Topics

sentiment lexicons

Appears in 17 sentences as: sentiment lexicon (8) sentiment lexicons (11)
In Exploring Sentiment in Social Media: Bootstrapping Subjectivity Clues from Multilingual Twitter Streams
  1. Starting with a domain-independent, high-precision sentiment lexicon and a large pool of unlabeled data, we bootstrap Twitter-specific sentiment lexicons, using a small amount of labeled data to guide the process.
    Page 1, “Abstract”
  2. General, domain-independent sentiment lexicons have low coverage.
    Page 1, “Introduction”
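  3. Most of the previous work on sentiment lexicon construction relies on existing natural language processing tools, e.g., syntactic parsers (Wiebe, 2000), information extraction (IE) tools (Riloff and Wiebe, 2003) or rich lexical resources such as WordNet (Esuli and Sebastiani, 2006).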
    Page 1, “Introduction”
  4. Although bootstrapping has been used for learning sentiment lexicons in other domains (Turney and Littman, 2002; Banea et al., 2008), it has not yet been applied to learning sentiment lexicons for microblogs.
    Page 1, “Introduction”
  5. Dictionary-based methods rely on existing lexical resources to bootstrap sentiment lexicons.
    Page 2, “Related Work”
  6. (2009) use a thesaurus to aid in the construction of a sentiment lexicon for English.
    Page 2, “Related Work”
  7. Corpus-based methods extract subjectivity and sentiment lexicons from large amounts of unlabeled data using different similarity metrics to measure the relatedness between words.
    Page 2, “Related Work”
  8. (2010) bootstrap sentiment lexicons for English from the web by using Pointwise Mutual Information (PMI) and a graph propagation approach, respectively.
    Page 2, “Related Work”
  9. Kaji and Kitsuregawa (2007) propose a method for building a sentiment lexicon for Japanese from HTML pages.
    Page 2, “Related Work”
  10. It can also be used effectively to bootstrap sentiment lexicons for any language for which a bilingual dictionary is available or can be automatically induced from parallel corpora.
    Page 2, “Related Work”
  11. To create a Twitter-specific sentiment lexicon for a given language, we start with a general-purpose, high-precision sentiment lexicon² and bootstrap from the unlabeled data (BOOT) using the labeled development data (DEV) to guide the process.
    Page 3, “Lexicon Bootstrapping”
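Corpus-based methods such as the PMI approach mentioned above score a candidate word by its association with positive versus negative seed words. The sketch below illustrates semantic orientation in the style of Turney and Littman (2002), SO(w) = Σ_{p∈P} PMI(w, p) − Σ_{n∈N} PMI(w, n), with PMI(w, s) = log2 [P(w, s) / (P(w) P(s))]. It is not the method of this paper, and two choices are assumptions made for a self-contained example: co-occurrence is counted at the document (tweet) level over a small in-memory corpus, and a small smoothing constant keeps unseen pairs finite.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_counts(docs):
    """Document frequency of each word and of each unordered word pair."""
    word_df, pair_df = Counter(), Counter()
    for doc in docs:
        vocab = sorted(set(doc))
        word_df.update(vocab)
        pair_df.update(combinations(vocab, 2))  # tuples come out sorted
    return word_df, pair_df


def pmi(w1, w2, word_df, pair_df, n_docs, smoothing=0.01):
    """Smoothed document-level PMI in bits: log2 P(w1,w2) / (P(w1) P(w2))."""
    if word_df[w1] == 0 or word_df[w2] == 0:
        return 0.0  # unseen word: no evidence either way
    joint = pair_df[tuple(sorted((w1, w2)))] + smoothing
    return math.log2(joint * n_docs / (word_df[w1] * word_df[w2]))


def so_pmi(word, pos_seeds, neg_seeds, docs):
    """Semantic orientation: > 0 leans positive, < 0 leans negative."""
    word_df, pair_df = cooccurrence_counts(docs)
    n = len(docs)
    return (sum(pmi(word, s, word_df, pair_df, n) for s in pos_seeds)
            - sum(pmi(word, s, word_df, pair_df, n) for s in neg_seeds))
```

A word that co-occurs equally often with the positive and the negative seeds scores exactly zero, which is why such methods usually add only words whose score clears some margin.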

See all papers in Proc. ACL 2013 that mention sentiment lexicons.

social media

Appears in 14 sentences as: social media (15)
  1. We study subjective language in social media and create Twitter-specific lexicons via bootstrapping sentiment-bearing terms from multilingual Twitter streams.
    Page 1, “Abstract”
  2. Our experiments on English, Spanish and Russian show that the resulting lexicons are effective for sentiment classification for many under-explored languages in social media.
    Page 1, “Abstract”
  3. This is true for well-formed data, such as news and reviews, and it is particularly true for data from social media.
    Page 1, “Introduction”
  4. Communication in social media is informal: abbreviations and misspellings abound, and the person communicating is often trying to be funny, creative, and entertaining.
    Page 1, “Introduction”
  5. The dynamic nature of social media together with the extreme diversity of subjective language has implications for any system with the goal of analyzing sentiment in this domain.
    Page 1, “Introduction”
  6. Even models trained specifically on social media data may degrade somewhat over time as topics change and new sentiment-bearing terms crop up.
    Page 1, “Introduction”
  7. However, such tools and lexical resources are not available for many languages spoken in social media.
    Page 1, “Introduction”
  8. Any method for analyzing sentiment in microblogs or other social media streams must be easily adapted to (1) many low-resource languages, (2) the dynamic nature of social media, and (3) working in a streaming mode with limited or no supervision.
    Page 1, “Introduction”
  9. dynamic nature of social media;
    Page 1, “Introduction”
  10. languages and dialects in social media;
    Page 1, “Introduction”
  11. However, the lexical resources that dictionary-based methods need do not yet exist for the majority of languages in social media.
    Page 2, “Related Work”

F-measure

Appears in 7 sentences as: F-measure (7)
  1. The set of parameters $\theta$ is optimized by grid search on the development data, maximizing F-measure for subjectivity classification.
    Page 4, “Lexicon Bootstrapping”
  2. and F-measure results.
    Page 4, “Lexicon Evaluations”
  3. For polarity classification we get comparable F-measure but much higher recall for $L^E_B$ compared to SWN (SentiWordNet).
    Page 4, “Lexicon Evaluations”
  4. Figure 1: Precision (x-axis), recall (y-axis) and F-measure (in the table) for English: L?
    Page 4, “Lexicon Evaluations”
  5. We report precision, recall and F-measure in Figure 2.
    Page 4, “Lexicon Evaluations”
  6. Figure 2: Precision (x-axis), recall (y-axis) and F-measure (in the table) for Spanish: L?
    Page 4, “Lexicon Evaluations”
  7. Figure 3: Precision (x-axis), recall (y-axis) and F-measure for Russian: L?
    Page 5, “Lexicon Evaluations”
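Precision, recall and F-measure for subjectivity classification, and a grid search that picks the parameter setting with the best development-set F-measure, can be computed as in the sketch below. The function names and the example grid are hypothetical; the paper's actual parameter set and its value ranges are not listed in this excerpt. F-measure here is the balanced F1, 2PR / (P + R), with "subj" as the target class.

```python
from itertools import product


def precision_recall_f(gold, predicted, positive="subj"):
    """Precision, recall and balanced F-measure for one target class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def grid_search(evaluate, grid):
    """Try every combination in `grid` (name -> candidate values) and keep
    the one whose `evaluate(params)` score, e.g. DEV F-measure, is highest."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

In a full pipeline, the hypothetical `evaluate` callback would rebuild the lexicon with the given parameters, classify the DEV tweets, and return the F-measure from `precision_recall_f`; exhaustive search is affordable because each parameter has only a few candidate values.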

unlabeled data

Appears in 4 sentences as: unlabeled data (4)
  1. Starting with a domain-independent, high-precision sentiment lexicon and a large pool of unlabeled data, we bootstrap Twitter-specific sentiment lexicons, using a small amount of labeled data to guide the process.
    Page 1, “Abstract”
  2. Corpus-based methods extract subjectivity and sentiment lexicons from large amounts of unlabeled data using different similarity metrics to measure the relatedness between words.
    Page 2, “Related Work”
  3. To create a Twitter-specific sentiment lexicon for a given language, we start with a general-purpose, high-precision sentiment lexicon² and bootstrap from the unlabeled data (BOOT) using the labeled development data (DEV) to guide the process.
    Page 3, “Lexicon Bootstrapping”
  4. On each iteration $i \ge 1$, tweets in the unlabeled data are labeled using the lexicon
    Page 3, “Lexicon Bootstrapping”

rule-based

Appears in 3 sentences as: rule-based (3)
  1. For that we perform subjectivity and polarity classification using rule-based classifiers⁶ on the test data E-TEST, S-TEST and R-TEST.
    Page 4, “Lexicon Evaluations”
  2. We consider how the various lexicons perform for rule-based classifiers for both subjectivity and polarity.
    Page 4, “Lexicon Evaluations”
  3. ⁶ Similar to the rule-based classification approach using terms from the MPQA lexicon (Riloff and Wiebe, 2003).
    Page 4, “Lexicon Evaluations”
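A rule-based classifier of this kind needs only the lexicon and a few counting rules. The sketch below assumes lexicon entries of the form (strength, polarity), mirroring the strong and weak subjectivity clues of the MPQA lexicon; the thresholds `min_strong` and `min_weak` and the majority-vote polarity rule are illustrative assumptions, not the exact rules used in the paper. Tokens are assumed to be pre-tokenized and lowercased.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical lexicon format: term -> (strength, polarity),
# e.g. "love" -> ("strong", "pos"), mirroring MPQA's strong/weak clues.
Lexicon = Dict[str, Tuple[str, str]]


def classify_tweet(tokens: List[str], lexicon: Lexicon,
                   min_strong: int = 1, min_weak: int = 2
                   ) -> Tuple[str, Optional[str]]:
    """Return (subjectivity, polarity); polarity is None when undecided."""
    hits = [lexicon[t] for t in tokens if t in lexicon]
    strong = sum(1 for strength, _ in hits if strength == "strong")
    weak = len(hits) - strong
    # Subjectivity rule: enough strong clues, or enough weak ones.
    if strong < min_strong and weak < min_weak:
        return "obj", None
    # Polarity rule: majority vote over the matched clues.
    pos = sum(1 for _, polarity in hits if polarity == "pos")
    neg = sum(1 for _, polarity in hits if polarity == "neg")
    if pos > neg:
        return "subj", "pos"
    if neg > pos:
        return "subj", "neg"
    return "subj", None
```

Requiring more weak clues than strong ones reflects the intuition that weak clues are subjective only in some contexts; the two thresholds are natural candidates for tuning on development data.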

WordNet

Appears in 3 sentences as: WordNet (3)
  1. Most of the previous work on sentiment lexicon construction relies on existing natural language processing tools, e.g., syntactic parsers (Wiebe, 2000), information extraction (IE) tools (Riloff and Wiebe, 2003) or rich lexical resources such as WordNet (Esuli and Sebastiani, 2006).
    Page 1, “Introduction”
  2. Many researchers have explored using relations in WordNet (Miller, 1995), e.g., Esuli and Sebastiani (2006), Andreevskaia and Bergler (2006) for English, Rao and Ravichandran (2009) for Hindi and French, and Perez-Rosas et al.
    Page 2, “Related Work”
  3. There is also a mismatch between the formality of many language resources, such as WordNet, and the extremely informal language of social media.
    Page 2, “Related Work”
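Dictionary-based methods propagate polarity from seed words along lexical relations such as WordNet synonymy and antonymy. The toy sketch below uses breadth-first search over a small hand-written thesaurus, a hypothetical stand-in for WordNet: synonyms inherit a word's polarity, antonyms receive the opposite polarity, and expansion stops after `max_depth` relation hops. The cited works use more elaborate techniques (e.g. classifiers or graph-based label propagation), so this shows only the basic intuition.

```python
from collections import deque

FLIP = {"pos": "neg", "neg": "pos"}


def expand_by_relations(seeds, synonyms, antonyms, max_depth=2):
    """Breadth-first polarity propagation over a thesaurus-like graph.

    seeds: word -> "pos"/"neg"; synonyms/antonyms: word -> list of words.
    """
    polarity = dict(seeds)
    depth = {word: 0 for word in seeds}
    queue = deque(seeds)
    while queue:
        word = queue.popleft()
        if depth[word] >= max_depth:
            continue  # do not expand beyond max_depth relation hops
        for relation, keep in ((synonyms, True), (antonyms, False)):
            for neighbor in relation.get(word, ()):
                if neighbor in polarity:
                    continue  # first (shortest-path) assignment wins
                polarity[neighbor] = polarity[word] if keep else FLIP[polarity[word]]
                depth[neighbor] = depth[word] + 1
                queue.append(neighbor)
    return polarity
```

Because the search is breadth-first, a word reachable from several seeds takes the polarity of the closest one. The depth limit matters because polarity tends to drift along long chains of near-synonyms, which is one reason such expansions trade recall for precision.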