Generating Focused Topic-Specific Sentiment Lexicons
Jijkoun, Valentin and de Rijke, Maarten and Weerkamp, Wouter

Article Structure

Abstract

We present a method for automatically generating focused and accurate topic-specific subjectivity lexicons from a general-purpose polarity lexicon that allow users to pinpoint subjective on-topic information in a set of relevant documents.

Introduction

In the area of media analysis, one of the key tasks is collecting detailed information about opinions and attitudes toward specific topics from various sources, both offline (traditional newspapers, archives) and online (news sites, blogs, forums).

Related Work

Much work has been done in sentiment analysis.

Generating Topic-Specific Lexicons

In this section we describe how we generate a lexicon of subjectivity clues and targets for a given topic and a list of relevant documents (e.g., re-

Data and Experimental Setup

We consider two types of evaluation.

Qualitative Analysis of Lexicons

Lexicon size (the number of entries) and selectivity (how often entries match in text) of the generated lexicons vary depending on the parameters D and T introduced above.

Quantitative Evaluation of Lexicons

In this section we assess the quality of the generated topic-specific lexicons numerically and extrinsically.

Conclusions and Future Work

We have described a bootstrapping method for deriving a topic-specific lexicon from a general-purpose polarity lexicon.

Topics

syntactic contexts

Appears in 12 sentences as: syntactic context (4) syntactic contexts (8)
In Generating Focused Topic-Specific Sentiment Lexicons
  1. Extract all syntactic contexts of clue words
    Page 4, “Generating Topic-Specific Lexicons”
  2. 3.1 Step 1: Extracting syntactic contexts
    Page 4, “Generating Topic-Specific Lexicons”
  3. First, we identify syntactic contexts in which specific clue words can be used to express
    Page 4, “Generating Topic-Specific Lexicons”
  4. We take a simple definition of the syntactic context : a single labeled directed dependency relation.
    Page 4, “Generating Topic-Specific Lexicons”
  5. For every clue word, we extract all syntactic contexts , i.e., all dependencies, in which the word is involved (as head or as modifier) in the background corpus, along with their endpoints.
    Page 4, “Generating Topic-Specific Lexicons”
  6. Our entropy-driven selection of syntactic contexts of a clue word is based on the following assumption:
    Page 4, “Generating Topic-Specific Lexicons”
  7. For every clue word, we select the top D syntactic contexts whose entropy is at least half of the maximum entropy for this clue.
    Page 4, “Generating Topic-Specific Lexicons”
  8. To summarize, at the end of Step 1 of our method, we have extracted a list of pairs (clue word, syntactic context ) such that for occurrences of the clue word, the words at the endpoint of the syntactic dependency are likely to be targets of sentiments.
    Page 4, “Generating Topic-Specific Lexicons”
  9. The resulting list of triples (clue word, syntactic context , target) constitute the lexicon.
    Page 5, “Generating Topic-Specific Lexicons”
  10. Because our topic-specific lexicons consist of triples (clue word, syntactic context , target), they actually contain more words than topic-independent lexicons of the same size, but topic-specific entries are more selective, which makes the lexicon more focused.
    Page 6, “Qualitative Analysis of Lexicons”
  11. D: the number of syntactic contexts per clue
    Page 7, “Quantitative Evaluation of Lexicons”
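The entropy-driven selection described in items 6 and 7 above can be sketched as follows. For each clue word we score every syntactic context by the Shannon entropy of the distribution of words at its endpoints in the background corpus, then keep the top D contexts whose entropy is at least half the maximum entropy for that clue. The function names and input representation (a mapping from context labels to endpoint-word counts) are illustrative assumptions, not the paper's implementation:

```python
import math
from collections import Counter

def context_entropy(endpoint_counts):
    """Shannon entropy (bits) of the distribution of endpoint words
    observed for one syntactic context of a clue word."""
    total = sum(endpoint_counts.values())
    return -sum((c / total) * math.log2(c / total)
                for c in endpoint_counts.values())

def select_contexts(contexts, d):
    """Keep the top-d syntactic contexts whose entropy is at least half
    of the maximum entropy observed for this clue word (the selection
    rule of Step 1). `contexts` maps a context label to a Counter of
    endpoint words -- an assumed input format for illustration."""
    entropies = {ctx: context_entropy(cnt) for ctx, cnt in contexts.items()}
    if not entropies:
        return []
    h_max = max(entropies.values())
    eligible = [(ctx, h) for ctx, h in entropies.items() if h >= h_max / 2]
    eligible.sort(key=lambda item: -item[1])  # highest entropy first
    return [ctx for ctx, _ in eligible[:d]]
```

A context that co-occurs with many different endpoint words (high entropy) is a likely attachment point for sentiment targets, whereas a context dominated by a single collocate (entropy near zero) is filtered out.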

reranking

Appears in 11 sentences as: rerank (2) reranked (1) Reranking (1) reranking (6) reranking: (1)
In Generating Focused Topic-Specific Sentiment Lexicons
  1. In stage (2) one commonly uses either a binary classifier to distinguish between opinionated and non-opinionated documents or applies reranking of the initial result list using some opinion score.
    Page 3, “Related Work”
  2. The best performing opinion finding system at TREC 2008 is a two-stage approach using reranking in stage (2) (Lee et al., 2008).
    Page 3, “Related Work”
  3. This opinion score is combined with the relevance score, and posts are reranked according to this new score.
    Page 3, “Related Work”
  4. approach is to rerank the results from stage 1, instead of doing actual binary classification.
    Page 7, “Quantitative Evaluation of Lexicons”
  5. 6.1 Reranking using a lexicon
    Page 7, “Quantitative Evaluation of Lexicons”
  6. To rerank a list of posts retrieved for a given topic, we opt to use the method that showed best performance at TREC 2008.
    Page 7, “Quantitative Evaluation of Lexicons”
  7. topic-independent lexicon to the topic-dependent lexicons our method generates, which are also plugged into the reranking of Lee et al.
    Page 7, “Quantitative Evaluation of Lexicons”
  8. There are several parameters that we can vary when generating a topic-specific lexicon and when using it for reranking:
    Page 7, “Quantitative Evaluation of Lexicons”
  9. Note that α does not affect the lexicon creation, but only how the lexicon is used in reranking .
    Page 7, “Quantitative Evaluation of Lexicons”
  10. First, we note that reranking using all lexicons in Table 4 significantly improves over the relevance-only baseline for all evaluation measures.
    Page 7, “Quantitative Evaluation of Lexicons”
  11. The consistent improvement for topical retrieval suggests that a topic-specific lexicon can be used both for query expansion (as described in this section) and for opinion reranking (as described in Section 6.1).
    Page 9, “Quantitative Evaluation of Lexicons”
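The reranking snippets above (items 1, 3 and 9) describe combining a post's retrieval relevance score with a lexicon-based opinion score and re-sorting the result list. A minimal sketch, assuming a simple linear interpolation with weight α — the exact combination used by Lee et al. (2008) may differ, and the field names are illustrative:

```python
def rerank(posts, alpha=0.5):
    """Rerank retrieved posts by interpolating relevance with an opinion
    score: alpha weighs topical relevance against opinionatedness.
    A sketch of opinion reranking, not the exact formula of Lee et al.
    Each post is assumed to be a dict with 'relevance' and 'opinion'."""
    def combined(post):
        return alpha * post["relevance"] + (1 - alpha) * post["opinion"]
    return sorted(posts, key=combined, reverse=True)
```

With α = 1 the ranking reduces to the relevance-only baseline, which matches the observation above that α only affects how the lexicon is used in reranking, not how it is built.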

sentiment analysis

Appears in 11 sentences as: Sentiment analysis (3) sentiment analysis (9)
In Generating Focused Topic-Specific Sentiment Lexicons
  1. Recent advances in language technology, especially in sentiment analysis , promise to (partially) automate this task.
    Page 1, “Introduction”
  2. Sentiment analysis is often considered in the context of the following two tasks:
    Page 1, “Introduction”
  3. How can technology developed for sentiment analysis be applied to media analysis?
    Page 1, “Introduction”
  4. To be able to support media analysis, we need to combine the specificity of (phrase- or word-level) sentiment analysis with the topicality provided by sentiment retrieval.
    Page 2, “Introduction”
  5. Most modern approaches to sentiment analysis , however, use various flavors of classification, where decisions (typically) come with confidence scores, but without explicit support.
    Page 2, “Introduction”
  6. Unlike other methods for topic-specific sentiment analysis , we do not expand a seed lexicon.
    Page 2, “Introduction”
  7. Much work has been done in sentiment analysis .
    Page 2, “Related Work”
  8. We discuss related work in four parts: sentiment analysis in general, domain- and target-specific sentiment analysis , product review mining and sentiment retrieval.
    Page 2, “Related Work”
  9. 2.1 Sentiment analysis
    Page 2, “Related Work”
  10. Sentiment analysis is often seen as two separate steps for determining subjectivity and polarity.
    Page 2, “Related Work”
  11. At TREC, the Text REtrieval Conference, there has been interest in a specific type of sentiment analysis : opinion retrieval.
    Page 3, “Related Work”

sentiment lexicon

Appears in 3 sentences as: sentiment lexicon (2) sentiment lexicons (1)
In Generating Focused Topic-Specific Sentiment Lexicons
  1. Since it is unrealistic to construct sentiment lexicons , or manually annotate text for learning, for every imaginable domain or topic, automatic methods have been developed.
    Page 2, “Related Work”
  2. where O is the set of terms in the sentiment lexicon , P(sub|w) indicates the probability of term w being subjective, and n(w, D) is the number of times term w occurs in document D. The opinion scoring can weigh lexicon terms differently, using P(sub|w); it normalizes scores to cancel out the effect of varying document sizes.
    Page 7, “Quantitative Evaluation of Lexicons”
  3. where n(O, D) is the number of matches of terms from the sentiment lexicon O in document D.
    Page 7, “Quantitative Evaluation of Lexicons”
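The opinion-scoring snippets above (items 2 and 3) can be sketched in code: sum the subjectivity-weighted lexicon matches P(sub|w)·n(w, D) over the document and normalize to cancel the effect of document size. Normalizing by document length is an assumption for illustration; the paper's exact normalization (via n(O, D)) may differ:

```python
from collections import Counter

def opinion_score(doc_tokens, p_sub):
    """Document-level opinion score: subjectivity-weighted lexicon
    matches, normalized by document length so that longer documents do
    not get inflated scores. `p_sub` maps each lexicon term w to
    P(sub|w). A sketch of the scoring described in the paper, not its
    exact formula."""
    if not doc_tokens:
        return 0.0
    counts = Counter(doc_tokens)  # n(w, D) for every term in D
    weighted = sum(p_sub.get(w, 0.0) * n for w, n in counts.items())
    return weighted / len(doc_tokens)
```

Terms absent from the lexicon contribute zero, so only lexicon matches (weighted by their subjectivity probability) drive the score.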
