Sentiment Relevance
Scheible, Christian and Schütze, Hinrich

Article Structure

Abstract

A number of different notions, including subjectivity, have been proposed for distinguishing parts of documents that convey sentiment from those that do not.

Introduction

It is generally recognized in sentiment analysis that only a subset of the content of a document contributes to the sentiment it conveys.

Sentiment Relevance

Sentiment Relevance is a concept to distinguish content informative for determining the sentiment of a document from uninformative content.

Related Work

Many publications have addressed subjectivity in sentiment analysis.

Methods

Due to the sequential properties of S-relevance (cf.

Features

Choosing features is crucial in situations where no high-quality training data is available.

Distant Supervision

Since a large labeled resource for sentiment relevance classification is not yet available, we investigate semi-supervised methods for creating sentiment relevance classifiers.

Transfer Learning

To address the problem that we do not have enough labeled SR data we now investigate a second semi-supervised method for SR classification, transfer learning (TL).

Conclusion

A number of different notions, including subjectivity, have been proposed for distinguishing parts of documents that convey sentiment from those that do not.

Topics

distant supervision

Appears in 14 sentences as: Distant supervision (1) distant supervision (14)
In Sentiment Relevance
  1. Since no large amount of labeled training data for our new notion of sentiment relevance is available, we investigate two semi-supervised methods for creating sentiment relevance classifiers: a distant supervision approach that leverages structured information about the domain of the reviews; and transfer learning on feature representations based on lexical taxonomies that enables knowledge transfer.
    Page 1, “Abstract”
  2. The first approach is distant supervision (DS).
    Page 1, “Introduction”
  3. results of our experiments on distant supervision (Section 6) and transfer learning (Section 7).
    Page 2, “Introduction”
  4. Their setup differs from ours as our focus lies on pattern-based distant supervision instead of distant supervision using documents for sentence classification.
    Page 4, “Related Work”
  5. Distant supervision and transfer learning are settings where exact training data is unavailable.
    Page 5, “Features”
  6. In this section, we show how to bootstrap a sentiment relevance classifier by distant supervision (DS).
    Page 6, “Distant Supervision”
  7. Even though we do not have sentiment relevance annotations, there are sources of metadata about the movie domain that we can leverage for distant supervision.
    Page 6, “Distant Supervision”
  8. We call these labels inferred from NE metadata distant supervision (DS) labels.
    Page 6, “Distant Supervision”
  9. This is a form of distant supervision in that we use the IMDb database as described in Section 5 to automatically label sentences based on which metadata from the database they contain.
    Page 6, “Distant Supervision”
  10. This distant supervision setup suffers from two issues.
    Page 6, “Distant Supervision”
  11. Second, there is no way to control the quality of the input to the classifier, as we have no confidence measure for our distant supervision labeling rule.
    Page 6, “Distant Supervision”
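
The excerpts above describe labeling sentences automatically from IMDb metadata and then training on these noisy DS labels. Below is a minimal sketch of such a rule-based labeler; the lexicons (CHARACTER_NAMES, CREATOR_NAMES, POLAR_WORDS) and the specific rule are illustrative assumptions, not the paper's exact configuration.

```python
# Hypothetical rule-based distant-supervision (DS) labeler for sentences.
CHARACTER_NAMES = {"vincent", "jules"}    # stand-in for IMDb character metadata
CREATOR_NAMES = {"tarantino", "roth"}     # stand-in for IMDb cast/crew metadata
POLAR_WORDS = {"impressive", "boring", "great", "bad"}

SR, SNR = "S-relevant", "S-nonrelevant"

def ds_label(sentence):
    """Return a DS label for a sentence, or None if no rule fires."""
    tokens = {t.lower().strip(".,") for t in sentence.split()}
    if tokens & (CHARACTER_NAMES | CREATOR_NAMES):
        return SNR   # metadata mention -> treated as plot/production content
    if tokens & POLAR_WORDS:
        return SR    # polar vocabulary -> treated as sentiment-bearing
    return None      # no rule fires; the sentence stays unlabeled

print(ds_label("Vincent and Jules drive to the apartment."))  # S-nonrelevant
print(ds_label("The pacing is boring."))                      # S-relevant
```

Sentences labeled this way serve only as noisy training data; as the excerpts note, a classifier trained on them (combined with the MinCut step) is what generalizes beyond the rules' coverage.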

named entity

Appears in 11 sentences as: Named Entities (1) named entities (4) Named entity (1) named entity (5)
In Sentiment Relevance
  1. An error analysis for the classifier trained on P&L shows that many sentences misclassified as S-relevant (fpSR) contain polar words; for example, Then, the situation turns M. In contrast, sentences misclassified as S-nonrelevant (fpSNR) contain named entities or plot and movie business vocabulary; for example, Tim Roth delivers the most impressive MM by getting the M language right.
    Page 3, “Sentiment Relevance”
  2. We follow their approach by using IMDb to define named entity features.
    Page 4, “Related Work”
  3. 5.2 Named Entities
    Page 5, “Features”
  4. As standard named entity recognition (NER) systems do not capture categories that are relevant to the movie domain, we opt for a lexicon-based approach similar to (Zhuang et al., 2006).
    Page 5, “Features”
  5. If a capitalized word occurs, we check whether it is part of an already recognized named entity.
    Page 5, “Features”
  6. Personal pronouns will match the most recently encountered named entity.
    Page 5, “Features”
  7. feature set is referred to as named entities (NE).
    Page 6, “Features”
  8. As this classifier uses training data that is biased towards a specialized case (sentences containing the named entity types creators and characters), it does not generalize well to other S-relevance problems and thus yields lower performance on the full dataset.
    Page 6, “Distant Supervision”
  9. First, the classifier only sees a subset of examples that contain named entities, making generalization to other types of expressions difficult.
    Page 6, “Distant Supervision”
  10. Conditions 4-8 train supervised classifiers based on the labels from DSlabels+MinCut: (4) MaxEnt with named entities (NE); (5) MaxEnt with NE and semantic (SEM) features; (6) CRF with NE; (7) MaxEnt with NE and sequential (SQ) features; (8) MaxEnt with NE, SQ, and SEM.
    Page 7, “Distant Supervision”
  11. Named entity recognition, accomplished with data extracted from a domain-specific database, plays a significant role in creating an initial labeling.
    Page 8, “Distant Supervision”
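
Sentences 4-6 above sketch the lexicon-based NER procedure: capitalized words are matched against IMDb-derived lexicons, matches are extended into multi-word entities, and personal pronouns are resolved to the most recently encountered entity. A rough sketch under these assumptions; the lexicon contents and the greedy matching window are hypothetical.

```python
# Minimal lexicon-based NER sketch; LEXICON is a hypothetical IMDb-derived
# mapping from lowercased names to movie-domain entity types.
LEXICON = {"tim roth": "creator", "jules": "character"}
PRONOUNS = {"he", "she", "him", "her", "his"}

def tag_sentence(tokens, last_entity=None):
    """Return (tags, last_entity) with one coarse NE tag per token."""
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok.lower() in PRONOUNS and last_entity:
            tags[i] = last_entity            # pronoun matches most recent entity
            i += 1
            continue
        matched = False
        if tok[:1].isupper():
            # try to extend a capitalized word into a known multi-word entity
            for j in range(min(len(tokens), i + 3), i, -1):
                span = " ".join(tokens[i:j]).lower()
                if span in LEXICON:
                    for k in range(i, j):
                        tags[k] = LEXICON[span]
                    last_entity, i, matched = LEXICON[span], j, True
                    break
        if not matched:
            i += 1
    return tags, last_entity

tags, last = tag_sentence("Tim Roth delivers the most impressive performance .".split())
tags2, _ = tag_sentence("He gets the body language right .".split(), last_entity=last)
```

The resulting tags can then be turned into the NE feature set referred to in the excerpts above.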

MaxEnt

Appears in 10 sentences as: MaxEnt (13)
In Sentiment Relevance
  1. We divide both the SR and P&L corpora into training (50%) and test sets (50%) and train a Maximum Entropy (MaxEnt) classifier (Manning and Klein, 2003) with bag-of-word features.
    Page 2, “Sentiment Relevance”
  2. To increase coverage, we train a Maximum Entropy (MaxEnt) classifier (Manning and Klein, 2003).
    Page 6, “Distant Supervision”
  3. The MaxEnt model achieves an F1 of 61.2% on the SR corpus (Table 3, line 2).
    Page 6, “Distant Supervision”
  4. As described in Section 4, each document is represented as a graph of sentences and weights between sentences and source/sink nodes representing SR/SNR are set to the confidence values obtained from the distantly trained MaxEnt classifier.
    Page 6, “Distant Supervision”
  5. Following this assumption, we train MaxEnt and Conditional Random Field (CRF, (McCallum, 2002)) classifiers on the k% of documents that have the lowest maximum flow values f, where k is a parameter which we optimize using the run count method introduced in Section 4.
    Page 7, “Distant Supervision”
  6. (2) a MaxEnt baseline trained on DS labels without application of MinCut; (3) the base classifier using MinCut (DSlabels+MinCut) as described above.
    Page 7, “Distant Supervision”
  7. Conditions 4-8 train supervised classifiers based on the labels from DSlabels+MinCut: (4) MaxEnt with named entities (NE); (5) MaxEnt with NE and semantic (SEM) features; (6) CRF with NE; (7) MaxEnt with NE and sequential (SQ) features; (8) MaxEnt with NE, SQ, and SEM.
    Page 7, “Distant Supervision”
  8. We found that all classifiers using DS labels and MinCut are significantly better than MaxEnt trained on purely rule-based DS labels (line 2).
    Page 7, “Distant Supervision”
  9. Also, the MaxEnt models using SQ features (lines 7,8) are significantly better than the MinCut base classifier (line 3).
    Page 7, “Distant Supervision”
  10. […] sequence model, we train a CRF (line 6); however, the improvement over MaxEnt (line 4) is not significant.
    Page 7, “Distant Supervision”
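
Sentences 4 and 5 above describe the MinCut step: per-sentence confidences of the distantly trained MaxEnt classifier become capacities toward a source node (S-relevant) and a sink node (S-nonrelevant), and the resulting maximum flow value f is used to select the k% of documents on which supervised classifiers are then trained. A minimal sketch using networkx; the constant association weight between adjacent sentences is an assumption made for illustration, not the paper's actual edge weighting.

```python
# MinCut over one document's sentence graph, given per-sentence confidences.
import networkx as nx

def mincut_labels(p_sr, assoc=0.5):
    """p_sr: list of P(S-relevant) per sentence; returns (flow value, labels)."""
    g = nx.DiGraph()
    for i, p in enumerate(p_sr):
        g.add_edge("SRC", i, capacity=p)          # source side = S-relevant
        g.add_edge(i, "SNK", capacity=1.0 - p)    # sink side = S-nonrelevant
        if i > 0:                                 # smoothness between neighbors
            g.add_edge(i - 1, i, capacity=assoc)
            g.add_edge(i, i - 1, capacity=assoc)
    flow_value, (src_side, _) = nx.minimum_cut(g, "SRC", "SNK")
    # flow_value plays the role of the per-document maximum flow f quoted above
    return flow_value, [i in src_side for i in range(len(p_sr))]

f, labels = mincut_labels([0.9, 0.6, 0.2, 0.1])
print(f, labels)   # e.g. 1.3 [True, True, False, False]
```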

sentiment analysis

Appears in 8 sentences as: sentiment analysis (8)
In Sentiment Relevance
  1. We propose a new concept, sentiment relevance, to make this distinction and argue that it better reflects the requirements of sentiment analysis systems.
    Page 1, “Abstract”
  2. It is generally recognized in sentiment analysis that only a subset of the content of a document contributes to the sentiment it conveys.
    Page 1, “Introduction”
  3. Some sentiment analysis systems filter out objective language and predict sentiment based on subjective language only because objective statements do not directly reveal sentiment.
    Page 1, “Introduction”
  4. they are not optimal for sentiment analysis.
    Page 1, “Introduction”
  5. Many publications have addressed subjectivity in sentiment analysis.
    Page 3, “Related Work”
  6. As we argue above, if the goal is to identify parts of a document that are useful/non-useful for sentiment analysis, then S-relevance is a better notion to use.
    Page 3, “Related Work”
  7. Transfer learning has been applied previously in sentiment analysis (Tan and Cheng, 2009), targeting polarity detection.
    Page 4, “Related Work”
  8. We introduced sentiment relevance to make this distinction and argued that it better reflects the requirements of sentiment analysis systems.
    Page 9, “Conclusion”

semi-supervised

Appears in 5 sentences as: semi-supervised (5)
In Sentiment Relevance
  1. Since no large amount of labeled training data for our new notion of sentiment relevance is available, we investigate two semi-supervised methods for creating sentiment relevance classifiers: a distant supervision approach that leverages structured information about the domain of the reviews; and transfer learning on feature representations based on lexical taxonomies that enables knowledge transfer.
    Page 1, “Abstract”
  2. For this reason, we investigate two semi-supervised approaches to S-relevance classification that do not require S-relevance-labeled data.
    Page 1, “Introduction”
  3. Since a large labeled resource for sentiment relevance classification is not yet available, we investigate semi-supervised methods for creating sentiment relevance classifiers.
    Page 6, “Distant Supervision”
  4. To address the problem that we do not have enough labeled SR data we now investigate a second semi-supervised method for SR classification, transfer learning (TL).
    Page 8, “Transfer Learning”
  5. Since a large labeled sentiment relevance resource does not yet exist, we investigated semi-supervised approaches to S-relevance classification that do not require S-relevance-labeled data.
    Page 9, “Conclusion”

feature set

Appears in 4 sentences as: feature set (2) feature sets (2)
In Sentiment Relevance
  1. We refer to these feature sets as CoreLex (CX) and VerbNet (VN) features and to their combination as semantic features (SEM).
    Page 5, “Features”
  2. feature set is referred to as named entities (NE).
    Page 6, “Features”
  3. We refer to this feature set as sequential features (SQ).
    Page 6, “Features”
  4. However, we did not find a cumulative effect (line 8) of the two feature sets.
    Page 7, “Distant Supervision”
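
Sentence 3 above names the sequential (SQ) features, which (per the "Maximum Entropy" excerpts later in this list) are selected features copied from adjacent sentences. A rough sketch; the `selected` feature names are hypothetical, since the paper's exact selection is not reproduced here.

```python
# Sketch: augment each sentence's feature dict with selected features copied
# from the previous and next sentence, prefixed so the classifier can tell
# them apart.
def add_sq_features(doc_feats, selected=("NE=character", "NE=creator")):
    """doc_feats: list of per-sentence feature dicts; returns augmented copies."""
    augmented = []
    for i, feats in enumerate(doc_feats):
        f = dict(feats)
        for offset, prefix in ((-1, "prev:"), (1, "next:")):
            j = i + offset
            if 0 <= j < len(doc_feats):
                for name in selected:
                    if doc_feats[j].get(name):
                        f[prefix + name] = 1.0   # adjacent-sentence feature
        augmented.append(f)
    return augmented

print(add_sq_features([{"NE=creator": 1.0}, {"polar": 1.0}]))
```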

NER

Appears in 4 sentences as: NER (5)
In Sentiment Relevance
  1. As standard named entity recognition (NER) systems do not capture categories that are relevant to the movie domain, we opt for a lexicon-based approach similar to (Zhuang et al., 2006).
    Page 5, “Features”
  2. Many entries are unsuitable for NER, e.g., dog is frequently listed as a character.
    Page 5, “Features”
  3. This rule has precedence over NER, so if a name matches a labeled entity, we do not attempt to label it through NER.
    Page 5, “Features”
  4. Generally, the quality of NER is crucial in this task.
    Page 7, “Distant Supervision”

CRF

Appears in 3 sentences as: CRF (3)
In Sentiment Relevance
  1. Following this assumption, we train MaxEnt and Conditional Random Field (CRF, (McCallum, 2002)) classifiers on the k% of documents that have the lowest maximum flow values f, where k is a parameter which we optimize using the run count method introduced in Section 4.
    Page 7, “Distant Supervision”
  2. Conditions 4-8 train supervised classifiers based on the labels from DSlabels+MinCut: (4) MaxEnt with named entities (NE); (5) MaxEnt with NE and semantic (SEM) features; (6) CRF with NE; (7) MaxEnt with NE and sequential (SQ) features; (8) MaxEnt with NE, SQ, and SEM.
    Page 7, “Distant Supervision”
  3. […] sequence model, we train a CRF (line 6); however, the improvement over MaxEnt (line 4) is not significant.
    Page 7, “Distant Supervision”
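
The excerpts describe training a CRF (alongside MaxEnt) on DS-labeled documents, treating each review as a sequence of sentences. A toy sketch using the sklearn-crfsuite package, which is an assumption for illustration only; the paper cites McCallum (2002), i.e., a different toolkit, and the feature dicts and labels below are made up.

```python
# Toy CRF over per-sentence feature dicts; each review is one sequence.
import sklearn_crfsuite

X_train = [[{"NE=creator": 1.0}, {"polar": 1.0}, {"NE=character": 1.0}]]  # one document
y_train = [["SNR", "SR", "SNR"]]                                          # DS labels

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=100)
crf.fit(X_train, y_train)
print(crf.predict(X_train))   # per-sentence S-relevance predictions
```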

fine-grained

Appears in 3 sentences as: fine-grained (3)
In Sentiment Relevance
  1. In our approach, we classify sentences as S-(non)relevant because this is the most fine-grained level at which S-relevance manifests itself; at the word or phrase level, S-relevance classification is not possible because of scope and context effects.
    Page 1, “Introduction”
  2. Our work is most closely related to (Taboada et al., 2009) who define a fine-grained classification that is similar to sentiment relevance on the highest level.
    Page 3, “Related Work”
  3. Täckström and McDonald (2011) develop a fine-grained annotation scheme that includes S-nonrelevance as one of five categories.
    Page 4, “Related Work”

labeled data

Appears in 3 sentences as: labeled data (2) labeling data (1)
In Sentiment Relevance
  1. In general, it is not possible to know what the underlying concepts of a statistical classification are if no detailed annotation guidelines exist and no direct evaluation of manually labeled data is performed.
    Page 3, “Related Work”
  2. Supervised optimization is impossible as we do not have any labeled data.
    Page 5, “Methods”
  3. To summarize, the results of our experiments using distant supervision show that a sentiment relevance classifier can be trained successfully by labeling data with a few simple feature rules, with
    Page 7, “Distant Supervision”

Maximum Entropy

Appears in 3 sentences as: Maximum Entropy (3)
In Sentiment Relevance
  1. We divide both the SR and P&L corpora into training (50%) and test sets (50%) and train a Maximum Entropy (MaxEnt) classifier (Manning and Klein, 2003) with bag-of-word features.
    Page 2, “Sentiment Relevance”
  2. Following previous sequence classification work with Maximum Entropy models (e.g., (Ratnaparkhi, 1996)), we use selected features of adjacent sentences.
    Page 6, “Features”
  3. To increase coverage, we train a Maximum Entropy (MaxEnt) classifier (Manning and Klein, 2003).
    Page 6, “Distant Supervision”
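
Sentence 1 above describes the supervised reference setup: a MaxEnt classifier over bag-of-word features. The paper uses the Stanford classifier (Manning and Klein, 2003); the sketch below substitutes scikit-learn's logistic regression, which is the same model family, and runs on made-up toy sentences.

```python
# Bag-of-words MaxEnt (logistic regression) sentence classifier on toy data.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

sentences = ["the performance is impressive", "vincent drives to the apartment"]
labels = ["SR", "SNR"]   # toy S-relevance labels

model = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
model.fit(sentences, labels)
print(model.predict(["the acting is impressive"]))   # expected: ['SR']
```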
