Context-aware Learning for Sentence-level Sentiment Analysis with Posterior Regularization
Yang, Bishan and Cardie, Claire

Article Structure

Abstract

This paper proposes a novel context-aware method for analyzing sentiment at the level of individual sentences.

Introduction

The ability to extract sentiment from text is crucial for many opinion-mining applications such as opinion summarization, opinion question answering and opinion retrieval.

Related Work

There has been a large amount of work on sentiment analysis at various levels of granularity (Pang and Lee, 2008).

Approach

In this section, we present the details of our proposed approach.

Experiments

We experimented with two product review datasets for sentence-level sentiment classification: the Customer Review (CR) data (Hu and Liu, 2004), which contains 638 reviews of 14 products such as cameras and cell phones, and the Multi-domain Amazon (MD) data from the test set of Täckström and McDonald (2011a), which contains 294 reviews from 5 different domains.

Conclusion

In this paper, we propose a context-aware approach for learning sentence-level sentiment.

Topics

sentence-level

Appears in 27 sentences as: sentence-level (27)
  1. In this paper, we focus on the task of sentence-level sentiment classification in online reviews.
    Page 1, “Introduction”
  2. Semi-supervised techniques have been proposed for sentence-level sentiment classification (Täckström and McDonald, 2011a; Qu et al., 2012).
    Page 2, “Introduction”
  3. In this paper, we propose a sentence-level sentiment classification method that can (1) incorporate rich discourse information at both local and global levels; (2) encode discourse knowledge as soft constraints during learning; (3) make use of unlabeled data to enhance learning.
    Page 2, “Introduction”
  4. Specifically, we use the Conditional Random Field (CRF) model as the learner for sentence-level sentiment classification, and incorporate rich discourse and lexical knowledge as soft constraints into the learning of CRF parameters via Posterior Regularization (PR) (Ganchev et al., 2010).
    Page 2, “Introduction”
  5. We evaluate our approach on the sentence-level sentiment classification task using two standard product review datasets.
    Page 2, “Introduction”
  6. We also show that discourse knowledge is highly useful for improving sentence-level sentiment classification.
    Page 2, “Introduction”
  7. In this paper, we focus on the study of sentence-level sentiment classification.
    Page 2, “Related Work”
  8. Compared to the existing work on semi-supervised learning for sentence-level sentiment classification (Täckström and McDonald, 2011a; Täckström and McDonald, 2011b; Qu et al., 2012), our work does not rely on a large amount of coarse-grained (document-level) labeled data; instead, distant supervision mainly comes from linguistically-motivated constraints.
    Page 2, “Related Work”
  9. We also show that constraints derived from the discourse context can be highly useful for disambiguating sentence-level sentiment.
    Page 3, “Related Work”
  10. We formulate the sentence-level sentiment classification task as a sequence labeling problem.
    Page 3, “Approach”
  11. The inputs to the model are sentence-segmented documents annotated with sentence-level sentiment labels (positive, negative or neutral) along with a set of unlabeled documents.
    Page 3, “Approach”
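
Items 10 and 11 above give the problem formulation. As a minimal illustrative sketch (ours, not the authors' code), the data for this sequence-labeling view can be pictured as follows: each document is a sequence of sentences, and the sentence-level sentiment labels form the tag sequence to be predicted.

    from dataclasses import dataclass
    from typing import List, Optional

    LABELS = ("positive", "negative", "neutral")

    @dataclass
    class Document:
        sentences: List[str]                # sentence-segmented text
        labels: Optional[List[str]] = None  # None for unlabeled documents

    labeled = Document(
        sentences=["The speaker phone is great.", "But the battery dies fast."],
        labels=["positive", "negative"],
    )
    unlabeled = Document(sentences=["I bought this camera last week."])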


CRF

Appears in 22 sentences as: CRF (25)
  1. The context-aware constraints provide additional power to the CRF model and can guide semi-supervised learning when labeled data is limited.
    Page 1, “Abstract”
  2. Specifically, we use the Conditional Random Field (CRF) model as the learner for sentence-level sentiment classification, and incorporate rich discourse and lexical knowledge as soft constraints into the learning of CRF parameters via Posterior Regularization (PR) (Ganchev et al., 2010).
    Page 2, “Introduction”
  3. Unlike most previous work, we explore a rich set of structural constraints that cannot be naturally encoded in the feature-label form, and show that such constraints can improve the performance of the CRF model.
    Page 2, “Introduction”
  4. The CRF model defines the following conditional probabilities:
    Page 3, “Approach”
  5. The objective function for a standard CRF is to maximize the log-likelihood over a collection of labeled documents.
    Page 3, “Approach”
  6. Most of our constraints can be factorized in the same way as factorizing the model features in the first-order CRF model, and we can compute the expectations under q very efficiently using the forward-backward algorithm.
    Page 6, “Approach”
  7. We trained our model using a CRF incorporating the proposed posterior constraints.
    Page 7, “Experiments”
  8. For the CRF features, we include the tokens, the part-of-speech tags, the prior polarities of lexical patterns indicated by the opinion lexicon and the negator lexicon, the number of positive and negative tokens and the output of the voteflip algorithm (Choi and Cardie, 2009).
    Page 7, “Experiments”
  9. We set the CRF regularization parameter σ = 1 and set the posterior regularization parameters b and γ (a tradeoff parameter we introduce to balance the supervised objective and the posterior regularizer in Equation 2) by using grid search.
    Page 7, “Experiments”
  10. We compared our method to a number of baselines: (1) CRF: CRF with the same set of model features as in our method.
    Page 7, “Experiments”
  11. (2) CRF-INF: CRF augmented with inference constraints.
    Page 7, “Experiments”
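
Items 4 and 5 above were cut off in extraction; the linear-chain CRF they refer to has the standard form below (notation assumed here for completeness, not copied from the paper):

    % Conditional probability of a label sequence y given the sentence sequence x:
    p_\theta(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z(\mathbf{x})}
        \exp\Big( \sum_{i} \theta^\top \mathbf{f}(y_{i-1}, y_i, \mathbf{x}, i) \Big)
    % Regularized log-likelihood over the labeled documents D_L:
    \mathcal{L}(\theta) = \sum_{(\mathbf{x},\mathbf{y}) \in \mathcal{D}_L}
        \log p_\theta(\mathbf{y} \mid \mathbf{x}) - \frac{\|\theta\|^2}{2\sigma^2}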


sentiment classification

Appears in 20 sentences as: sentiment classification (20)
  1. In this paper, we focus on the task of sentence-level sentiment classification in online reviews.
    Page 1, “Introduction”
  2. Semi-supervised techniques have been proposed for sentence-level sentiment classification (Täckström and McDonald, 2011a; Qu et al., 2012).
    Page 2, “Introduction”
  3. In this paper, we propose a sentence-level sentiment classification method that can (1) incorporate rich discourse information at both local and global levels; (2) encode discourse knowledge as soft constraints during learning; (3) make use of unlabeled data to enhance learning.
    Page 2, “Introduction”
  4. Specifically, we use the Conditional Random Field (CRF) model as the learner for sentence-level sentiment classification, and incorporate rich discourse and lexical knowledge as soft constraints into the learning of CRF parameters via Posterior Regularization (PR) (Ganchev et al., 2010).
    Page 2, “Introduction”
  5. We evaluate our approach on the sentence-level sentiment classification task using two standard product review datasets.
    Page 2, “Introduction”
  6. We also show that discourse knowledge is highly useful for improving sentence-level sentiment classification.
    Page 2, “Introduction”
  7. In this paper, we focus on the study of sentence-level sentiment classification.
    Page 2, “Related Work”
  8. Compared to the existing work on semi-supervised learning for sentence-level sentiment classification (Täckström and McDonald, 2011a; Täckström and McDonald, 2011b; Qu et al., 2012), our work does not rely on a large amount of coarse-grained (document-level) labeled data; instead, distant supervision mainly comes from linguistically-motivated constraints.
    Page 2, “Related Work”
  9. We formulate the sentence-level sentiment classification task as a sequence labeling problem.
    Page 3, “Approach”
  10. In this work, we apply PR in the context of CRFs for sentence-level sentiment classification.
    Page 3, “Approach”
  11. We experimented with two product review datasets for sentence-level sentiment classification: the Customer Review (CR) data (Hu and Liu, 2004), which contains 638 reviews of 14 products such as cameras and cell phones, and the Multi-domain Amazon (MD) data from the test set of Täckström and McDonald (2011a), which contains 294 reviews from 5 different domains.
    Page 6, “Experiments”


coreference

Appears in 17 sentences as: Coreference (2) coreference (15) coreferential (3) φcoref (1)
  1. …(2008) defines coreference relations on opinion targets and applies them to constrain the polarity of sentences.
    Page 1, “Introduction”
  2. Opinion Coreference Sentences in a discourse can be linked by many types of coherence relations (Jurafsky et al., 2000).
    Page 5, “Approach”
  3. Coreference is one of the commonly used relations in written text.
    Page 5, “Approach”
  4. In this work, we explore coreference in the context of sentence-level sentiment analysis.
    Page 5, “Approach”
  5. We consider a set of polar sentences to be linked by the opinion coreference relation if they contain coreferring opinion-related entities.
    Page 5, “Approach”
  6. As these opinion targets are coreferential (referring to the same entity “the speaker phone”), they are linked by the opinion coreference relation.
    Page 5, “Approach”
  7. Our coreference relations indicated by opinion targets overlap with the same target relation introduced in (Somasundaran et al., 2009).
    Page 5, “Approach”
  8. The differences are: (1) we encode the coreference relations as soft constraints during learning instead of applying them as hard constraints during inference time; (2) our constraints can apply to both polar and non-polar sentences; (3) our identification of coreference relations is automatic without any fine-grained annotations for opinion targets.
    Page 5, “Approach”
  9. To extract coreferential opinion targets, we apply Stanford’s coreference system (Lee et al., 2013) to extract coreferential mentions in the document, and then apply a set of syntactic rules to identify opinion targets from the extracted mentions.
    Page 5, “Approach”
  10. For sentences connected by the opinion coreference relation, we expect their sentiment to be consistent.
    Page 5, “Approach”
  11. $\phi_{\mathrm{coref}}(\mathbf{x}, \mathbf{y}) = \sum_{(i,j)} f_{\mathrm{coref}}(x_i, x_j, y_i, y_j)$
    Page 5, “Approach”
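
One plausible reading of the constraint feature in item 11 (our assumption, following the expectation-constraint convention of Ganchev et al., 2010): f_coref fires when two coreferentially linked sentences disagree in sentiment, and PR bounds its expected value under the auxiliary distribution q, softly enforcing the consistency expectation stated in item 10.

    f_{\mathrm{coref}}(x_i, x_j, y_i, y_j) = \mathbb{1}[\, y_i \neq y_j \,]
    \mathbb{E}_{q}\big[ \phi_{\mathrm{coref}}(\mathbf{x}, \mathbf{y}) \big] \le b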


semi-supervised

Appears in 13 sentences as: Semi-supervised (1) semi-supervised (12)
  1. The context-aware constraints provide additional power to the CRF model and can guide semi-supervised learning when labeled data is limited.
    Page 1, “Abstract”
  2. Experiments on standard product review datasets show that our method outperforms the state-of-the-art methods in both the supervised and semi-supervised settings.
    Page 1, “Abstract”
  3. Semi-supervised techniques have been proposed for sentence-level sentiment classification (Täckström and McDonald, 2011a; Qu et al., 2012).
    Page 2, “Introduction”
  4. Experimental results show that our model outperforms state-of-the-art methods in both the supervised and semi-supervised settings.
    Page 2, “Introduction”
  5. Our approach is also semi-supervised.
    Page 2, “Related Work”
  6. Compared to the existing work on semi-supervised learning for sentence-level sentiment classification (Täckström and McDonald, 2011a; Täckström and McDonald, 2011b; Qu et al., 2012), our work does not rely on a large amount of coarse-grained (document-level) labeled data; instead, distant supervision mainly comes from linguistically-motivated constraints.
    Page 2, “Related Work”
  7. Global Sentiment Previous studies have demonstrated the value of document-level sentiment in guiding the semi-supervised learning of sentence-level sentiment (Täckström and McDonald, 2011b; Qu et al., 2012).
    Page 6, “Approach”
  8. We evaluated our method in two settings: supervised and semi-supervised.
    Page 6, “Experiments”
  9. In the semi-supervised setting, our unlabeled data consists of…
    Page 6, “Experiments”
  10. Table 3: Accuracy results (%) for semi-supervised sentiment classification (three-way) on the MD dataset
    Page 7, “Experiments”
  11. Table 4: F1 scores for each sentiment category (positive, negative and neutral) for semi-supervised sentiment classification on the MD dataset
    Page 8, “Experiments”


sentiment analysis

Appears in 9 sentences as: sentiment analysis (9)
  1. The importance of discourse for sentiment analysis has become increasingly recognized.
    Page 1, “Introduction”
  2. Very little work has explored long-distance discourse relations for sentiment analysis.
    Page 1, “Introduction”
  3. Our work is the first to explore PR for sentiment analysis.
    Page 2, “Introduction”
  4. There has been a large amount of work on sentiment analysis at various levels of granularity (Pang and Lee, 2008).
    Page 2, “Related Work”
  5. Various attempts have been made to incorporate discourse relations into sentiment analysis: Pang and Lee (2004) explored the consistency of subjectivity between neighboring sentences; Mao and Lebanon (2007), McDonald et al.
    Page 2, “Related Work”
  6. We develop a rich set of context-aware posterior constraints for sentence-level sentiment analysis by exploiting lexical and discourse knowledge.
    Page 3, “Approach”
  7. In this work, we explore coreference in the context of sentence-level sentiment analysis.
    Page 5, “Approach”
  8. This demonstrates that our modeling of discourse information is effective and that taking into account the discourse context is important for improving sentence-level sentiment analysis.
    Page 8, “Experiments”
  9. While we focus on the sentence-level task, our approach can be easily extended to handle sentiment analysis at finer levels of granularity.
    Page 9, “Conclusion”


classification task

Appears in 7 sentences as: classification task (7)
  1. We evaluate our approach on the sentence-level sentiment classification task using two standard product review datasets.
    Page 2, “Introduction”
  2. We formulate the sentence-level sentiment classification task as a sequence labeling problem.
    Page 3, “Approach”
  3. For the three-way classification task on the MD dataset, we also implemented the following baselines: (4) VOTEFLIP: a rule-based algorithm that leverages the positive, negative and neutral cues along with the effect of negation to determine the sentence sentiment (Choi and Cardie, 2009).
    Page 7, “Experiments”
  4. We first report results on a binary (positive or negative) sentence-level sentiment classification task.
    Page 7, “Experiments”
  5. We also analyzed the model’s performance on a three-way sentiment classification task.
    Page 8, “Experiments”
  6. For the CRF baseline and its variants, we observe a similar performance trend as in the two-way classification task: there is nearly no performance improvement from applying the lexical and discourse-connective-based constraints during CRF inference.
    Page 8, “Experiments”
  7. Our experiments show that our model achieves better accuracy than existing supervised and semi-supervised models for the sentence-level sentiment classification task.
    Page 9, “Conclusion”


labeled data

Appears in 6 sentences as: labeled data (6)
  1. The context-aware constraints provide additional power to the CRF model and can guide semi-supervised learning when labeled data is limited.
    Page 1, “Abstract”
  2. Compared to the existing work on semi-supervised learning for sentence-level sentiment classification (Täckström and McDonald, 2011a; Täckström and McDonald, 2011b; Qu et al., 2012), our work does not rely on a large amount of coarse-grained (document-level) labeled data; instead, distant supervision mainly comes from linguistically-motivated constraints.
    Page 2, “Related Work”
  3. PR makes the assumption that the labeled data we have is not enough for learning good model parameters, but we have a set of constraints on the posterior distribution of the labels.
    Page 3, “Approach”
  4. For the MD dataset, we also used the dvd domain as additional labeled data for developing the constraints.
    Page 7, “Experiments”
  5. We found that the PR model is able to correct many CRF errors caused by the lack of labeled data.
    Page 8, “Experiments”
  6. However, with limited labeled data, the CRF learner can only associate very weak sentiment signals to these features.
    Page 9, “Experiments”


soft constraints

Appears in 6 sentences as: soft constraints (6)
  1. In this paper, we propose a sentence-level sentiment classification method that can (1) incorporate rich discourse information at both local and global levels; (2) encode discourse knowledge as soft constraints during learning; (3) make use of unlabeled data to enhance learning.
    Page 2, “Introduction”
  2. Specifically, we use the Conditional Random Field (CRF) model as the learner for sentence-level sentiment classification, and incorporate rich discourse and lexical knowledge as soft constraints into the learning of CRF parameters via Posterior Regularization (PR) (Ganchev et al., 2010).
    Page 2, “Introduction”
  3. (2013) explored the use of explanatory discourse relations as soft constraints in a Markov Logic Network framework for extracting subjective text segments.
    Page 2, “Related Work”
  4. It has the advantages of utilizing rich discourse knowledge at different levels of context and encoding it as soft constraints during learning.
    Page 2, “Related Work”
  5. Therefore all the constraints are applied as soft constraints.
    Page 4, “Approach”
  6. The differences are: (1) we encode the coreference relations as soft constraints during learning instead of applying them as hard constraints during inference time; (2) our constraints can apply to both polar and non-polar sentences; (3) our identification of coreference relations is automatic without any fine-grained annotations for opinion targets.
    Page 5, “Approach”
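
For reference, the general PR objective of Ganchev et al. (2010), which is what makes the constraints above soft rather than hard; the γ weighting reflects the tradeoff parameter the authors introduce, though the exact form of their Equation 2 may differ:

    J(\theta) = \mathcal{L}(\theta)
        - \gamma \min_{q \,:\, \mathbb{E}_q[\boldsymbol{\phi}(\mathbf{x},\mathbf{y})] \le \mathbf{b}}
          \mathrm{KL}\big( q(\mathbf{y}) \,\|\, p_\theta(\mathbf{y} \mid \mathbf{x}) \big)

Because constraints are enforced only in expectation, a labeling that violates one can still win when the model evidence is strong enough.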


F1 scores

Appears in 5 sentences as: F1 scores (5)
  1. Table 4: F1 scores for each sentiment category (positive, negative and neutral) for semi-supervised sentiment classification on the MD dataset
    Page 8, “Experiments”
  2. Table 4 shows the results in terms of F1 scores for each sentiment category (positive, negative and neutral).
    Page 8, “Experiments”
  3. We observe that the DOCORACLE baseline provides very strong F1 scores on the positive and negative categories especially in the Books and Music domains, but very poor F1 on the neutral category.
    Page 8, “Experiments”
  4. In contrast, our PR models provide more balanced F1 scores among all the sentiment categories.
    Page 8, “Experiments”
  5. Compared to the CRF baseline and its variants, we found that the PR models can greatly improve the precision of predicting positive and negative sentences, resulting in a significant improvement on the positive/negative F1 scores.
    Page 8, “Experiments”
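
For clarity, the per-category F1 reported in Table 4 is the standard one-vs-rest definition; the sketch below is a reference implementation of the metric, not code from the paper.

    def per_class_f1(gold, pred, label):
        """Harmonic mean of one-vs-rest precision and recall for one label."""
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        return 2 * prec * rec / (prec + rec) if prec + rec else 0.0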


unlabeled data

Appears in 5 sentences as: unlabeled data (5)
  1. In this paper, we propose a sentence-level sentiment classification method that can (1) incorporate rich discourse information at both local and global levels; (2) encode discourse knowledge as soft constraints during learning; (3) make use of unlabeled data to enhance learning.
    Page 2, “Introduction”
  2. In the supervised setting, we treated the test data as unlabeled data and performed transductive learning.
    Page 6, “Experiments”
  3. In the semi-supervised setting, our unlabeled data consists of…
    Page 6, “Experiments”
  4. …both the available unlabeled data and the test data.
    Page 7, “Experiments”
  5. In contrast, the PR model is able to associate stronger sentiment signals to these features by leveraging unlabeled data for indirect supervision.
    Page 9, “Experiments”


fine-grained

Appears in 4 sentences as: fine-grained (4)
  1. Accordingly, extracting sentiment at the fine-grained level (e.g., at the sentence- or phrase-level) has received increasing attention recently due to its challenging nature and its importance in supporting these opinion analysis tasks (Pang and Lee, 2008).
    Page 1, “Introduction”
  2. However, the discourse relations were obtained from fine-grained annotations and implemented as hard constraints on polarity.
    Page 2, “Introduction”
  3. Obtaining sentiment labels at the fine-grained level is costly.
    Page 2, “Introduction”
  4. The differences are: (1) we encode the coreference relations as soft constraints during learning instead of applying them as hard constraints during inference time; (2) our constraints can apply to both polar and non-polar sentences; (3) our identification of coreference relations is automatic without any fine-grained annotations for opinion targets.
    Page 5, “Approach”


dependency paths

Appears in 3 sentences as: dependency paths (3)
  1. The syntactic rules correspond to the shortest dependency paths between an opinion word and an extracted mention.
    Page 5, “Approach”
  2. We consider the 10 most frequent dependency paths in the training data.
    Page 5, “Approach”
  3. Example dependency paths include nsubj(opinion, mention), dobj(opinion, mention), and amod(mention, opinion).
    Page 5, “Approach”
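
A hypothetical sketch of the rule matching described above (the triple encoding, the specific rule set, and the helper name are our assumptions, not the authors' code): a candidate mention is kept as an opinion target when its shortest dependency path to an opinion word matches one of the frequent patterns.

    # Each rule is (relation, governor, dependent) on the shortest dependency path.
    TARGET_PATHS = {
        ("nsubj", "opinion", "mention"),  # mention is the subject of the opinion word
        ("dobj", "opinion", "mention"),   # mention is its direct object
        ("amod", "mention", "opinion"),   # opinion word modifies the mention
    }

    def is_opinion_target(path):
        return path in TARGET_PATHS

    print(is_opinion_target(("amod", "mention", "opinion")))  # True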


Gibbs Sampling

Appears in 3 sentences as: Gibbs sampler (1) Gibbs Sampling (1) Gibbs sampling (1)
  1. For constraints with higher-order structures, we use Gibbs Sampling (Geman and Geman, 1984) to approximate the expectations.
    Page 6, “Approach”
  2. For documents where the higher-order constraints apply, we use the same Gibbs sampler as described above to infer the most likely label assignment; otherwise, we use the Viterbi algorithm.
    Page 6, “Approach”
  3. For approximate inference with higher-order constraints, we perform 2000 Gibbs sampling iterations where the first 1000 iterations are burn-in iterations.
    Page 7, “Experiments”
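
A minimal illustration of the procedure in items 1-3, under stated assumptions: a generic unnormalized log-score over full label sequences stands in for the CRF potentials plus higher-order constraint terms, and this is a sketch rather than the authors' implementation.

    import math, random

    LABELS = ("positive", "negative", "neutral")

    def gibbs_expectation(log_score, n_sents, feature, iters=2000, burn_in=1000):
        """Approximate E[feature(y)] by resampling one sentence label at a time."""
        y = [random.choice(LABELS) for _ in range(n_sents)]
        total, count = 0.0, 0
        for it in range(iters):
            for i in range(n_sents):
                # Sample y[i] from its conditional given all other labels.
                weights = []
                for lab in LABELS:
                    y[i] = lab
                    weights.append(math.exp(log_score(y)))
                r, acc = random.random() * sum(weights), 0.0
                for lab, w in zip(LABELS, weights):
                    acc += w
                    if r <= acc:
                        y[i] = lab
                        break
            if it >= burn_in:  # discard the first burn-in sweeps
                total += feature(y)
                count += 1
        return total / count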


machine learning

Appears in 3 sentences as: machine learning (3)
  1. Most existing machine learning approaches suffer from limitations in the modeling of complex linguistic structures across sentences and often fail to capture nonlocal contextual cues that are important for sentiment interpretation.
    Page 1, “Abstract”
  2. …machine learning algorithms with rich features and take into account the interactions between words to handle compositional effects such as polarity reversal (e.g. …
    Page 1, “Introduction”
  3. Existing machine learning approaches for the task can be classified based on the use of two ideas.
    Page 2, “Related Work”


significantly outperform

Appears in 3 sentences as: significantly outperform (2) significantly outperforms (1)
  1. …that PR significantly outperforms all other baselines in both the CR dataset and the MD dataset (average accuracy across domains is reported).
    Page 8, “Experiments”
  2. In contrast, both PRinf and PR significantly outperform CRF, which implies that incorporating lexical and discourse constraints as posterior constraints is much more effective.
    Page 8, “Experiments”
  3. We can see that both PR and PRinf significantly outperform all other baselines in all domains.
    Page 8, “Experiments”
