Cross-Domain Co-Extraction of Sentiment and Topic Lexicons
Fangtao Li, Sinno Jialin Pan, Ou Jin, Qiang Yang, Xiaoyan Zhu

Article Structure

Abstract

Extracting sentiment and topic lexicons is important for opinion mining.

Introduction

In the past few years, opinion mining and sentiment analysis have attracted much attention in Natural Language Processing (NLP) and Information Retrieval (IR) (Pang and Lee, 2008; Liu, 2010).

Topics

labeled data

Appears in 16 sentences as: labeled data (20)
In Cross-Domain Co-Extraction of Sentiment and Topic Lexicons
  1. In this paper, we propose a domain adaptation framework for sentiment- and topic- lexicon co-extraction in a domain of interest where we do not require any labeled data, but have lots of labeled data in another related domain.
    Page 1, “Abstract”
  2. In this paper, we focus on the co-extraction task of sentiment and topic lexicons in a target domain where we do not have any labeled data, but have plenty of labeled data in a source domain.
    Page 1, “Introduction”
  3. …utilize useful labeled data from the source domain as well as exploit the relationships between the topic and sentiment words to propagate information for lexicon construction in the target domain.
    Page 2, “Introduction”
  4. …labeled data be available in the target domain.
    Page 2, “Introduction”
  5. Recall that we focus on the setting where we have no labeled data in the target domain, while we have plenty of labeled data in the source domain.
    Page 2, “Introduction”
  6. Our proposed bootstrapping for cross-domain lexicon extraction is based on the following two observations: 1) Although the source and target domains are different, part of source domain labeled data is still useful for lexicon extraction in the target domain after some adaptation; 2) The syntactic relationships among sentiment and topic words can be used to expand the seeds in the target domain for lexicon construction.
    Page 4, “Introduction”
  7. The main idea of TrAdaBoost is to re-weight the source domain data based on a few target domain labeled data, which are referred to as seeds in our task.
    Page 4, “Introduction”
  8. If μ = 0, then RAP only utilizes useful source domain labeled data to assist learning of the target domain classifier without considering the relationships between sentiment and topic words.
    Page 6, “Introduction”
  9. The results of in-domain classifiers, which are trained on plenty of target domain labeled data, can be treated as upper bounds.
    Page 6, “Introduction”
  10. Since this method requires some target domain labeled data, we manually label 30 sentiment words in the target domain.
    Page 6, “Introduction”
  11. TrAdaBoost: We apply TrAdaBoost (Dai et al., 2007) on the source domain labeled data and the generated seeds in the target domain to train a lexicon extractor.
    Page 6, “Introduction”
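
The TrAdaBoost-style instance re-weighting mentioned in the sentences above can be sketched roughly as follows. This is a minimal illustration assuming numpy, scikit-learn, and a decision-stump base learner; it follows the general re-weighting scheme of Dai et al. (2007) and is not the authors' implementation.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def tradaboost(X_src, y_src, X_seed, y_seed, n_iters=10):
        # Source instances the model gets wrong are down-weighted; target-domain
        # seeds it gets wrong are up-weighted, so useful source data gradually dominates.
        n_src = len(X_src)
        X = np.vstack([X_src, X_seed])
        y = np.concatenate([y_src, y_seed])
        w = np.ones(len(y))
        beta_src = 1.0 / (1.0 + np.sqrt(2.0 * np.log(n_src) / n_iters))
        learners = []
        for _ in range(n_iters):
            clf = DecisionTreeClassifier(max_depth=1)
            clf.fit(X, y, sample_weight=w / w.sum())
            wrong = (clf.predict(X) != y).astype(float)
            # error measured on the target-domain seeds only
            eps = np.clip(np.sum(w[n_src:] * wrong[n_src:]) / np.sum(w[n_src:]), 1e-10, 0.499)
            beta_tgt = eps / (1.0 - eps)
            w[:n_src] *= beta_src ** wrong[:n_src]      # shrink weights of misclassified source data
            w[n_src:] *= beta_tgt ** (-wrong[n_src:])   # boost weights of misclassified seeds
            learners.append(clf)
        return learners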


sentiment lexicon

Appears in 16 sentences as: Sentiment lexicon (1), sentiment lexicon (11), sentiment lexicons (4)
In Cross-Domain Co-Extraction of Sentiment and Topic Lexicons
  1. Sentiment lexicon construction and topic lexicon extraction are two fundamental subtasks for opinion mining (Qiu et al., 2009).
    Page 1, “Introduction”
  2. A sentiment lexicon is a list of sentiment expressions, which are used to indicate sentiment polarity (e.g., positive or negative).
    Page 1, “Introduction”
  3. The sentiment lexicon is domain dependent as users may use different sentiment words to express their opinion in different domains (e.g., different products).
    Page 1, “Introduction”
  4. Note that, similar to sentiment lexicons, different domains may have very different topic lexicons.
    Page 1, “Introduction”
  5. …sentiment lexicons, respectively.
    Page 3, “Introduction”
  6. After generating the topic and sentiment seeds, we aim to expand them in the target domain to construct topic and sentiment lexicons.
    Page 4, “Introduction”
  7. From Table 2, we can observe that our proposed methods are effective for sentiment lexicon extraction.
    Page 6, “Introduction”
  8. In addition, we also observe that embedding the TrAdaBoost algorithm into a bootstrapping process can further boost the performance of the classifier for sentiment lexicon extraction.
    Page 6, “Introduction”
  9. From the table, we can observe that, different from the sentiment lexicon extraction task, the relational bootstrapping method performs slightly better than the adaptive bootstrapping method.
    Page 6, “Introduction”
  10. The reason may be that for the sentiment lexicon extraction task, there exist some common sentiment words …
    Page 6, “Introduction”
  11. Table 2: Results on sentiment lexicon extraction.
    Page 7, “Introduction”
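
As a toy illustration of the domain dependence discussed above, the same extraction pipeline would yield different lexicons for, say, a camera domain and a movie domain; the example words below are purely illustrative and not taken from the paper's results.

    # Hypothetical per-domain lexicons (illustrative words only).
    camera_sentiment = {"great": "positive", "sharp": "positive", "blurry": "negative"}
    camera_topics = ["camera", "lens", "battery", "picture"]

    movie_sentiment = {"excellent": "positive", "touching": "positive", "boring": "negative"}
    movie_topics = ["movie", "script", "actor", "plot"]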


sentiment classification

Appears in 10 sentences as: Sentiment Classification (1), Sentiment classification (1), sentiment classification (8)
In Cross-Domain Co-Extraction of Sentiment and Topic Lexicons
  1. However, most of them focused on coarse-grained document-level sentiment classification, which is different from our fine-grained word-level extraction.
    Page 2, “Introduction”
  2. 7 Application: Sentiment Classification
    Page 8, “Introduction”
  3. To further verify the usefulness of the lexicons extracted by the RAP method, we apply the extracted sentiment lexicon for sentiment classification.
    Page 8, “Introduction”
  4. Our work is motivated by the work of (Pang and Lee, 2004), which only used subjective sentences for document-level sentiment classification, instead of using all sentences.
    Page 8, “Introduction”
  5. Our goal is to compare the sentiment lexicon constructed by the RAP method with other general lexicons in terms of their impact on sentiment classification.
    Page 8, “Introduction”
  6. We use the dataset from (Blitzer et al., 2007) for sentiment classification.
    Page 8, “Introduction”
  7. Experimental results on sentiment classification are shown in Table 5, where “All” denotes using all unigram and bigram features instead of only subjective words.
    Page 8, “Introduction”
  8. These promising results imply that our RAP can be applied for sentiment classification effectively and efficiently.
    Page 8, “Introduction”
  9. Table 5: Sentiment classification results (accuracy in %).
    Page 8, “Introduction”
  10. Furthermore, the extracted sentiment lexicon can be applied to sentiment classification effectively.
    Page 8, “Introduction”
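
A minimal sketch of how an extracted sentiment lexicon could be plugged into document-level sentiment classification along the lines described above: keep only the sentences that contain lexicon words, then train a classifier on unigram and bigram features. The scikit-learn pipeline and the simple sentence filter are assumptions of this example, not the paper's exact setup.

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    def keep_subjective(doc, lexicon):
        # Keep only sentences containing at least one extracted sentiment word.
        sentences = doc.split(".")
        return ". ".join(s for s in sentences
                         if any(w in s.lower().split() for w in lexicon))

    def train_sentiment_classifier(docs, labels, lexicon):
        filtered = [keep_subjective(d, lexicon) for d in docs]
        model = make_pipeline(CountVectorizer(ngram_range=(1, 2)),  # unigram + bigram features
                              LogisticRegression(max_iter=1000))
        model.fit(filtered, labels)
        return model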


domain adaptation

Appears in 8 sentences as: Domain Adaptation (1), Domain adaptation (1), domain adaptation (6)
In Cross-Domain Co-Extraction of Sentiment and Topic Lexicons
  1. In this paper, we propose a domain adaptation framework for sentiment- and topic- lexicon co-extraction in a domain of interest where we do not require any labeled data, but have lots of labeled data in another related domain.
    Page 1, “Abstract”
  2. Experimental results show that our domain adaptation framework can extract precise lexicons in the target domain without any annotation.
    Page 1, “Abstract”
  3. To address this problem, we propose a two-stage domain adaptation method.
    Page 1, “Introduction”
  4. …while most previous work focused on the document level; 2) A new two-stage domain adaptation framework, with a novel RAP algorithm for seed expansion, is proposed.
    Page 2, “Introduction”
  5. 2.2 Domain Adaptation
    Page 2, “Introduction”
  6. Domain adaptation aims at transferring knowledge across domains where data distributions may be different (Pan and Yang, 2010).
    Page 2, “Introduction”
  7. In the past few years, domain adaptation techniques have been widely applied to various NLP tasks, such as part-of-speech tagging (Ando and Zhang, 2005; Jiang and Zhai, 2007; Daumé III, 2007), named-entity recognition and shallow parsing (Daumé III, 2007; Jiang and Zhai, 2007; Wu et al., 2009).
    Page 2, “Introduction”
  8. In the following sections, we present the proposed two-stage domain adaptation framework: 1) generating some sentiment and topic seeds in the target domain; and 2) expanding the seeds in the target domain to construct sentiment and topic lexicons.
    Page 3, “Introduction”
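
The first stage of the two-stage framework (seed generation in the target domain) could look roughly like the sketch below, where high-confidence seeds are source sentiment words that also occur often in the unlabeled target corpus. This frequency heuristic is an assumption for illustration only; the second stage (seed expansion, the RAP algorithm) corresponds to the bootstrapping sketch given later on this page.

    from collections import Counter

    def generate_sentiment_seeds(source_lexicon, target_corpus, top_k=20):
        # Stage 1 (illustrative): keep source sentiment words that also appear
        # frequently in the unlabeled target-domain reviews.
        counts = Counter(w for doc in target_corpus for w in doc.lower().split())
        scored = [(w, counts[w]) for w in source_lexicon if counts[w] > 0]
        return [w for w, _ in sorted(scored, key=lambda x: -x[1])[:top_k]]

    seeds = generate_sentiment_seeds({"great", "excellent", "boring"},
                                     ["the movie is excellent", "a boring script"])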


iteratively

Appears in 7 sentences as: iteratively (7)
In Cross-Domain Co-Extraction of Sentiment and Topic Lexicons
  1. After new topic words are extracted in the movie domain, we can apply the same syntactic pattern or other syntactic patterns to extract new sentiment and topic words iteratively.
    Page 3, “Introduction”
  2. Bootstrapping is the process of improving the performance of a weak classifier by iteratively adding training data and retraining the classifier.
    Page 4, “Introduction”
  3. More specifically, bootstrapping starts with a small set of labeled “seeds”, and iteratively adds unlabeled …
    Page 4, “Introduction”
  4. One important issue in bootstrapping is how to design a criterion to select unlabeled data to be added to the training set iteratively.
    Page 4, “Introduction”
  5. …(5) and (6) iteratively.
    Page 5, “Introduction”
  6. We use the following reinforcement formulas to iteratively update the final sentiment score and the topic score …
    Page 5, “Introduction”
  7. …(5) and (6), the sentiment scores and topic scores are iteratively refined until the state of the graph tends to be stable.
    Page 6, “Introduction”
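
A generic self-training-style bootstrapping loop of the kind described above might look as follows, assuming numpy feature matrices, a scikit-learn classifier, and a fixed confidence threshold. The paper's RAP algorithm adds source-domain adaptation and word-relation propagation on top of this basic pattern, so this is only the skeleton.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def bootstrap(X_seed, y_seed, X_unlabeled, n_rounds=5, threshold=0.9):
        # Start from labeled seeds; each round, add the most confident
        # predictions on the unlabeled pool to the training set and retrain.
        X_train, y_train, pool = X_seed, y_seed, X_unlabeled
        clf = LogisticRegression(max_iter=1000)
        for _ in range(n_rounds):
            if len(pool) == 0:
                break
            clf.fit(X_train, y_train)
            proba = clf.predict_proba(pool)
            keep = proba.max(axis=1) >= threshold
            if not keep.any():
                break
            X_train = np.vstack([X_train, pool[keep]])
            y_train = np.concatenate([y_train, clf.classes_[proba[keep].argmax(axis=1)]])
            pool = pool[~keep]
        return clf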


CRF

Appears in 4 sentences as: CRF (5)
In Cross-Domain Co-Extraction of Sentiment and Topic Lexicons
  1. Our work is similar to Jakob and Gurevych (2010), which proposed a Conditional Random Field (CRF) for cross-domain topic word extraction.
    Page 2, “Introduction”
  2. We denote by iSVM and iCRF the in-domain SVM and CRF classifiers in experiments, and compare our proposed methods …
    Page 6, “Introduction”
  3. Cross-Domain CRF (Cross-CRF): we implement a cross-domain CRF algorithm proposed by Jakob and Gurevych (2010).
    Page 6, “Introduction”
  4. The relational bootstrapping method performs better than the unsupervised method, TrAdaBoost and the cross-domain CRF algorithm, and achieves comparable results with the semi-supervised method.
    Page 6, “Introduction”
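
For reference, a CRF tagger that marks topic (T) and sentiment (S) words in a sentence can be set up in a few lines with the sklearn-crfsuite package; the package choice and the toy features below are assumptions of this example, not what Jakob and Gurevych (2010) or this paper used.

    import sklearn_crfsuite

    def word_features(sent, i):
        return {"lower": sent[i].lower(),
                "prev": sent[i - 1].lower() if i > 0 else "<s>",
                "next": sent[i + 1].lower() if i < len(sent) - 1 else "</s>"}

    # Toy training data: O = other, T = topic word, S = sentiment word.
    sents = [["the", "camera", "is", "great"], ["the", "movie", "is", "excellent"]]
    labels = [["O", "T", "O", "S"], ["O", "T", "O", "S"]]
    X = [[word_features(s, i) for i in range(len(s))] for s in sents]

    crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=50)
    crf.fit(X, labels)
    test = ["the", "lens", "is", "sharp"]
    print(crf.predict([[word_features(test, i) for i in range(len(test))]]))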


dependency tree

Appears in 4 sentences as: dependency tree (3), dependency trees (1)
In Cross-Domain Co-Extraction of Sentiment and Topic Lexicons
  1. Figure 1 shows two dependency trees for the sentence “the camera is great” in the camera domain and the sentence “the movie is excellent” in the movie domain, respectively.
    Page 3, “Introduction”
  2. Figure 1: Examples of dependency tree structure.
    Page 3, “Introduction”
  3. More specifically, we use the shortest path between a topic word and a sentiment word in the corresponding dependency tree to denote the relation between them.
    Page 3, “Introduction”
  4. As an example shown in Figure 2, we can extract two paths or relationships between topic and sentiment words from the dependency tree of the sentence “The movie has good script”: “NN-amod-JJ” from “script” and “good”, and “NN-nsubj-VB-dobj-NN-amod-JJ” from “movie” and “good”.
    Page 3, “Introduction”
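
A sketch of extracting the shortest dependency path between a topic word and a sentiment word, assuming spaCy for parsing (the paper does not specify a parser); the path is encoded as alternating POS tags and dependency relations, mirroring the “NN-amod-JJ” style of the example above.

    import spacy

    nlp = spacy.load("en_core_web_sm")

    def dep_path(src, dst):
        # Collect ancestors of src up to the root, then climb from dst until
        # a common ancestor is reached; join the two halves into one path.
        ancestors, t = {}, src
        while True:
            ancestors[t.i] = t
            if t.head == t:
                break
            t = t.head
        up_dst, t = [], dst
        while t.i not in ancestors:
            up_dst.append(t)
            t = t.head
        common = t
        up_src, t = [], src
        while t.i != common.i:
            up_src.append(t)
            t = t.head
        path = up_src + [common] + list(reversed(up_dst))
        parts = [path[0].tag_]
        for a, b in zip(path, path[1:]):
            rel = a.dep_ if a.head.i == b.i else b.dep_
            parts += [rel, b.tag_]
        return "-".join(parts)

    doc = nlp("The movie has good script")
    movie = next(t for t in doc if t.text == "movie")
    good = next(t for t in doc if t.text == "good")
    print(dep_path(movie, good))  # expected: NN-nsubj-VBZ-dobj-NN-amod-JJ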


semi-supervised

Appears in 4 sentences as: Semi-Supervised (1), semi-supervised (3)
In Cross-Domain Co-Extraction of Sentiment and Topic Lexicons
  1. Qiu et al. (2009) proposed a rule-based semi-supervised learning method for lexicon extraction.
    Page 2, “Introduction”
  2. Semi-Supervised Method (Semi): we implement the double propagation model proposed in (Qiu et al., 2009).
    Page 6, “Introduction”
  3. The relational bootstrapping method performs better than the unsupervised method, TrAdaBoost and the cross-domain CRF algorithm, and achieves comparable results with the semi-supervised method.
    Page 6, “Introduction”
  4. However, compared to the semi-supervised method, our proposed relational bootstrapping method does not require any labeled data in the target domain.
    Page 6, “Introduction”
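
A much-simplified sketch of the double propagation idea: known sentiment words pull in new topic words through syntactic relations, newly found topic words pull in new sentiment words, and the process repeats until nothing new is added. The single amod rule and the use of spaCy below are illustrative assumptions; Qiu et al. (2009) use a richer rule set.

    import spacy

    nlp = spacy.load("en_core_web_sm")

    def double_propagation(corpus, sentiment_seeds, max_iters=5):
        sentiments, topics = set(sentiment_seeds), set()
        docs = [nlp(text) for text in corpus]
        for _ in range(max_iters):
            new_t, new_s = set(), set()
            for doc in docs:
                for tok in doc:
                    # single illustrative rule: adjective modifying a noun (amod)
                    if tok.dep_ == "amod" and tok.head.pos_ == "NOUN":
                        if tok.lemma_ in sentiments:
                            new_t.add(tok.head.lemma_)   # sentiment word -> new topic word
                        if tok.head.lemma_ in topics:
                            new_s.add(tok.lemma_)        # topic word -> new sentiment word
            if new_t <= topics and new_s <= sentiments:
                break
            topics |= new_t
            sentiments |= new_s
        return sentiments, topics

    print(double_propagation(["The movie has good script", "A boring script ruined it"],
                             {"good"}))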


manually annotated

Appears in 3 sentences as: manually annotate (1), manually annotated (2)
In Cross-Domain Co-Extraction of Sentiment and Topic Lexicons
  1. However, the performance of these methods highly relies on manually annotated training data.
    Page 1, “Introduction”
  2. However, these methods need to manually annotate a lot of training data in each domain.
    Page 2, “Introduction”
  3. The sentiment and topic words are manually annotated.
    Page 6, “Introduction”


sentiment analysis

Appears in 3 sentences as: sentiment analysis (3)
In Cross-Domain Co-Extraction of Sentiment and Topic Lexicons
  1. In the past few years, opinion mining and sentiment analysis have attracted much attention in Natural Language Processing (NLP) and Information Retrieval (IR) (Pang and Lee, 2008; Liu, 2010).
    Page 1, “Introduction”
  2. In summary, we have three main contributions: 1) We give a systematic study on cross-domain sentiment analysis at the word level.
    Page 2, “Introduction”
  3. There are also lots of studies for cross-domain sentiment analysis (Blitzer et al., 2007; Tan et al., 2007; Li et al., 2009; Pan et al., 2010; Bollegala et al., 2011; He et al., 2011; Glorot et al., 2011).
    Page 2, “Introduction”


significant improvement

Appears in 3 sentences as: significant improvement (3)
In Cross-Domain Co-Extraction of Sentiment and Topic Lexicons
  1. Numbers in boldface denote significant improvement .
    Page 7, “Introduction”
  2. Numbers in boldface denote significant improvement .
    Page 7, “Introduction”
  3. Numbers in boldface denote significant improvement.
    Page 8, “Introduction”
