Employing Personal/Impersonal Views in Supervised and Semi-Supervised Sentiment Classification
Li, Shoushan and Huang, Chu-Ren and Zhou, Guodong and Lee, Sophia Yat Mei

Article Structure

Abstract

In this paper, we adopt two views, personal and impersonal, and systematically employ them in both supervised and semi-supervised sentiment classification.

Introduction

As a special task of text classification, sentiment classification aims to classify a text according to the sentiment polarity of the opinions it expresses, such as ‘thumbs up’ or ‘thumbs down’ on movies (Pang et al., 2002).

Related Work

Recently, a variety of studies have been reported on sentiment classification at different levels: word level (Esuli and Sebastiani, 2005), phrase level (Wilson et al., 2009), sentence level (Kim and Hovy, 2004; Liu et al., 2005), and document level (Turney, 2002; Pang et al., 2002).

Unsupervised Mining of Personal and Impersonal Views

As mentioned in Section 1, the objective of sentiment classification is to classify a specific binary relation: X’s evaluation of Y, where X is an object set including different kinds of persons and Y is another object set including the target objects to be evaluated.
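
A minimal sketch of how such a view split can be mined without supervision, assuming a simple pronoun-based heuristic (an illustration only; the paper's actual mining rules are not reproduced here): sentences where a person X voices the evaluation, approximated by the presence of a personal pronoun, go to the personal view, and the rest to the impersonal view.

    import re

    # Hypothetical heuristic, for illustration only: a sentence is "personal"
    # when a personal pronoun signals that X (a person) does the evaluating.
    PERSONAL_PRONOUNS = re.compile(
        r"\b(i|we|you|he|she|they|me|us|him|her|them)\b", re.IGNORECASE)

    def split_views(sentences):
        # Partition a review's sentences into personal/impersonal views.
        personal, impersonal = [], []
        for sentence in sentences:
            (personal if PERSONAL_PRONOUNS.search(sentence)
             else impersonal).append(sentence)
        return personal, impersonal

    personal, impersonal = split_views([
        "I love this camera.",           # personal: X's own evaluation of Y
        "The battery lasts ten hours.",  # impersonal: a statement about Y itself
    ])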

Topics

sentiment classification

Appears in 43 sentences as: Sentiment Classification (3) sentiment classification (39) sentiment classifications (1) sentiment classifier (1)
In Employing Personal/Impersonal Views in Supervised and Semi-Supervised Sentiment Classification
  1. In this paper, we adopt two views, personal and impersonal, and systematically employ them in both supervised and semi-supervised sentiment classification.
    Page 1, “Abstract”
  2. On this basis, an ensemble method and a co-training algorithm are explored to employ the two views in supervised and semi-supervised sentiment classification, respectively.
    Page 1, “Abstract”
  3. As a special task of text classification, sentiment classification aims to classify a text according to the sentiment polarity of the opinions it expresses, such as ‘thumbs up’ or ‘thumbs down’ on movies (Pang et al., 2002).
    Page 1, “Introduction”
  4. In general, the objective of sentiment classification can be represented as a kind of binary relation R, defined as an ordered triple (X, Y, G), where X is an object set including different kinds of people (e.g. writers, reviewers, or users), Y is another object set including the target objects (e.g. …
    Page 1, “Introduction”
  5. The relation of concern in sentiment classification is X’s evaluation of Y, such as ‘thumbs up’, ‘thumbs down’, ‘favorable’, …
    Page 1, “Introduction”
  6. This kind of information is normally domain-independent and serves as highly relevant clues to sentiment classification.
    Page 1, “Introduction”
  7. It is well-known that sentiment classification is very domain-specific (Blitzer et al., 2007), so it is critical to eliminate its dependence on large-scale labeled data for wide application.
    Page 1, “Introduction”
  8. Since unlabeled data are ample and easy to collect, a successful semi-supervised sentiment classification system would significantly reduce the labor and time involved.
    Page 1, “Introduction”
  9. In this paper, we systematically employ personal/impersonal views in supervised and semi-supervised sentiment classification.
    Page 2, “Introduction”
  10. Then, both views are employed in supervised sentiment classification via an ensemble of individual classifiers generated by each view (see the sketch after this list).
    Page 2, “Introduction”
  11. Finally, a co-training algorithm is proposed to incorporate unlabeled data for semi-supervised sentiment classification.
    Page 2, “Introduction”
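
As a rough illustration of the supervised setting referenced in excerpt 10 above, the sketch below trains one classifier per view and combines them by averaging class probabilities. scikit-learn is used purely as a stand-in; the paper's features and exact combination method are not reproduced here.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    def train_view_classifier(texts, labels):
        # One base classifier per view: bag-of-words features, linear model.
        clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
        clf.fit(texts, labels)
        return clf

    def ensemble_predict(clf_personal, clf_impersonal,
                         personal_texts, impersonal_texts):
        # Average the two views' positive-class probabilities per review;
        # assumes labels are 0/1, so column 1 is the positive class.
        p1 = clf_personal.predict_proba(personal_texts)[:, 1]
        p2 = clf_impersonal.predict_proba(impersonal_texts)[:, 1]
        return ((p1 + p2) / 2 >= 0.5).astype(int)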


semi-supervised

Appears in 30 sentences as: Semi-Supervised (1) Semi-supervised (4) semi-supervised (26)
In Employing Personal/Impersonal Views in Supervised and Semi-Supervised Sentiment Classification
  1. In this paper, we adopt two views, personal and impersonal, and systematically employ them in both supervised and semi-supervised sentiment classification.
    Page 1, “Abstract”
  2. On this basis, an ensemble method and a co-training algorithm are explored to employ the two views in supervised and semi-supervised sentiment classification, respectively.
    Page 1, “Abstract”
  3. Since unlabeled data are ample and easy to collect, a successful semi-supervised sentiment classification system would significantly reduce the labor and time involved.
    Page 1, “Introduction”
  4. Therefore, given the two different views mentioned above, one promising application is to adopt them in co-training algorithms, which have been proven to be an effective semi-supervised learning strategy for incorporating unlabeled data to further improve the classification performance (Zhu, 2005).
    Page 1, “Introduction”
  5. In this paper, we systematically employ personal/impersonal views in supervised and semi-supervised sentiment classification.
    Page 2, “Introduction”
  6. Finally, a co-training algorithm is proposed to incorporate unlabeled data for semi-supervised sentiment classification.
    Page 2, “Introduction”
  7. Section 4 and Section 5 propose our supervised and semi-supervised methods for sentiment classification, respectively.
    Page 2, “Introduction”
  8. Generally, document-level sentiment classification methods can be categorized into three types: unsupervised, supervised, and semi-supervised.
    Page 2, “Related Work”
  9. Semi-supervised methods combine unlabeled data with labeled training data (often small-scale) to improve the models.
    Page 2, “Related Work”
  10. Compared to the supervised and unsupervised methods, semi-supervised methods for sentiment classification are relatively new and have received much less study.
    Page 2, “Related Work”
  11. Dasgupta and Ng (2009) integrate various methods in semi-supervised sentiment classification including spectral clustering, active learning, transductive learning, and ensemble learning.
    Page 2, “Related Work”


labeled data

Appears in 13 sentences as: labeled data (14)
In Employing Personal/Impersonal Views in Supervised and Semi-Supervised Sentiment Classification
  1. It is well-known that sentiment classification is very domain-specific (Blitzer et al., 2007), so it is critical to eliminate its dependence on large-scale labeled data for wide application.
    Page 1, “Introduction”
  2. Supervised methods consider sentiment classification as a standard classification problem in which labeled data in a domain are used to train a domain-specific classifier.
    Page 2, “Related Work”
  3. The co-training algorithm is a specific semi-supervised learning approach which starts with a set of labeled data and increases the amount of labeled data using the unlabeled data by bootstrapping (Blum and Mitchell, 1998).
    Page 4, “Unsupervised Mining of Personal and Impersonal Views”
  4. Input: The labeled data L containing personal sentence set S_L-personal and impersonal sentence set S_L-impersonal …
    Page 4, “Unsupervised Mining of Personal and Impersonal Views”
  5. … and the unlabeled data U containing personal sentence set S_U-personal and impersonal sentence set S_U-impersonal. Output: New labeled data L. Procedure: … (see the sketch after this list)
    Page 4, “Unsupervised Mining of Personal and Impersonal Views”
  6. After obtaining the new labeled data, we can either adopt one classifier (i.e. …
    Page 5, “Unsupervised Mining of Personal and Impersonal Views”
  7. Figure 3 gives the distribution of personal and impersonal sentences in the training data (the 75% of all data used as labeled data).
    Page 5, “Unsupervised Mining of Personal and Impersonal Views”
  8. … the new labeled data to test on the testing data, while the latter applies the combined classifier f1 + f2 + f3.
    Page 6, “Unsupervised Mining of Personal and Impersonal Views”
  9. Figure 5 shows the classification results of different methods with different sizes of the labeled data: 5%, 10%, and 15% of all data, where the testing data are kept the same (20% of all data).
    Page 7, “Unsupervised Mining of Personal and Impersonal Views”
  10. Specifically, the results of other methods including self-training, transductive SVM, and random views are presented when 10% labeled data are used in training.
    Page 7, “Unsupervised Mining of Personal and Impersonal Views”
  11. (Figure: results for the Book, DVD, Electronic, and Kitchen domains, using 5% labeled data as training data.)
    Page 8, “Unsupervised Mining of Personal and Impersonal Views”
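
The following is a compact sketch of the two-view co-training loop whose input and output appear in excerpts 3–5 above. The classifier objects, the number of rounds, and the growth size per round are illustrative assumptions, not the paper's settings: each round, each view's classifier labels the unlabeled examples it is most confident about, and those examples join the shared labeled set.

    import numpy as np

    def co_train(clf1, clf2, X1_lab, X2_lab, y_lab, X1_unlab, X2_unlab,
                 rounds=10, grow=5):
        # Two-view co-training (Blum and Mitchell, 1998), illustrative only.
        # X1_*/X2_* hold the personal/impersonal features of the same reviews.
        X1_lab, X2_lab, y_lab = list(X1_lab), list(X2_lab), list(y_lab)
        unlab = list(range(len(X1_unlab)))  # indices of still-unlabeled reviews
        for _ in range(rounds):
            clf1.fit(X1_lab, y_lab)
            clf2.fit(X2_lab, y_lab)
            for clf, X_view in ((clf1, X1_unlab), (clf2, X2_unlab)):
                if not unlab:
                    break
                proba = clf.predict_proba([X_view[i] for i in unlab])
                picked = np.argsort(-proba.max(axis=1))[:grow]
                # The most confident predictions grow the shared labeled set.
                for j in sorted(picked, reverse=True):
                    i = unlab[j]
                    X1_lab.append(X1_unlab[i])
                    X2_lab.append(X2_unlab[i])
                    y_lab.append(int(proba[j].argmax()))
                    del unlab[j]
            if not unlab:
                break
        return clf1, clf2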


unlabeled data

Appears in 13 sentences as: unlabeled data (13)
In Employing Personal/Impersonal Views in Supervised and Semi-Supervised Sentiment Classification
  1. Since unlabeled data are ample and easy to collect, a successful semi-supervised sentiment classification system would significantly reduce the labor and time involved.
    Page 1, “Introduction”
  2. Therefore, given the two different views mentioned above, one promising application is to adopt them in co-training algorithms, which have been proven to be an effective semi-supervised learning strategy for incorporating unlabeled data to further improve the classification performance (Zhu, 2005).
    Page 1, “Introduction”
  3. Finally, a co-training algorithm is proposed to incorporate unlabeled data for semi-supervised sentiment classification.
    Page 2, “Introduction”
  4. Semi-supervised methods combine unlabeled data with labeled training data (often small-scale) to improve the models.
    Page 2, “Related Work”
  5. Semi-supervised learning is a strategy which combines unlabeled data with labeled training data to improve the models.
    Page 4, “Unsupervised Mining of Personal and Impersonal Views”
  6. The co-training algorithm is a specific semi-supervised learning approach which starts with a set of labeled data and increases the amount of labeled data using the unlabeled data by bootstrapping (Blum and Mitchell, 1998).
    Page 4, “Unsupervised Mining of Personal and Impersonal Views”
  7. … S_L-impersonal; the unlabeled data U containing personal sentence set S_U-personal and impersonal sentence set S_U-impersonal …
    Page 4, “Unsupervised Mining of Personal and Impersonal Views”
  8. Self-training, which uses the unlabeled data in a bootstrapping way like co-training yet limits the number of classifiers and the number of views to one (see the sketch after this list).
    Page 6, “Unsupervised Mining of Personal and Impersonal Views”
  9. Transductive SVM, which seeks the largest separation between labeled and unlabeled data through regularization (Joachims, 1999).
    Page 6, “Unsupervised Mining of Personal and Impersonal Views”
  10. In semi-supervised sentiment classification, the data are randomly partitioned into labeled training data, unlabeled data, and testing data in proportions of 10%, 70%, and 20%, respectively.
    Page 6, “Unsupervised Mining of Personal and Impersonal Views”
  11. One open question is whether the unlabeled data improve the performance.
    Page 7, “Unsupervised Mining of Personal and Impersonal Views”
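
For contrast with co-training, here is a minimal self-training loop as characterized in excerpt 8 above: one classifier, one view, and high-confidence self-labeled examples folded back into the training set. The confidence threshold and round count are illustrative assumptions.

    def self_train(clf, X_lab, y_lab, X_unlab, rounds=10, threshold=0.9):
        # Single-view bootstrapping: absorb only high-confidence predictions.
        X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
        for _ in range(rounds):
            if not pool:
                break
            clf.fit(X_lab, y_lab)
            proba = clf.predict_proba(pool)
            keep = proba.max(axis=1) >= threshold
            if not keep.any():
                break  # nothing confident enough; stop bootstrapping
            X_lab.extend(x for x, k in zip(pool, keep) if k)
            y_lab.extend(int(p.argmax()) for p, k in zip(proba, keep) if k)
            pool = [x for x, k in zip(pool, keep) if not k]
        return clf.fit(X_lab, y_lab)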


SVM

Appears in 6 sentences as: SVM (6)
In Employing Personal/Impersonal Views in Supervised and Semi-Supervised Sentiment Classification
  1. We apply both support vector machine (SVM) and Maximum Entropy (ME) algorithms with the help of the SVM-light and Mallet tools (see the sketch after this list).
    Page 5, “Unsupervised Mining of Personal and Impersonal Views”
  2. We find that ME performs slightly better than SVM on average.
    Page 5, “Unsupervised Mining of Personal and Impersonal Views”
  3. Transductive SVM, which seeks the largest separation between labeled and unlabeled data through regularization (Joachims, 1999).
    Page 6, “Unsupervised Mining of Personal and Impersonal Views”
  4. … self-training, transductive SVM) are not shown in Figure 4 but will be reported in Figure 5 later.
    Page 6, “Unsupervised Mining of Personal and Impersonal Views”
  5. Specifically, the results of other methods including self-training, transductive SVM, and random views are presented when 10% labeled data are used in training.
    Page 7, “Unsupervised Mining of Personal and Impersonal Views”
  6. Transductive SVM performs even worse and can only improve the performance of the “software” domain.
    Page 7, “Unsupervised Mining of Personal and Impersonal Views”
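
Excerpts 1 and 2 above compare SVM (via SVM-light) and Maximum Entropy (via Mallet). The sketch below uses scikit-learn purely as a stand-in for those tools, with LogisticRegression as the usual Maximum Entropy analogue for text classification; the toy data and feature choices are assumptions for illustration.

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    # Stand-ins: LinearSVC for SVM-light, LogisticRegression (MaxEnt) for Mallet.
    svm_clf = make_pipeline(CountVectorizer(), LinearSVC())
    me_clf = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))

    train_texts = ["thumbs up, I loved it", "thumbs down, it broke in a day"]
    train_labels = [1, 0]  # toy data, for illustration only
    for clf in (svm_clf, me_clf):
        clf.fit(train_texts, train_labels)
    print(me_clf.predict(["I loved how long the battery lasted"]))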


cross validation

Appears in 3 sentences as: cross validation (3)
In Employing Personal/Impersonal Views in Supervised and Semi-Supervised Sentiment Classification
  1. x'_meta = <P(x')_{l=1}, P(x')_{l=2}, P(x')_{l=3}>, i.e. the probabilities assigned to x' by the three base classifiers (l = 1, 2, 3). In our experiments, we perform stacking with 4-fold cross validation to generate meta-training data, where each fold is used as the development data and the other three folds are used to train the base classifiers in the training phase (see the sketch after this list).
    Page 4, “Unsupervised Mining of Personal and Impersonal Views”
  2. 4-fold cross validation is performed for supervised sentiment classification.
    Page 5, “Unsupervised Mining of Personal and Impersonal Views”
  3. Also, we find that our results are similar to those (described as fully supervised results) reported in Dasgupta and Ng (2009), where the same data in the four domains are used and 10-fold cross validation is performed.
    Page 5, “Unsupervised Mining of Personal and Impersonal Views”
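
Excerpt 1 above describes how the meta-training data are generated: with 4-fold cross validation, each fold is scored by base classifiers trained on the other three folds, and the resulting probability vectors (the x'_meta triples) become meta-level features. A sketch under those assumptions, with scikit-learn stand-ins for the base classifiers and the meta-learner:

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import KFold

    def make_meta_features(base_clfs, views, y, n_splits=4):
        # views: one feature matrix per base classifier (e.g. personal view,
        # impersonal view, all features). Each instance's meta representation
        # is the base classifiers' out-of-fold positive-class probabilities.
        y = np.asarray(y)
        meta = np.zeros((len(y), len(base_clfs)))
        kf = KFold(n_splits=n_splits, shuffle=True, random_state=0)
        for train_idx, dev_idx in kf.split(views[0]):
            for l, (clf, X) in enumerate(zip(base_clfs, views)):
                clf.fit(X[train_idx], y[train_idx])   # train on three folds
                meta[dev_idx, l] = clf.predict_proba(X[dev_idx])[:, 1]
        return meta

    # A meta-classifier is then trained on the stacked features, e.g.:
    # meta_clf = LogisticRegression().fit(make_meta_features(clfs, views, y), y)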
