Co-Training for Cross-Lingual Sentiment Classification
Wan, Xiaojun

Article Structure

Abstract

The lack of Chinese sentiment corpora limits the research progress on Chinese sentiment classification.

Introduction

Sentiment classification is the task of identifying the sentiment polarity of a given text.

Related Work

2.1 Sentiment Classification

Sentiment classification can be performed on words, sentences or documents.

The Co-Training Approach

3.1 Overview

Empirical Evaluation

4.1 Evaluation Setup

4.1.1 Data set

Conclusion and Future Work

In this paper, we propose to use the co-training approach to address the problem of cross-lingual sentiment classification.

Topics

sentiment classification

Appears in 33 sentences as: Sentiment classification (2) sentiment classification (30) sentiment classifier (3) sentiment classifiers (1)
In Co-Training for Cross-Lingual Sentiment Classification
  1. The lack of Chinese sentiment corpora limits the research progress on Chinese sentiment classification.
    Page 1, “Abstract”
  2. This paper focuses on the problem of cross-lingual sentiment classification, which leverages an available English corpus for Chinese sentiment classification by using the English corpus as training data.
    Page 1, “Abstract”
  3. Sentiment classification is the task of identifying the sentiment polarity of a given text.
    Page 1, “Introduction”
  4. In recent years, sentiment classification has drawn much attention in the NLP field and it has many useful applications, such as opinion mining and summarization (Liu et al., 2005; Ku et al., 2006; Titov and McDonald, 2008).
    Page 1, “Introduction”
  5. To date, a variety of corpus-based methods have been developed for sentiment classification.
    Page 1, “Introduction”
  6. The methods usually rely heavily on an annotated corpus for training the sentiment classifier.
    Page 1, “Introduction”
  7. The sentiment corpora are considered as the most valuable resources for the sentiment classification task.
    Page 1, “Introduction”
  8. Because most previous work focuses on English sentiment classification, many annotated corpora for English sentiment classification are freely available on the Web.
    Page 1, “Introduction”
  9. Annotated corpora for Chinese sentiment classification are scarce, and it is not a trivial task to manually label reliable Chinese sentiment corpora.
    Page 1, “Introduction”
  10. The challenge before us is how to leverage rich English corpora for Chinese sentiment classification.
    Page 1, “Introduction”
  11. In this study, we focus on the problem of cross-lingual sentiment classification, which leverages only English training data for supervised sentiment classification of Chinese product reviews, without using any Chinese resources.
    Page 1, “Introduction”

See all papers in Proc. ACL 2009 that mention sentiment classification.

SVM

Appears in 19 sentences as: SVM (24)
  1. SVM, NB), and the classification performance is far from satisfactory because of the language gap between the original language and the translated language.
    Page 1, “Introduction”
  2. The SVM classifier is adopted as the basic classifier in the proposed approach.
    Page 1, “Introduction”
  3. Standard Naïve Bayes and SVM classifiers have been applied for subjectivity classification in Romanian (Mihalcea et al., 2007; Banea et al., 2008), and the results show that automatic translation is a viable alternative for the construction of resources and tools for subjectivity analysis in a new target language.
    Page 2, “Related Work 2.1 Sentiment Classification”
  4. To date, many semi-supervised learning algorithms have been developed for addressing the cross-domain text classification problem by transferring knowledge across domains, including Transductive SVM (Joachims, 1999), EM (Nigam et al., 2000), the EM-based Naïve Bayes classifier (Dai et al., 2007a), Topic-bridged PLSA (Xue et al., 2008), Co-Clustering based classification (Dai et al., 2007b), and the two-stage approach (Jiang and Zhai, 2007).
    Page 2, “Related Work 2.1 Sentiment Classification”
  5. Typical text classifiers include Support Vector Machine (SVM), Naïve Bayes (NB), Maximum Entropy (ME), K-Nearest Neighbor (KNN), etc.
    Page 4, “The Co-Training Approach”
  6. In this study, we adopt the widely-used SVM classifier (Joachims, 2002).
    Page 4, “The Co-Training Approach”
  7. Considering the input data as two sets of vectors in a feature space, SVM constructs a separating hyperplane in the space by maximizing the margin between the two data sets.
    Page 4, “The Co-Training Approach”
  8. The output value of the SVM classifier for a review indicates the confidence level of the review’s classification.
    Page 4, “The Co-Training Approach”
  9. SVM(CN): This method applies the inductive SVM with only Chinese features for sentiment classification in the Chinese view.
    Page 5, “Empirical Evaluation 4.1 Evaluation Setup”
  10. SVM(EN): This method applies the inductive SVM with only English features for sentiment classification in the English view.
    Page 5, “Empirical Evaluation 4.1 Evaluation Setup”
  11. SVM(ENCNI): This method applies the inductive SVM with both English and Chinese features for sentiment classification in the two views.
    Page 5, “Empirical Evaluation 4.1 Evaluation Setup”
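
The confidence interpretation in item 8 comes down to the linear decision function w·x + b: its sign gives the predicted polarity and its magnitude serves as the confidence level. A minimal sketch with hand-set, hypothetical weights and feature names (a real SVM would learn w by maximizing the margin, as item 7 notes; this is not the paper's implementation):

```python
def decision_value(weights, bias, features):
    # Linear decision function w·x + b over term-frequency features:
    # sign -> predicted polarity, magnitude -> confidence level.
    return sum(weights.get(f, 0.0) * tf for f, tf in features.items()) + bias

# Hypothetical weights; in the paper these would be learned by the SVM.
w = {"good": 1.5, "great": 2.0, "bad": -2.0, "awful": -2.5}

confident = decision_value(w, 0.0, {"good": 2, "great": 1})  # 5.0
uncertain = decision_value(w, 0.0, {"good": 1, "bad": 1})    # -0.5
```

A co-training step that selects confidently positive reviews would prefer the first review here, since |5.0| is far larger than |-0.5|.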

cross-lingual

Appears in 8 sentences as: cross-lingual (8)
  1. This paper focuses on the problem of cross-lingual sentiment classification, which leverages an available English corpus for Chinese sentiment classification by using the English corpus as training data.
    Page 1, “Abstract”
  2. In this study, we focus on the problem of cross-lingual sentiment classification, which leverages only English training data for supervised sentiment classification of Chinese product reviews, without using any Chinese resources.
    Page 1, “Introduction”
  3. In this study, we focus on improving the corpus-based method for cross-lingual sentiment classification of Chinese product reviews by developing novel approaches.
    Page 2, “Related Work 2.1 Sentiment Classification”
  4. Cross-domain text classification can be considered as a more general task than cross-lingual sentiment classification.
    Page 2, “Related Work 2.1 Sentiment Classification”
  5. In particular, several previous studies focus on the problem of cross-lingual text classification, which can be considered as a special case of general cross-domain text classification.
    Page 2, “Related Work 2.1 Sentiment Classification”
  6. To the best of our knowledge, co-training has not yet been investigated for cross-domain or cross-lingual text classification.
    Page 2, “Related Work 2.1 Sentiment Classification”
  7. In the context of cross-lingual sentiment classification, each labeled English review or unlabeled Chinese review has two views of features: English features and Chinese features.
    Page 4, “The Co-Training Approach”
  8. In this paper, we propose to use the co-training approach to address the problem of cross-lingual sentiment classification.
    Page 8, “Conclusion and Future Work”
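
Item 7 above states the core mechanism: each review has an English view and a Chinese view, and co-training lets the classifier trained on each view confidently label unlabeled reviews for the shared training pool. A toy sketch of that loop (hypothetical word-count scorers stand in for the paper's SVM classifiers so the sketch stays self-contained; all names and data are illustrative):

```python
# Two-view co-training sketch: "en" and "cn" are the English/Chinese views.

def train(labeled, view):
    # Toy "classifier": per-class word counts over one view's text.
    counts = {"pos": {}, "neg": {}}
    for review, label in labeled:
        for word in review[view].split():
            counts[label][word] = counts[label].get(word, 0) + 1
    return counts

def score(model, review, view):
    # Signed score: > 0 leans positive, < 0 negative; |score| = confidence.
    return sum(model["pos"].get(w, 0) - model["neg"].get(w, 0)
               for w in review[view].split())

def co_train(labeled, unlabeled, rounds=5, p=1):
    labeled, unlabeled = list(labeled), list(unlabeled)
    for _ in range(rounds):
        for view in ("en", "cn"):
            if not unlabeled:
                break
            model = train(labeled, view)
            # Move the p most confident reviews (in this view) into the
            # shared labeled pool with their predicted labels.
            unlabeled.sort(key=lambda r: abs(score(model, r, view)),
                           reverse=True)
            for review in unlabeled[:p]:
                lab = "pos" if score(model, review, view) >= 0 else "neg"
                labeled.append((review, lab))
            unlabeled = unlabeled[p:]
    return train(labeled, "en"), train(labeled, "cn")

labeled = [({"en": "good great", "cn": "好 很好"}, "pos"),
           ({"en": "bad awful", "cn": "差 糟糕"}, "neg")]
unlabeled = [{"en": "good movie", "cn": "好 很好 电影"},
             {"en": "bad awful film", "cn": "差 糟糕 电影"}]
en_model, cn_model = co_train(labeled, unlabeled)
```

A new review can then be classified by combining its scores under the two final view models, mirroring the idea that the English and Chinese features are two views of the same review.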

machine translation

Appears in 8 sentences as: Machine translation (1) machine translation (7)
  1. Machine translation services are used for eliminating the language gap between the training set and test set, and English features and Chinese features are considered as two independent views of the classification problem.
    Page 1, “Abstract”
  2. First, machine translation services are used to translate English training reviews into Chinese reviews and also translate Chinese test reviews and additional unlabeled reviews into English reviews.
    Page 1, “Introduction”
  3. (2004) use the technique of deep language analysis for machine translation to extract sentiment units in text documents.
    Page 2, “Related Work 2.1 Sentiment Classification”
  4. The labeled English reviews are translated into labeled Chinese reviews, and the unlabeled Chinese reviews are translated into unlabeled English reviews, by using machine translation services.
    Page 3, “The Co-Training Approach”
  5. Fortunately, machine translation techniques have been well developed in the NLP field, though the translation performance is far from satisfactory.
    Page 3, “The Co-Training Approach”
  6. A few commercial machine translation services can be publicly accessed, e.g.
    Page 3, “The Co-Training Approach”
  7. In this study, we adopt Google Translate for both English-to-Chinese Translation and Chinese-to-English Translation, because it is one of the state-of-the-art commercial machine translation systems used today.
    Page 4, “The Co-Training Approach”
  8. 2) The feature distributions of the translated text and the natural text in the same language are still different due to the inaccuracy of the machine translation service.
    Page 8, “Conclusion and Future Work”

text classification

Appears in 7 sentences as: Text Classification (1) text classification (6) text classifiers (1)
  1. 2.2 Cross-Domain Text Classification
    Page 2, “Related Work 2.1 Sentiment Classification”
  2. Cross-domain text classification can be considered as a more general task than cross-lingual sentiment classification.
    Page 2, “Related Work 2.1 Sentiment Classification”
  3. In the problem of cross-domain text classification, the labeled and unlabeled data come from different domains, and their underlying distributions are often different from each other, which violates the basic assumption of traditional classification learning.
    Page 2, “Related Work 2.1 Sentiment Classification”
  4. To date, many semi-supervised learning algorithms have been developed for addressing the cross-domain text classification problem by transferring knowledge across domains, including Transductive SVM (Joachims, 1999), EM (Nigam et al., 2000), the EM-based Naïve Bayes classifier (Dai et al., 2007a), Topic-bridged PLSA (Xue et al., 2008), Co-Clustering based classification (Dai et al., 2007b), and the two-stage approach (Jiang and Zhai, 2007).
    Page 2, “Related Work 2.1 Sentiment Classification”
  5. In particular, several previous studies focus on the problem of cross-lingual text classification, which can be considered as a special case of general cross-domain text classification.
    Page 2, “Related Work 2.1 Sentiment Classification”
  6. To the best of our knowledge, co-training has not yet been investigated for cross-domain or cross-lingual text classification.
    Page 2, “Related Work 2.1 Sentiment Classification”
  7. Typical text classifiers include Support Vector Machine (SVM), Naïve Bayes (NB), Maximum Entropy (ME), K-Nearest Neighbor (KNN), etc.
    Page 4, “The Co-Training Approach”

sentiment analysis

Appears in 5 sentences as: sentiment analysis (5)
  1. Note that the above problem is not only defined for Chinese sentiment classification, but also for various sentiment analysis tasks in other different languages.
    Page 1, “Introduction”
  2. Corpus-based methods usually consider the sentiment analysis task as a classification task and they use a labeled corpus to train a sentiment classifier.
    Page 2, “Related Work 2.1 Sentiment Classification”
  3. Chinese sentiment analysis has also been studied (Tsou et al., 2005; Ye et al., 2006; Li and Sun, 2007) and most such work uses similar lexicon-based methods.
    Page 2, “Related Work 2.1 Sentiment Classification”
  4. To date, several pilot studies have been performed to leverage rich English resources for sentiment analysis in other languages.
    Page 2, “Related Work 2.1 Sentiment Classification”
  5. Wan (2008) focuses on leveraging both Chinese and English lexicons to improve Chinese sentiment analysis by using lexicon-based methods.
    Page 2, “Related Work 2.1 Sentiment Classification”

unigram

Appears in 3 sentences as: unigram (2) unigrams (1)
  1. The English or Chinese features used in this study include both unigrams and bigrams (footnote 5), and the feature weight is simply set to term frequency (footnote 6).
    Page 4, “The Co-Training Approach”
  2. Footnote 5: For Chinese text, a unigram refers to a Chinese word and a bigram refers to two adjacent Chinese words.
    Page 4, “The Co-Training Approach”
  3. In the above experiments, all features (unigram + bigram) are used.
    Page 8, “Empirical Evaluation 4.1 Evaluation Setup”
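
Items 1–3 describe the feature scheme: unigram and bigram features weighted by raw term frequency, where for Chinese a token is a segmented word. A small sketch of that extraction; the function name and example tokens are illustrative, not from the paper:

```python
def ngram_features(tokens):
    # Unigram + bigram features with term-frequency weights.
    feats = {}
    for i, tok in enumerate(tokens):
        feats[tok] = feats.get(tok, 0) + 1                 # unigram
        if i + 1 < len(tokens):
            big = tok + "_" + tokens[i + 1]
            feats[big] = feats.get(big, 0) + 1             # bigram
    return feats

feats = ngram_features(["this", "camera", "is", "good", "good"])
# e.g. feats["good"] == 2 (term frequency), feats["is_good"] == 1
```

For Chinese input, `tokens` would be the output of a word segmenter, so each unigram is one Chinese word and each bigram two adjacent words, matching footnote 5.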
