Cross-Lingual Mixture Model for Sentiment Classification
Meng, Xinfan and Wei, Furu and Liu, Xiaohua and Zhou, Ming and Xu, Ge and Wang, Houfeng

Article Structure

Abstract

The amount of labeled sentiment data in English is much larger than that in other languages.

Introduction

Sentiment Analysis (also known as opinion mining), which aims to extract the sentiment information from text, has attracted extensive attention in recent years.

Related Work

In this section, we present a brief review of the related work on monolingual sentiment classification and cross-lingual sentiment classification.

Cross-Lingual Mixture Model for Sentiment Classification

In this section we present the cross-lingual mixture model (CLMM) for sentiment classification.

Experiment

4.1 Experiment Setup and Data Sets

Conclusion and Future Work

In this paper, we propose a cross-lingual mixture model (CLMM) to tackle the problem of cross-lingual sentiment classification.

Topics

labeled data

Appears in 58 sentences as: Labeled Data (5) Labeled data (1) labeled data (62)
In Cross-Lingual Mixture Model for Sentiment Classification
  1. Chinese) using labeled data in the source language (e.g.
    Page 1, “Abstract”
  2. Most existing work relies on machine translation engines to directly adapt labeled data from the source language to the target language.
    Page 1, “Abstract”
  3. Experiments on multiple data sets show that CLMM is consistently effective in two settings: (1) labeled data in the target language are unavailable; and (2) labeled data in the target language are also available.
    Page 1, “Abstract”
  4. Therefore, it is desirable to use the English labeled data to improve sentiment classification of documents in other languages.
    Page 1, “Introduction”
  5. One direct approach to leveraging the labeled data in English is to use machine translation engines as a black box to translate the labeled data from English to the target language (e.g.
    Page 1, “Introduction”
  6. First, the vocabulary covered by the translated labeled data is limited, hence many sentiment-indicative words cannot be learned from the translated labeled data.
    Page 1, “Introduction”
  7. Instead of relying on the unreliable machine-translated labeled data, CLMM leverages bilingual parallel data to bridge the language gap between the source language and the target language.
    Page 2, “Introduction”
  8. Besides, CLMM can improve the accuracy of cross-lingual sentiment classification consistently regardless of whether labeled data in the target language are present or not.
    Page 2, “Introduction”
  9. We evaluate the model on sentiment classification of Chinese using English labeled data.
    Page 2, “Introduction”
  10. The experiment results show that CLMM yields 71% in accuracy when no Chinese labeled data are used, which significantly improves Chinese sentiment classification and is superior to the SVM and co-training based methods.
    Page 2, “Introduction”
  11. When Chinese labeled data are employed, CLMM yields 83% in accuracy, which is remarkably better than the SVM and achieves state-of-the-art performance.
    Page 2, “Introduction”

See all papers in Proc. ACL 2012 that mention labeled data.

sentiment classification

Appears in 41 sentences as: Sentiment Classification (4) Sentiment classification (1) sentiment classification (33) sentiment classifier (6)
In Cross-Lingual Mixture Model for Sentiment Classification
  1. Such a disproportion arouses interest in cross-lingual sentiment classification, which aims to conduct sentiment classification in the target language (e.g.
    Page 1, “Abstract”
  2. Sentiment classification, the task of determining the sentiment orientation (positive, negative or neutral) of text, has been the most extensively studied task in sentiment analysis.
    Page 1, “Introduction”
  4. already a large amount of work on sentiment classification of text in various genres and in many languages.
    Page 1, “Introduction”
  5. Pang et al. (2002) focus on sentiment classification of movie reviews in English, and Zagibalov and Carroll (2008) study the problem of classifying product reviews in Chinese.
    Page 1, “Introduction”
  6. During the past few years, NTCIR1 organized several pilot tasks for sentiment classification of news articles written in English, Chinese and Japanese (Seki et al., 2007; Seki et al., 2008).
    Page 1, “Introduction”
  7. For English sentiment classification, there are several labeled corpora available (Hu and Liu, 2004; Pang et al., 2002; Wiebe et al., 2005).
    Page 1, “Introduction”
  8. Therefore, it is desirable to use the English labeled data to improve sentiment classification of documents in other languages.
    Page 1, “Introduction”
  9. Chinese), and then using the translated training data directly for the development of the sentiment classifier in the target language (Wan, 2009; Pan et al., 2011).
    Page 1, “Introduction”
  10. (2011) show that vocabulary coverage has a strong correlation with sentiment classification accuracy.
    Page 2, “Introduction”
  11. In this paper we propose a cross-lingual mixture model (CLMM) for cross-lingual sentiment classification.
    Page 2, “Introduction”


SVM

Appears in 19 sentences as: SVM (20)
In Cross-Lingual Mixture Model for Sentiment Classification
  1. The experiment results show that CLMM yields 71% in accuracy when no Chinese labeled data are used, which significantly improves Chinese sentiment classification and is superior to the SVM and co-training based methods.
    Page 2, “Introduction”
  2. When Chinese labeled data are employed, CLMM yields 83% in accuracy, which is remarkably better than the SVM and achieves state-of-the-art performance.
    Page 2, “Introduction”
  3. Pang et al. (2002) compare the performance of three commonly used machine learning models (Naive Bayes, Maximum Entropy and SVM).
    Page 2, “Related Work”
  4. Gamon (2004) shows that introducing deeper linguistic features into SVM can help to improve the performance.
    Page 2, “Related Work”
  5. English labeled data are first translated to Chinese, and then two SVM classifiers are trained on English and Chinese labeled data respectively.
    Page 3, “Related Work”
  6. After that, the co-training approach (Blum and Mitchell, 1998) is adopted to leverage Chinese unlabeled data and their English translation to improve the SVM classifier for Chinese sentiment classification.
    Page 3, “Related Work”
  7. MT-SVM: We translate the English labeled data to Chinese using Google Translate and use the translation results to train the SVM classifier for Chinese.
    Page 6, “Experiment”
  8. SVM: We train an SVM classifier on the Chinese labeled data.
    Page 6, “Experiment”
  9. First, two monolingual SVM classifiers are trained on English labeled data and Chinese data translated from English labeled data.
    Page 6, “Experiment”
  10. Third, the 100 most confidently predicted English and Chinese sentences are added to the training set and the two monolingual SVM classifiers are retrained on the expanded training set.
    Page 6, “Experiment”
  11. Method         NTCIR-EN→NTCIR-CH   MPQA-EN→NTCIR-CH
      MT-SVM         62.34               54.33
      SVM            N/A                 N/A
      MT-Cotrain     65.13               59.11
      Para-Cotrain   67.21               60.71
      Joint-Train    N/A                 N/A
      CLMM           70.96               71.52
    Page 6, “Experiment”
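The co-training procedure in excerpts 9 and 10 can be sketched as below. This is a simplified, illustrative sketch: `train` and `predict` are toy count-based stand-ins for the SVM classifiers, only the English classifier assigns labels here, and all sentences and (romanized) tokens are invented.

```python
from collections import Counter

def train(labeled):
    """Toy stand-in for an SVM: per-class word counts."""
    model = {"pos": Counter(), "neg": Counter()}
    for words, label in labeled:
        model[label].update(words)
    return model

def predict(model, words):
    """Return (label, confidence) from class-wise word overlap."""
    scores = {c: sum(model[c][w] for w in words) for c in model}
    label = max(scores, key=scores.get)
    total = sum(scores.values()) or 1
    return label, scores[label] / total

def cotrain(en_labeled, zh_labeled, pairs, rounds=2, top_k=1):
    """Label unlabeled parallel pairs, move the most confidently
    predicted ones into both training sets, and retrain."""
    for _ in range(rounds):
        clf_en = train(en_labeled)
        scored = sorted((predict(clf_en, en) + (en, zh) for en, zh in pairs),
                        key=lambda t: t[1], reverse=True)
        for label, _, en, zh in scored[:top_k]:
            en_labeled.append((en, label))
            zh_labeled.append((zh, label))  # parallel side gets the same label
        pairs = [(en, zh) for _, _, en, zh in scored[top_k:]]
    return train(en_labeled), train(zh_labeled)

en = [(["good"], "pos"), (["bad"], "neg")]
zh = [(["hao"], "pos"), (["huai"], "neg")]
pairs = [(["good", "movie"], ["hao", "dianying"])]
clf_en, clf_zh = cotrain(en, zh, pairs)
print(clf_zh["pos"]["dianying"])  # the Chinese side inherited the label
```

The key point the sketch illustrates is the transfer mechanism: a confident prediction on one side of a parallel pair labels both sides, so the target-language classifier learns words never seen in the translated training data.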


machine translation

Appears in 18 sentences as: machine translated (5) machine translation (12) machine translations (1)
In Cross-Lingual Mixture Model for Sentiment Classification
  1. Most existing work relies on machine translation engines to directly adapt labeled data from the source language to the target language.
    Page 1, “Abstract”
  2. This approach suffers from the limited coverage of vocabulary in the machine translation results.
    Page 1, “Abstract”
  3. One direct approach to leveraging the labeled data in English is to use machine translation engines as a black box to translate the labeled data from English to the target language (e.g.
    Page 1, “Introduction”
  4. Second, machine translation may change the sentiment polarity of the original text.
    Page 2, “Introduction”
  5. Instead of relying on the unreliable machine translated labeled data, CLMM leverages bilingual parallel data to bridge the language gap between the source language and the target language.
    Page 2, “Introduction”
  6. Most existing work relies on machine translation engines to directly adapt labeled data from the source language to the target language.
    Page 3, “Related Work”
  7. Their bilingual view is also constructed by using machine translation engines to translate original documents.
    Page 3, “Related Work”
  8. Prettenhofer and Stein (2011) use machine translation engines in a different way.
    Page 3, “Related Work”
  9. Instead of using machine translation engines to translate labeled text, the authors use them to construct a word translation oracle for pivot word translation.
    Page 3, “Related Work”
  10. the authors use an unlabeled parallel corpus instead of machine translation engines.
    Page 3, “Related Work”
  11. Instead of using the corresponding machine translation of Chinese unlabeled sentences, we use the parallel English sentences of the Chinese unlabeled sentences.
    Page 6, “Experiment”


cross-lingual

Appears in 16 sentences as: Cross-Lingual (2) Cross-lingual (2) cross-lingual (14)
In Cross-Lingual Mixture Model for Sentiment Classification
  1. Such a disproportion arouses interest in cross-lingual sentiment classification, which aims to conduct sentiment classification in the target language (e.g.
    Page 1, “Abstract”
  2. In this paper, we propose a generative cross-lingual mixture model (CLMM) to leverage unlabeled bilingual parallel data.
    Page 1, “Abstract”
  3. In this paper we propose a cross-lingual mixture model (CLMM) for cross-lingual sentiment classification.
    Page 2, “Introduction”
  4. Besides, CLMM can improve the accuracy of cross-lingual sentiment classification consistently regardless of whether labeled data in the target language are present or not.
    Page 2, “Introduction”
  5. This paper makes two contributions: (1) we propose a model to effectively leverage large bilingual parallel data for improving vocabulary coverage; and (2) the proposed model is applicable in both settings of cross-lingual sentiment classification, irrespective of the availability of labeled data in the target language.
    Page 2, “Introduction”
  6. We review related work in Section 2, and present the cross-lingual mixture model in Section 3.
    Page 2, “Introduction”
  7. In this section, we present a brief review of the related work on monolingual sentiment classification and cross-lingual sentiment classification.
    Page 2, “Related Work”
  8. 2.2 Cross-Lingual Sentiment Classification
    Page 2, “Related Work”
  9. Cross-lingual sentiment classification, which aims to conduct sentiment classification in the target language (e.g. Chinese) with labeled data in the source
    Page 2, “Related Work”
  10. In this section we present the cross-lingual mixture model (CLMM) for sentiment classification.
    Page 3, “Cross-Lingual Mixture Model for Sentiment Classification”
  11. We first formalize the task of cross-lingual sentiment classification.
    Page 3, “Cross-Lingual Mixture Model for Sentiment Classification”


parallel corpus

Appears in 15 sentences as: parallel corpus (19)
In Cross-Lingual Mixture Model for Sentiment Classification
  1. By “synchronizing” the generation of words in the source language and the target language in a parallel corpus, the proposed model can (1) improve vocabulary coverage by learning sentiment words from the unlabeled parallel corpus; (2) transfer polarity label information between the source language and target language using a parallel corpus.
    Page 2, “Introduction”
  2. the authors use an unlabeled parallel corpus instead of machine translation engines.
    Page 3, “Related Work”
  3. They propose a method of training two classifiers based on maximum entropy formulation to maximize their prediction agreement on the parallel corpus.
    Page 3, “Related Work”
  4. English), unlabeled parallel corpus U of the source language and the target language, and optional labeled data DT in target language T. Aligning with previous work (Wan, 2008; Wan, 2009), we only consider binary sentiment classification scheme (positive or negative) in this paper, but the proposed method can be used in other classification schemes with minor modifications.
    Page 3, “Cross-Lingual Mixture Model for Sentiment Classification”
  5. The basic idea underlying CLMM is to enlarge the vocabulary by learning sentiment words from the parallel corpus.
    Page 3, “Cross-Lingual Mixture Model for Sentiment Classification”
  6. More formally, CLMM defines a generative mixture model for generating a parallel corpus.
    Page 3, “Cross-Lingual Mixture Model for Sentiment Classification”
  7. The unobserved polarities of the unlabeled parallel corpus are modeled as hidden variables, and the observed words in the parallel corpus are modeled as generated by a set of word generation distributions conditioned on the hidden variables.
    Page 3, “Cross-Lingual Mixture Model for Sentiment Classification”
  8. Given a parallel corpus, we fit the CLMM model by maximizing the likelihood of generating this parallel corpus.
    Page 3, “Cross-Lingual Mixture Model for Sentiment Classification”
  9. By maximizing the likelihood, CLMM can estimate word generation probabilities for words unseen in the labeled data but present in the parallel corpus, hence expanding the vocabulary.
    Page 3, “Cross-Lingual Mixture Model for Sentiment Classification”
  10. Figure 1 illustrates the detailed process of generating words in the source language and target language respectively for the parallel corpus U, from the four mixture components in CLMM.
    Page 4, “Cross-Lingual Mixture Model for Sentiment Classification”
  11. Formally, we have the following log-likelihood function for a parallel corpus U.
    Page 4, “Cross-Lingual Mixture Model for Sentiment Classification”
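The generative story in excerpts 6-9 can be sketched numerically: the likelihood of a parallel corpus marginalizes out the hidden polarity of each sentence pair. Everything below (class priors, word distributions, the toy corpus and the romanized tokens) is invented for illustration, not taken from the paper, and the word projection probabilities are omitted for brevity.

```python
import math

# Invented toy parameters: class priors and class-conditional unigram
# distributions for source (English) and target (Chinese) words.
P_c = {"pos": 0.5, "neg": 0.5}
P_ws = {"pos": {"good": 0.6, "bad": 0.1, "film": 0.3},
        "neg": {"good": 0.1, "bad": 0.6, "film": 0.3}}
P_wt = {"pos": {"hao": 0.6, "huai": 0.1, "dianying": 0.3},
        "neg": {"hao": 0.1, "huai": 0.6, "dianying": 0.3}}

def log_likelihood(parallel_corpus):
    """Log-likelihood of a parallel corpus with the polarity of each
    sentence pair treated as a hidden variable and summed out."""
    ll = 0.0
    for src_words, tgt_words in parallel_corpus:
        pair_prob = 0.0
        for c in P_c:  # marginalize over the hidden polarity
            p = P_c[c]
            for w in src_words:
                p *= P_ws[c][w]  # source-side generation
            for w in tgt_words:
                p *= P_wt[c][w]  # "synchronized" target-side generation
            pair_prob += p
        ll += math.log(pair_prob)
    return ll

corpus = [(["good", "film"], ["hao", "dianying"]),
          (["bad", "film"], ["huai", "dianying"])]
ll = log_likelihood(corpus)
print(round(ll, 3))
```

Fitting the model means choosing the word distributions that maximize this quantity; because both sides of a pair share one hidden class, probability mass learned on the source side constrains the target side.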


parallel data

Appears in 12 sentences as: Parallel Data (1) parallel data (12)
In Cross-Lingual Mixture Model for Sentiment Classification
  1. In this paper, we propose a generative cross-lingual mixture model (CLMM) to leverage unlabeled bilingual parallel data.
    Page 1, “Abstract”
  2. By fitting parameters to maximize the likelihood of the bilingual parallel data, the proposed model learns previously unseen sentiment words from the large bilingual parallel data and improves vocabulary coverage significantly.
    Page 1, “Abstract”
  3. Instead of relying on the unreliable machine translated labeled data, CLMM leverages bilingual parallel data to bridge the language gap between the source language and the target language.
    Page 2, “Introduction”
  4. CLMM is a generative model that treats the source language and target language words in parallel data as generated simultaneously by a set of mixture components.
    Page 2, “Introduction”
  5. This paper makes two contributions: (1) we propose a model to effectively leverage large bilingual parallel data for improving vocabulary coverage; and (2) the proposed model is applicable in both settings of cross-lingual sentiment classification, irrespective of the availability of labeled data in the target language.
    Page 2, “Introduction”
  6. CLMM includes two hyper-parameters (λs and λt) controlling the contribution of unlabeled parallel data.
    Page 6, “Experiment”
  7. 4.5 The Influence of Unlabeled Parallel Data
    Page 7, “Experiment”
  8. We investigate how the size of the unlabeled parallel data affects the sentiment classification in this subsection.
    Page 7, “Experiment”
  9. We use only English labeled data in this experiment, since this more directly reflects the effectiveness of each model in utilizing unlabeled parallel data.
    Page 7, “Experiment”
  10. From Figure 3 and Figure 4, we can see that when more unlabeled parallel data are added, the accuracy of CLMM consistently improves.
    Page 7, “Experiment”
  11. This further demonstrates the advantages of expanding vocabulary using bilingual parallel data.
    Page 8, “Experiment”


unlabeled data

Appears in 10 sentences as: unlabeled data (10)
In Cross-Lingual Mixture Model for Sentiment Classification
  1. After that, co-training (Blum and Mitchell, 1998) approach is adopted to leverage Chinese unlabeled data and their English translation to improve the SVM classifier for Chinese sentiment classification.
    Page 3, “Related Work”
  2. where λds and λdt are weighting factors to control the influence of the unlabeled data.
    Page 5, “Cross-Lingual Mixture Model for Sentiment Classification”
  3. We set λds (λdt) to λs (λt) when ds belongs to unlabeled data, and to 1 otherwise.
    Page 5, “Cross-Lingual Mixture Model for Sentiment Classification”
  4. When ds belongs to unlabeled data, P(cj|ds) is computed according to Equation 5 or 6.
    Page 5, “Cross-Lingual Mixture Model for Sentiment Classification”
  5. Larger weights indicate larger influence from the unlabeled data .
    Page 6, “Experiment”
  6. Second, the two classifiers make predictions on Chinese unlabeled data and their English translation,
    Page 6, “Experiment”
  7. Figure 3: Accuracy with different sizes of unlabeled data for NTCIR-EN+NTCIR-CH
    Page 8, “Experiment”
  8. This indicates that our method leverages the unlabeled data effectively.
    Page 8, “Experiment”
  9. Figure 4: Accuracy with different sizes of unlabeled data for MPQA+NTCIR-CH
    Page 8, “Experiment”
  10. First, the proposed model can learn previously unseen sentiment words from large unlabeled data, which are not covered by the limited vocabulary in machine translation of the labeled data.
    Page 8, “Conclusion and Future Work”
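Excerpts 2-5 describe weighting factors that discount the unlabeled data's contribution to the parameter estimates. A toy arithmetic illustration of such a discounted expected count in an M-step (the λ value and the 0.8 class posterior below are invented, not values from the paper):

```python
lambda_t = 0.5  # hypothetical weight for unlabeled target-language data

def doc_weight(is_labeled):
    """Labeled documents count fully; unlabeled ones are discounted."""
    return 1.0 if is_labeled else lambda_t

# Expected count of a word under class "pos": one occurrence in a labeled
# positive document (posterior 1.0) plus two occurrences in an unlabeled
# document with an assumed class posterior of 0.8.
expected = doc_weight(True) * 1.0 * 1 + doc_weight(False) * 0.8 * 2
print(expected)  # 1.8
```

Setting the λ weight higher lets noisy unlabeled documents pull the word-class probabilities further; setting it to 0 recovers training on labeled data alone.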


parallel sentences

Appears in 8 sentences as: parallel sentences (8)
In Cross-Lingual Mixture Model for Sentiment Classification
  1. They assume parallel sentences in the corpus should have the same sentiment polarity.
    Page 3, “Related Work”
  2. determining polarity classes of the parallel sentences.
    Page 4, “Cross-Lingual Mixture Model for Sentiment Classification”
  3. Particularly, for each pair of parallel sentences ui ∈ U, we generate the words as follows.
    Page 4, “Cross-Lingual Mixture Model for Sentiment Classification”
  4. class label for unlabeled parallel sentences) is computed according to the following equations.
    Page 5, “Cross-Lingual Mixture Model for Sentiment Classification”
  5. The unlabeled parallel sentences
    Page 5, “Experiment”
  6. This model uses English labeled data and Chinese labeled data to obtain initial parameters for two maximum entropy classifiers (for English documents and Chinese documents), and then conducts EM iterations to update the parameters to gradually improve the agreement of the two monolingual classifiers on the unlabeled parallel sentences.
    Page 6, “Experiment”
  7. When we have 10,000 parallel sentences, the accuracy of CLMM on the two data sets quickly increases to 68.77% and 68.91%, respectively.
    Page 7, “Experiment”
  8. In the future, we will work on leveraging parallel sentences and word alignments for other tasks in sentiment analysis, such as building multilingual sentiment lexicons.
    Page 8, “Conclusion and Future Work”


word alignment

Appears in 7 sentences as: Word Alignment (1) word alignment (5) word alignments (1)
In Cross-Lingual Mixture Model for Sentiment Classification
  1. We estimate word projection probability using word alignment probability generated by the Berkeley aligner (Liang et al., 2006).
    Page 5, “Cross-Lingual Mixture Model for Sentiment Classification”
  2. The word alignment probabilities serve two purposes.
    Page 5, “Cross-Lingual Mixture Model for Sentiment Classification”
  3. Figure 2 gives an example of word alignment probability.
    Page 5, “Cross-Lingual Mixture Model for Sentiment Classification”
  4. CLMM uses the word alignment probability to decrease the influence from "杰作" (masterpiece) to "tour", "de" and "force" individually, using the word projection probability (i.e. word alignment probability), which is 0.3 in this case.
    Page 5, “Cross-Lingual Mixture Model for Sentiment Classification”
  6. Figure 2: Word Alignment Probability
    Page 5, “Cross-Lingual Mixture Model for Sentiment Classification”
  7. In the future, we will work on leveraging parallel sentences and word alignments for other tasks in sentiment analysis, such as building multilingual sentiment lexicons.
    Page 8, “Conclusion and Future Work”
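The projection probabilities described above come from a word aligner (the Berkeley aligner, per excerpt 1). As a rough stand-in, they can be approximated by normalizing alignment counts; the aligned pairs and the romanized target words below are invented for illustration.

```python
from collections import defaultdict

def projection_probabilities(aligned_pairs):
    """Estimate P(w_s | w_t) by normalizing alignment counts."""
    counts = defaultdict(lambda: defaultdict(int))
    for ws, wt in aligned_pairs:
        counts[wt][ws] += 1
    return {wt: {ws: n / sum(row.values()) for ws, n in row.items()}
            for wt, row in counts.items()}

# A target word aligned to a multi-word translation splits its mass,
# which is how CLMM damps the influence on each individual piece.
pairs = [("tour", "jiezuo"), ("de", "jiezuo"), ("force", "jiezuo"),
         ("good", "hao")]
proj = projection_probabilities(pairs)
print(round(proj["jiezuo"]["tour"], 2))  # → 0.33
```

This mirrors the "masterpiece" example: a single target word aligned to a three-word English phrase passes only a fraction of its weight to each English word.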


proposed model

Appears in 5 sentences as: proposed model (5)
In Cross-Lingual Mixture Model for Sentiment Classification
  1. By fitting parameters to maximize the likelihood of the bilingual parallel data, the proposed model learns previously unseen sentiment words from the large bilingual parallel data and improves vocabulary coverage significantly.
    Page 1, “Abstract”
  2. By “synchronizing” the generation of words in the source language and the target language in a parallel corpus, the proposed model can (1) improve vocabulary coverage by learning sentiment words from the unlabeled parallel corpus; (2) transfer polarity label information between the source language and target language using a parallel corpus.
    Page 2, “Introduction”
  3. This paper makes two contributions: (1) we propose a model to effectively leverage large bilingual parallel data for improving vocabulary coverage; and (2) the proposed model is applicable in both settings of cross-lingual sentiment classification, irrespective of the availability of labeled data in the target language.
    Page 2, “Introduction”
  4. Table 2 shows the accuracy of the baseline systems as well as the proposed model (CLMM).
    Page 6, “Experiment”
  5. First, the proposed model can learn previously unseen sentiment words from large unlabeled data, which are not covered by the limited vocabulary in machine translation of the labeled data.
    Page 8, “Conclusion and Future Work”


conditional probability

Appears in 4 sentences as: conditional probabilities (1) conditional probability (3)
In Cross-Lingual Mixture Model for Sentiment Classification
  1. The parameters to be estimated include conditional probabilities of word to class, P(ws|c) and P(wt|c), and word projection
    Page 4, “Cross-Lingual Mixture Model for Sentiment Classification”
  2. The obtained word-class conditional probability P(wt|c) can then be used to classify text in the target languages using Bayes Theorem and the Naive Bayes
    Page 4, “Cross-Lingual Mixture Model for Sentiment Classification”
  3. Instead of estimating word projection probability (P(ws|wt) and P(wt|ws)) and conditional probability of word to class (P(wt|c) and P(ws|c)) simultaneously in the training procedure, we estimate them separately since the word projection probability stays invariant when estimating other parameters.
    Page 5, “Cross-Lingual Mixture Model for Sentiment Classification”
  4. We use the Expectation-Maximization (EM) algorithm (Dempster et al., 1977) to estimate the conditional probabilities of words ws and wt given class c, P(ws|c) and P(wt|c) respectively.
    Page 5, “Cross-Lingual Mixture Model for Sentiment Classification”
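Once P(wt|c) has been estimated, classifying target-language text is plain Naive Bayes, as excerpt 2 notes. A minimal sketch with invented toy probabilities (the real values would come from fitting CLMM, and the romanized tokens are hypothetical):

```python
import math

# Invented toy parameters; real values would be estimated by EM.
P_c = {"pos": 0.5, "neg": 0.5}
P_wt = {"pos": {"hao": 0.6, "huai": 0.1, "dianying": 0.3},
        "neg": {"hao": 0.1, "huai": 0.6, "dianying": 0.3}}

def classify(words):
    """argmax_c log P(c) + sum_w log P(w|c): Bayes theorem under the
    Naive Bayes conditional-independence assumption."""
    scores = {c: math.log(P_c[c]) + sum(math.log(P_wt[c][w]) for w in words)
              for c in P_c}
    return max(scores, key=scores.get)

print(classify(["hao", "dianying"]))  # → pos
```

Working in log space avoids underflow on long documents; in practice one would also smooth P(w|c) so unseen words do not zero out a class.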


maximum entropy

Appears in 4 sentences as: Maximum Entropy (1) maximum entropy (3)
In Cross-Lingual Mixture Model for Sentiment Classification
  1. (2002) compare the performance of three commonly used machine learning models (Naive Bayes, Maximum Entropy and SVM).
    Page 2, “Related Work”
  2. They propose a method of training two classifiers based on maximum entropy formulation to maximize their prediction agreement on the parallel corpus.
    Page 3, “Related Work”
  3. Following the description in (Lu et al., 2011), we remove neutral sentences and keep only high-confidence positive and negative sentences as predicted by a maximum entropy classifier trained on the labeled data.
    Page 6, “Experiment”
  4. This model uses English labeled data and Chinese labeled data to obtain initial parameters for two maximum entropy classifiers (for English documents and Chinese documents), and then conducts EM iterations to update the parameters to gradually improve the agreement of the two monolingual classifiers on the unlabeled parallel sentences.
    Page 6, “Experiment”


cross validations

Appears in 3 sentences as: cross validations (3)
In Cross-Lingual Mixture Model for Sentiment Classification
  1. We set the hyper-parameters by conducting cross validations on the labeled data.
    Page 6, “Experiment”
  2. We conduct 5-fold cross validations on Chinese labeled data.
    Page 7, “Experiment”
  3. Another reason is that we use 5-fold cross validations in this setting, while the previous setting is an open test setting.
    Page 7, “Experiment”
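The 5-fold protocol in these excerpts can be sketched as an index partition; this is a generic sketch of k-fold cross validation, not the authors' exact split.

```python
def k_fold_splits(n_docs, k=5):
    """Index pairs for k-fold cross validation: each fold held out once,
    the remaining folds used for training."""
    folds = [list(range(i, n_docs, k)) for i in range(k)]
    return [(sorted(i for f in folds[:j] + folds[j + 1:] for i in f),
             folds[j])
            for j in range(k)]

splits = k_fold_splits(10)
print(len(splits), splits[0][1])  # 5 folds; first held-out fold
```

Each of the 5 rounds trains on 4/5 of the Chinese labeled data and tests on the held-out fifth; the reported accuracy is then averaged over folds.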


sentiment analysis

Appears in 3 sentences as: Sentiment Analysis (1) sentiment analysis (2)
In Cross-Lingual Mixture Model for Sentiment Classification
  1. Sentiment Analysis (also known as opinion mining), which aims to extract the sentiment information from text, has attracted extensive attention in recent years.
    Page 1, “Introduction”
  2. Sentiment classification, the task of determining the sentiment orientation (positive, negative or neutral) of text, has been the most extensively studied task in sentiment analysis.
    Page 1, “Introduction”
  3. In the future, we will work on leveraging parallel sentences and word alignments for other tasks in sentiment analysis, such as building multilingual sentiment lexicons.
    Page 8, “Conclusion and Future Work”
