Joint Bilingual Sentiment Classification with Unlabeled Parallel Corpora
Lu, Bin and Tan, Chenhao and Cardie, Claire and Tsou, Benjamin K.

Article Structure

Abstract

Most previous work on multilingual sentiment analysis has focused on methods to adapt sentiment resources from resource-rich languages to resource-poor languages.

Introduction

The field of sentiment analysis has quickly attracted the attention of researchers and practitioners alike (e.g.

Related Work

Multilingual Sentiment Analysis.

A Joint Model with Unlabeled Parallel Text

We propose a maximum entropy-based statistical model.

Experimental Setup 4.1 Data Sets and Preprocessing

The following labeled datasets are used in our experiments.

Results and Analysis

In our experiments, the methods are tested in the two data settings with the corresponding unlabeled parallel corpus, as mentioned in Section 4.

Conclusion

In this paper, we study bilingual sentiment classification and propose a joint model to simultaneously learn better monolingual sentiment classifiers for each language by exploiting an unlabeled parallel corpus together with the labeled data available for each language.

Topics

unlabeled data

Appears in 38 sentences as: Unlabeled Data (19) unlabeled data (23)
In Joint Bilingual Sentiment Classification with Unlabeled Parallel Corpora
  1. Experiments on multiple data sets show that the proposed approach (1) outperforms the monolingual baselines, significantly improving the accuracy for both languages by 3.44%-8.12%; (2) outperforms two standard approaches for leveraging unlabeled data; and (3) produces (albeit smaller) performance gains when employing pseudo-parallel data from machine translation engines.
    Page 1, “Abstract”
  2. maximum entropy and SVM classifiers) as well as two alternative methods for leveraging unlabeled data (transductive SVMs (Joachims, 1999b) and co-training (Blum and Mitchell, 1998)).
    Page 2, “Introduction”
  3. To our knowledge, this is the first multilingual sentiment analysis study to focus on methods for simultaneously improving sentiment classification for a pair of languages based on unlabeled data rather than resource adaptation from one language to another.
    Page 2, “Introduction”
  4. Another line of related work is semi-supervised learning, which combines labeled and unlabeled data to improve the performance of the task of interest (Zhu and Goldberg, 2009).
    Page 2, “Related Work”
  5. where y_i' is the unobserved class label for the i-th instance in the unlabeled data.
    Page 3, “A Joint Model with Unlabeled Parallel Text”
  6. By further considering the weight to ascribe to the unlabeled data vs. the labeled data (and the weight for the L2-norm regularization), we get the following regularized joint log likelihood to be maximized:
    Page 4, “A Joint Model with Unlabeled Parallel Text”
  7. where the first term on the right-hand side is the log likelihood of the labeled data from both D1 and D2; the second is the log likelihood of the unlabeled parallel data U, multiplied by λ1 ≥ 0, a constant that controls the contribution of the unlabeled data; and λ2 ≥ 0 is a regularization constant that penalizes model complexity or large feature weights.
    Page 4, “A Joint Model with Unlabeled Parallel Text”
  8. When λ1 is 0, the algorithm ignores the unlabeled data and degenerates to two MaxEnt models trained on only the labeled data.
    Page 4, “A Joint Model with Unlabeled Parallel Text”
  9. 3.4 Pseudo-Parallel Labeled and Unlabeled Data
    Page 5, “A Joint Model with Unlabeled Parallel Text”
  10. MaxEnt: This method learns a MaxEnt classifier for each language given the monolingual labeled data; the unlabeled data is not used.
    Page 6, “Experimental Setup 4.1 Data Sets and Preprocessing”
  11. SVM: This method learns an SVM classifier for each language given the monolingual labeled data; the unlabeled data is not used.
    Page 6, “Experimental Setup 4.1 Data Sets and Preprocessing”
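
The quotes above (items 5-8) describe the training objective only in prose. The following LaTeX is a hedged reconstruction of the regularized joint log likelihood under the perfect-translation assumption, i.e. one shared latent label y' per unlabeled pair; the exact notation (D_v, U, the squared-norm form of the regularizer) is assumed here and may differ in detail from the paper's own equations.

    \ell(\theta_1, \theta_2)
      = \sum_{v \in \{1,2\}} \sum_{i \in D_v} \log p(y_{vi} \mid x_{vi}; \theta_v)
      + \lambda_1 \sum_{i \in U} \log \sum_{y'} p(y' \mid x'_{1i}; \theta_1)\, p(y' \mid x'_{2i}; \theta_2)
      - \lambda_2 \left( \lVert \theta_1 \rVert^2 + \lVert \theta_2 \rVert^2 \right)

Setting λ1 = 0 drops the middle term, which is exactly the degenerate case of item 8: two independent MaxEnt models trained on the labeled data alone.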

labeled data

Appears in 37 sentences as: Labeled Data (6) Labeled data (1) labeled data (34)
In Joint Bilingual Sentiment Classification with Unlabeled Parallel Corpora
  1. We present a novel approach for joint bilingual sentiment classification at the sentence level that augments available labeled data in each language with unlabeled parallel data.
    Page 1, “Abstract”
  2. Given the labeled data in each language, we propose an approach that exploits an unlabeled parallel corpus with the following
    Page 1, “Introduction”
  3. The proposed maximum entropy-based EM approach jointly learns two monolingual sentiment classifiers by treating the sentiment labels in the unlabeled parallel text as unobserved latent variables, and maximizes the regularized joint likelihood of the language-specific labeled data together with the inferred sentiment labels of the parallel text.
    Page 2, “Introduction”
  4. where v ∈ {1,2} denotes L1 or L2; the first term on the right-hand side is the likelihood of labeled data for both D1 and D2; and the second term is the likelihood of the unlabeled parallel data U.
    Page 3, “A Joint Model with Unlabeled Parallel Text”
  5. By further considering the weight to ascribe to the unlabeled data vs. the labeled data (and the weight for the L2-norm regularization), we get the following regularized joint log likelihood to be maximized:
    Page 4, “A Joint Model with Unlabeled Parallel Text”
  6. where the first term on the right-hand side is the log likelihood of the labeled data from both D1 and D2; the second is the log likelihood of the unlabeled parallel data U, multiplied by λ1 ≥ 0, a constant that controls the contribution of the unlabeled data; and λ2 ≥ 0 is a regularization constant that penalizes model complexity or large feature weights.
    Page 4, “A Joint Model with Unlabeled Parallel Text”
  7. When λ1 is 0, the algorithm ignores the unlabeled data and degenerates to two MaxEnt models trained on only the labeled data.
    Page 4, “A Joint Model with Unlabeled Parallel Text”
  8. just the labeled data.
    Page 4, “A Joint Model with Unlabeled Parallel Text”
  9. Next, in the M-step, the parameters, θ1 and θ2, are updated using both the original labeled data (D1 and D2) and the newly labeled data U.
    Page 4, “A Joint Model with Unlabeled Parallel Text”
  10. Input: Labeled data D1 and D2; unlabeled parallel data U
    Page 4, “A Joint Model with Unlabeled Parallel Text”
  11. Train two initial monolingual models: train and initialize θ1(0) and θ2(0) on the labeled data.
    Page 4, “A Joint Model with Unlabeled Parallel Text”
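
Items 9-11 quote the EM procedure only in fragments (the M-step, the inputs, and the initialization). Below is a hedged Python sketch of that loop, with scikit-learn's LogisticRegression standing in for the paper's MaxEnt learner (its built-in L2 penalty plays the role of the λ2 term); the soft-label weighting scheme, the fixed iteration count, and the function signature are assumptions for illustration, not the authors' implementation.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def train_joint(X1, y1, X2, y2, U1, U2, lam1=1.0, n_iter=10):
        """X1, y1 / X2, y2: labeled data per language (dense feature arrays,
        labels in {0, 1}); U1, U2: features of the two sides of the unlabeled
        parallel pairs, with row i of U1 aligned to row i of U2."""
        clf1 = LogisticRegression(max_iter=1000).fit(X1, y1)  # theta_1^(0)
        clf2 = LogisticRegression(max_iter=1000).fit(X2, y2)  # theta_2^(0)
        for _ in range(n_iter):
            # E-step: posterior that each parallel pair is positive, assuming
            # both sides share a single latent label (perfect-translation case).
            p1 = clf1.predict_proba(U1)[:, 1]
            p2 = clf2.predict_proba(U2)[:, 1]
            both_pos = p1 * p2
            both_neg = (1.0 - p1) * (1.0 - p2)
            q = both_pos / np.maximum(both_pos + both_neg, 1e-12)
            # M-step: refit each monolingual model on its labeled data plus the
            # unlabeled sentences, each counted with weight q as positive and
            # 1 - q as negative; the unlabeled part is scaled by lam1.
            for clf, X, y, U in ((clf1, X1, y1, U1), (clf2, X2, y2, U2)):
                Xa = np.vstack([X, U, U])
                ya = np.concatenate([y, np.ones(len(U)), np.zeros(len(U))])
                wa = np.concatenate([np.ones(len(y)), lam1 * q, lam1 * (1.0 - q)])
                clf.fit(Xa, ya, sample_weight=wa)
            # A fuller implementation would monitor the regularized joint log
            # likelihood for convergence rather than use a fixed iteration count.
        return clf1, clf2

The soft (probabilistic) assignments in the E-step are what the quotes under "joint model" later credit for the advantage over the hard assignments made by Co-SVM and TSVMs.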

parallel data

Appears in 25 sentences as: (1) Parallel Data (3) parallel data (22)
In Joint Bilingual Sentiment Classification with Unlabeled Parallel Corpora
  1. We present a novel approach for joint bilingual sentiment classification at the sentence level that augments available labeled data in each language with unlabeled parallel data.
    Page 1, “Abstract”
  2. We furthermore find that improvements, albeit smaller, are obtained when the parallel data is replaced with a pseudo-parallel (i.e.
    Page 2, “Introduction”
  3. sentiment) bilingual (in L1 and L2) parallel data U that are defined as follows.
    Page 3, “A Joint Model with Unlabeled Parallel Text”
  4. where v ∈ {1,2} denotes L1 or L2; the first term on the right-hand side is the likelihood of labeled data for both D1 and D2; and the second term is the likelihood of the unlabeled parallel data U.
    Page 3, “A Joint Model with Unlabeled Parallel Text”
  5. However, there could be considerable noise in real-world parallel data, i.e.
    Page 3, “A Joint Model with Unlabeled Parallel Text”
  6. Therefore, by considering the noise in parallel data, we get:
    Page 4, “A Joint Model with Unlabeled Parallel Text”
  7. where the first term on the right-hand side is the log likelihood of the labeled data from both D1 and D2; the second is the log likelihood of the unlabeled parallel data U, multiplied by λ1 ≥ 0, a constant that controls the contribution of the unlabeled data; and λ2 ≥ 0 is a regularization constant that penalizes model complexity or large feature weights.
    Page 4, “A Joint Model with Unlabeled Parallel Text”
  8. Input: Labeled data D1 and D2; unlabeled parallel data U
    Page 4, “A Joint Model with Unlabeled Parallel Text”
  9. We also try to remove neutral sentences from the parallel data since they can introduce noise into our model, which deals only with positive and negative examples.
    Page 5, “Experimental Setup 4.1 Data Sets and Preprocessing”
  10. Co-Training with SVMs (Co-SVM): This method applies SVM-based co-training given both the labeled training data and the unlabeled parallel data following Wan (2009).
    Page 6, “Experimental Setup 4.1 Data Sets and Preprocessing”
  11. By making use of the unlabeled parallel data, our proposed approach improves the accuracy, compared to MaxEnt, by 8.12% (or 33.27% error reduction) on English and 3.44% (or 16.92% error reduction) on Chinese in the first setting, and by 5.07% (or 19.67% error reduction) on English and 3.87% (or 19.4% error reduction) on Chinese in the second setting.
    Page 6, “Results and Analysis”

sentiment classification

Appears in 16 sentences as: Sentiment Classification (1) sentiment classification (11) sentiment classifiers (5)
In Joint Bilingual Sentiment Classification with Unlabeled Parallel Corpora
  1. We present a novel approach for joint bilingual sentiment classification at the sentence level that augments available labeled data in each language with unlabeled parallel data.
    Page 1, “Abstract”
  2. We rely on the intuition that the sentiment labels for parallel sentences should be similar and present a model that jointly learns improved monolingual sentiment classifiers for each language.
    Page 1, “Abstract”
  3. Not surprisingly, most methods for sentiment classification are supervised learning techniques, which require training data annotated with the appropriate sentiment labels (e.g. document-level or sentence-level positive vs. negative polarity).
    Page 1, “Introduction”
  4. In addition, there is still much room for improvement in existing monolingual (including English) sentiment classifiers, especially at the sentence level (Pang and Lee, 2008).
    Page 1, “Introduction”
  5. In contrast to previous work, we (1) assume that some amount of sentiment-labeled data is available for the language pair under study, and (2) investigate methods to simultaneously improve sentiment classification for both languages.
    Page 1, “Introduction”
  6. The proposed maximum entropy-based EM approach jointly learns two monolingual sentiment classifiers by treating the sentiment labels in the unlabeled parallel text as unobserved latent variables, and maximizes the regularized joint likelihood of the language-specific labeled data together with the inferred sentiment labels of the parallel text.
    Page 2, “Introduction”
  7. To our knowledge, this is the first multilingual sentiment analysis study to focus on methods for simultaneously improving sentiment classification for a pair of languages based on unlabeled data rather than resource adaptation from one language to another.
    Page 2, “Introduction”
  8. Prettenhofer and Stein (2010) investigate cross-lingual sentiment classification from the perspective of domain adaptation based on structural correspondence learning (Blitzer et al., 2006).
    Page 2, “Related Work”
  9. Approaches that do not explicitly involve resource adaptation include Wan (2009), which uses co-training (Blum and Mitchell, 1998) with English vs. Chinese features comprising the two independent “views” to exploit unlabeled Chinese data and a labeled English corpus and thereby improves Chinese sentiment classification.
    Page 2, “Related Work”
  10. Unlike the methods described above, we focus on simultaneously improving the performance of sentiment classification in a pair of languages by developing a model that relies on sentiment-labeled data in each language as well as unlabeled parallel text for the language pair.
    Page 2, “Related Work”
  11. Given the input data D1, D2 and U, our task is to jointly learn two monolingual sentiment classifiers — one for L1 and one for L2.
    Page 3, “A Joint Model with Unlabeled Parallel Text”

MaxEnt

Appears in 11 sentences as: MaxEnt (12)
In Joint Bilingual Sentiment Classification with Unlabeled Parallel Corpora
  1. Maximum entropy (MaxEnt) models have been widely used in many NLP tasks (Berger et al., 1996; Ratnaparkhi, 1997; Smith, 2006).
    Page 3, “A Joint Model with Unlabeled Parallel Text”
  2. With MaxEnt, we learn from the input data:
    Page 3, “A Joint Model with Unlabeled Parallel Text”
  3. When λ1 is 0, the algorithm ignores the unlabeled data and degenerates to two MaxEnt models trained on only the labeled data.
    Page 4, “A Joint Model with Unlabeled Parallel Text”
  4. 3.3 The EM Algorithm on MaxEnt
    Page 4, “A Joint Model with Unlabeled Parallel Text”
  5. First, the MaxEnt parameters, θ1 and θ2, are estimated from
    Page 4, “A Joint Model with Unlabeled Parallel Text”
  6. Two monolingual MaxEnt classifiers with
    Page 4, “A Joint Model with Unlabeled Parallel Text”
  7. MaxEnt: This method learns a MaxEnt classifier for each language given the monolingual labeled data; the unlabeled data is not used.
    Page 6, “Experimental Setup 4.1 Data Sets and Preprocessing”
  8. By making use of the unlabeled parallel data, our proposed approach improves the accuracy, compared to MaxEnt, by 8.12% (or 33.27% error reduction) on English and 3.44% (or 16.92% error reduction) on Chinese in the first setting, and by 5.07% (or 19.67% error reduction) on English and 3.87% (or 19.4% error reduction) on Chinese in the second setting.
    Page 6, “Results and Analysis”
  9. Significance is tested using paired t-tests with p<0.05: denotes statistical significance compared to the corresponding performance of MaxEnt; * denotes statistical significance compared to SVM; and r denotes statistical significance compared to Co-SVM.
    Page 6, “Results and Analysis”
  10. When λ1 is set to 0, the joint model degenerates to two MaxEnt models trained with only the labeled data.
    Page 7, “Results and Analysis”
  11. To further understand what contributions our proposed approach makes to the performance gain, we look inside the parameters in the MaxEnt models learned before and after adding the parallel unlabeled data.
    Page 8, “Results and Analysis”
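
For reference, the conditional model form that items 1-2 (and the "feature weights" quotes at the end of this page) presuppose is the textbook MaxEnt definition below, with θ the weight vector and f a feature function over (sentence, label) pairs; the paper's own equation numbering is not reproduced here.

    p(y \mid x; \theta) = \frac{\exp\big(\theta \cdot f(x, y)\big)}
                               {\sum_{y''} \exp\big(\theta \cdot f(x, y'')\big)}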

sentence pairs

Appears in 11 sentences as: sentence pair (5) sentence pairs (7)
In Joint Bilingual Sentiment Classification with Unlabeled Parallel Corpora
  1. the sentence pairs may be noisily parallel (or even comparable) instead of fully parallel (Munteanu and Marcu, 2005).
    Page 3, “A Joint Model with Unlabeled Parallel Text”
  2. In such noisy cases, the labels (positive or negative) could be different for the two monolingual sentences in a sentence pair.
    Page 3, “A Joint Model with Unlabeled Parallel Text”
  3. Although we do not know the exact probability that a sentence pair exhibits the same label, we can approximate it using their translation probability
    Page 3, “A Joint Model with Unlabeled Parallel Text”
  4. where p(a_i) is the translation probability of the i-th sentence pair in U; ȳ' is the opposite of y'; the first term models the probability that x'_1i and x'_2i have the same label; and the second term models the probability that they have different labels.
    Page 4, “A Joint Model with Unlabeled Parallel Text”
  5. Because sentence pairs in the ISI corpus are quite noisy, we rely on Giza++ (Och and Ney, 2003) to obtain a new translation probability for each sentence pair, and select the 100,000 pairs with the highest translation probabilities.
    Page 5, “Experimental Setup 4.1 Data Sets and Preprocessing”
  6. We then classify each unlabeled sentence pair by combining the two sentences in each pair into one.
    Page 5, “Experimental Setup 4.1 Data Sets and Preprocessing”
  7. We removed sentence pairs with an original confidence score (given in the corpus) smaller than 0.98, and also removed the pairs that are too long (more than 60 characters in one sentence) to facilitate Giza++.
    Page 5, “Experimental Setup 4.1 Data Sets and Preprocessing”
  8. In each iteration, we select the most confidently predicted 50 positive and 50 negative sentences from each of the two classifiers, and take the union of the resulting 200 sentence pairs as the newly labeled training data.
    Page 6, “Experimental Setup 4.1 Data Sets and Preprocessing”
  9. Preliminary experiments showed that Equation 5 does not significantly improve the performance in our case, which is reasonable since we choose only sentence pairs with the highest translation probabilities to be our unlabeled data (see Section 4.1).
    Page 6, “Results and Analysis”
  10. However, even with only 2,000 unlabeled sentence pairs, the proposed approach still produces large performance gains.
    Page 7, “Results and Analysis”
  11. Examination of those sentence pairs in setting 2 for which the two monolingual models still
    Page 9, “Results and Analysis”
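
Items 5 and 7 spell out the corpus-selection protocol (drop pairs below the 0.98 corpus confidence threshold or with a sentence longer than 60 characters, then keep the 100,000 pairs with the highest translation probabilities). A minimal Python sketch of that filtering follows; the per-pair record layout and field names are assumptions, and the translation probabilities are taken as precomputed externally (e.g. by Giza++), which is not invoked here.

    def select_pairs(pairs, top_k=100_000, min_conf=0.98, max_len=60):
        """pairs: iterable of dicts with 'en', 'zh', 'conf' (corpus confidence)
        and 'trans_prob' (externally computed translation probability) keys."""
        kept = [p for p in pairs
                if p["conf"] >= min_conf
                and max(len(p["en"]), len(p["zh"])) <= max_len]
        kept.sort(key=lambda p: p["trans_prob"], reverse=True)
        return kept[:top_k]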

joint model

Appears in 10 sentences as: Joint Model (1) joint model (8) jointly models (1)
In Joint Bilingual Sentiment Classification with Unlabeled Parallel Corpora
  1. In Section 3, the proposed joint model is described.
    Page 2, “Introduction”
  2. Another notable approach is the work of Boyd-Graber and Resnik (2010), which presents a generative model --- supervised multilingual latent Dirichlet allocation --- that jointly models topics that are consistent across languages, and employs them to better predict sentiment ratings.
    Page 2, “Related Work”
  3. 3.2 The Joint Model
    Page 3, “A Joint Model with Unlabeled Parallel Text”
  4. Since previous work (Banea et al., 2008; 2010; Wan, 2009) has shown that it could be useful to automatically translate the labeled data from the source language into the target language, we can further incorporate such translated labeled data into the joint model by adding the following component into Equation 6:
    Page 5, “A Joint Model with Unlabeled Parallel Text”
  5. In our experiments, the proposed joint model is compared with the following baseline methods.
    Page 6, “Experimental Setup 4.1 Data Sets and Preprocessing”
  6. We first compare the proposed joint model (Joint) with the baselines in Table 2.
    Page 6, “Results and Analysis”
  7. Overall, the unlabeled parallel data improves classification accuracy for both languages when using our proposed joint model and Co-SVM.
    Page 6, “Results and Analysis”
  8. The joint model makes better use of the unlabeled parallel data than Co-SVM or TSVMs presumably because of its attempt to jointly optimize the two monolingual models via soft (probabilistic) assignments of the unlabeled instances to classes in each iteration, instead of the hard assignments in Co-SVM and TSVMs.
    Page 6, “Results and Analysis”
  9. When λ1 is set to 0, the joint model degenerates to two MaxEnt models trained with only the labeled data.
    Page 7, “Results and Analysis”
  10. In this paper, we study bilingual sentiment classification and propose a joint model to simultaneously learn better monolingual sentiment classifiers for each language by exploiting an unlabeled parallel corpus together with the labeled data available for each language.
    Page 9, “Conclusion”

sentiment analysis

Appears in 9 sentences as: Sentiment Analysis (1) sentiment analysis (8)
In Joint Bilingual Sentiment Classification with Unlabeled Parallel Corpora
  1. Most previous work on multilingual sentiment analysis has focused on methods to adapt sentiment resources from resource-rich languages to resource-poor languages.
    Page 1, “Abstract”
  2. The field of sentiment analysis has quickly attracted the attention of researchers and practitioners alike (e.g.
    Page 1, “Introduction”
  3. Indeed, sentiment analysis systems, which mine opinions from textual sources (e.g.
    Page 1, “Introduction”
  4. Previous work in multilingual sentiment analysis has therefore focused on methods to adapt sentiment resources (e.g.
    Page 1, “Introduction”
  5. This paper tackles the task of bilingual sentiment analysis.
    Page 1, “Introduction”
  6. To our knowledge, this is the first multilingual sentiment analysis study to focus on methods for simultaneously improving sentiment classification for a pair of languages based on unlabeled data rather than resource adaptation from one language to another.
    Page 2, “Introduction”
  7. Multilingual Sentiment Analysis.
    Page 2, “Related Work”
  8. There is a growing body of work on multilingual sentiment analysis.
    Page 2, “Related Work”
  9. Another issue is to investigate how to improve multilingual sentiment analysis by exploiting comparable corpora.
    Page 9, “Conclusion”

parallel corpus

Appears in 7 sentences as: parallel corpus (7)
In Joint Bilingual Sentiment Classification with Unlabeled Parallel Corpora
  1. Given the labeled data in each language, we propose an approach that exploits an unlabeled parallel corpus with the following
    Page 1, “Introduction”
  2. (2007), for example, generate subjectivity analysis resources in a new language from English sentiment resources by leveraging a bilingual dictionary or a parallel corpus.
    Page 2, “Related Work”
  3. We also consider the case where a parallel corpus is not available: to obtain a pseudo-parallel corpus U (i.e.
    Page 5, “A Joint Model with Unlabeled Parallel Text”
  4. For the unlabeled parallel text, we use the ISI Chinese-English parallel corpus (Munteanu and Marcu, 2005), which was extracted automatically from news articles published by Xinhua News Agency in the Chinese Gigaword (2nd Edition) and English Gigaword (2nd Edition) collections.
    Page 5, “Experimental Setup 4.1 Data Sets and Preprocessing”
  5. We choose the most confidently predicted 10,000 positive and 10,000 negative pairs to constitute the unlabeled parallel corpus U for each data setting.
    Page 5, “Experimental Setup 4.1 Data Sets and Preprocessing”
  6. In our experiments, the methods are tested in the two data settings with the corresponding unlabeled parallel corpus, as mentioned in Section 4.
    Page 6, “Results and Analysis”
  7. In this paper, we study bilingual sentiment classification and propose a joint model to simultaneously learn better monolingual sentiment classifiers for each language by exploiting an unlabeled parallel corpus together with the labeled data available for each language.
    Page 9, “Conclusion”

SVM

Appears in 6 sentences as: SVM (7)
In Joint Bilingual Sentiment Classification with Unlabeled Parallel Corpora
  1. maximum entropy and SVM classifiers) as well as two alternative methods for leveraging unlabeled data (transductive SVMs (Joachims, 1999b) and co-training (Blum and Mitchell, 1998)).
    Page 2, “Introduction”
  2. SVM: This method learns an SVM classifier for each language given the monolingual labeled data; the unlabeled data is not used.
    Page 6, “Experimental Setup 4.1 Data Sets and Preprocessing”
  3. Monolingual TSVM (TSVM-M): This method learns two transductive SVM (TSVM) classifiers given the monolingual labeled data and the monolingual unlabeled data for each language.
    Page 6, “Experimental Setup 4.1 Data Sets and Preprocessing”
  4. First, two monolingual SVM classifiers are built based on only the corresponding labeled data, and then they are bootstrapped by adding the most confident predicted examples from the unlabeled data into the training set.
    Page 6, “Experimental Setup 4.1 Data Sets and Preprocessing”
  5. Among the baselines, the best is Co-SVM; TSVMs do not always improve performance using the unlabeled data compared to the standalone SVM; and TSVM-B outperforms TSVM-M except for Chinese in the second setting.
    Page 6, “Results and Analysis”
  6. Significance is tested using paired t-tests with p<0.05: denotes statistical significance compared to the corresponding performance of MaxEnt; * denotes statistical significance compared to SVM; and r denotes statistical significance compared to Co-SVM.
    Page 6, “Results and Analysis”
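
Item 4 above, together with item 8 under "sentence pairs", describes the Co-SVM baseline procedurally. The sketch below is one hedged reading of that procedure, with scikit-learn's LinearSVC standing in for the SVM learner; labeling each selected pair by the two classifiers' summed decision values is a simplification (the quoted description lets each classifier label its own selections), while the 50+50 per-classifier quota follows the quotes.

    import numpy as np
    from sklearn.svm import LinearSVC

    def co_train(X1, y1, X2, y2, U1, U2, n_iter=10, k=50):
        """Bootstrap two monolingual SVMs from confident predictions on the
        unlabeled parallel pairs (dense numpy arrays, labels in {0, 1})."""
        X1, y1, X2, y2 = (np.asarray(a) for a in (X1, y1, X2, y2))
        pool = np.arange(len(U1))                # indices of not-yet-used pairs
        clf1 = LinearSVC().fit(X1, y1)
        clf2 = LinearSVC().fit(X2, y2)
        for _ in range(n_iter):
            if len(pool) == 0:
                break
            picked = set()
            for clf, U in ((clf1, U1), (clf2, U2)):
                scores = clf.decision_function(U[pool])
                order = np.argsort(scores)
                picked.update(pool[order[-k:]])  # most confident positives
                picked.update(pool[order[:k]])   # most confident negatives
            picked = np.array(sorted(picked))
            # Label each selected pair, then add its L1 side to the L1 training
            # set and its L2 side to the L2 training set with that label.
            labels = ((clf1.decision_function(U1[picked]) +
                       clf2.decision_function(U2[picked])) > 0).astype(int)
            X1 = np.vstack([X1, U1[picked]]); y1 = np.concatenate([y1, labels])
            X2 = np.vstack([X2, U2[picked]]); y2 = np.concatenate([y2, labels])
            pool = np.setdiff1d(pool, picked)
            clf1 = LinearSVC().fit(X1, y1)
            clf2 = LinearSVC().fit(X2, y2)
        return clf1, clf2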

machine translation

Appears in 5 sentences as: machine translation (5)
In Joint Bilingual Sentiment Classification with Unlabeled Parallel Corpora
  1. Experiments on multiple data sets show that the proposed approach (1) outperforms the monolingual baselines, significantly improving the accuracy for both languages by 3.44%-8.12%; (2) outperforms two standard approaches for leveraging unlabeled data; and (3) produces (albeit smaller) performance gains when employing pseudo-parallel data from machine translation engines.
    Page 1, “Abstract”
  2. (2008; 2010) instead automatically translate the English resources using automatic machine translation engines for subjectivity classification.
    Page 2, “Related Work”
  3. sentences in one language with their corresponding automatic translations), we use an automatic machine translation system (e.g.
    Page 5, “A Joint Model with Unlabeled Parallel Text”
  4. As discussed in Section 3.4, we generate pseudo-parallel data by translating the monolingual sentences in each setting using Google’s machine translation system.
    Page 7, “Results and Analysis”
  5. Moreover, the proposed approach continues to produce (albeit smaller) performance gains when employing pseudo-parallel data from machine translation engines.
    Page 9, “Conclusion”

translation probability

Appears in 5 sentences as: translation probabilities (2) translation probability (3)
In Joint Bilingual Sentiment Classification with Unlabeled Parallel Corpora
  1. The intuition here is that if the translation probability of two sentences is high, the probability that they have the same sentiment label should be high as well.
    Page 4, “A Joint Model with Unlabeled Parallel Text”
  2. where p(a_i) is the translation probability of the i-th sentence pair in U; ȳ' is the opposite of y'; the first term models the probability that x'_1i and x'_2i have the same label; and the second term models the probability that they have different labels.
    Page 4, “A Joint Model with Unlabeled Parallel Text”
  3. Because sentence pairs in the ISI corpus are quite noisy, we rely on Giza++ (Och and Ney, 2003) to obtain a new translation probability for each sentence pair, and select the 100,000 pairs with the highest translation probabilities.
    Page 5, “Experimental Setup 4.1 Data Sets and Preprocessing”
  4. We first obtain translation probabilities for both directions (i.e.
    Page 5, “Experimental Setup 4.1 Data Sets and Preprocessing”
  5. Preliminary experiments showed that Equation 5 does not significantly improve the performance in our case, which is reasonable since we choose only sentence pairs with the highest translation probabilities to be our unlabeled data (see Section 4.1).
    Page 6, “Results and Analysis”
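
A hedged LaTeX rendering of the mixture described in items 1-2, writing p(a_i) for the translation probability of the i-th pair and \bar{y}' for the label opposite to y' (the paper's Equation 5 may differ in detail), is:

    \sum_{y'} \Big[\, p(a_i)\, p(y' \mid x'_{1i}; \theta_1)\, p(y' \mid x'_{2i}; \theta_2)
      + \big(1 - p(a_i)\big)\, p(y' \mid x'_{1i}; \theta_1)\, p(\bar{y}' \mid x'_{2i}; \theta_2) \Big]

When p(a_i) = 1 this collapses to the perfect-translation case, which is consistent with item 5's observation that the weighting adds little once only the highest-probability pairs are kept as unlabeled data.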

parallel sentences

Appears in 4 sentences as: parallel sentences (4)
In Joint Bilingual Sentiment Classification with Unlabeled Parallel Corpora
  1. We rely on the intuition that the sentiment labels for parallel sentences should be similar and present a model that jointly learns improved monolingual sentiment classifiers for each language.
    Page 1, “Abstract”
  2. each x_i is a sentence, and x'_1i and x'_2i are parallel sentences.
    Page 3, “A Joint Model with Unlabeled Parallel Text”
  3. Given the problem definition above, we now present a novel model to exploit the correspondence of parallel sentences in unlabeled bilingual text.
    Page 3, “A Joint Model with Unlabeled Parallel Text”
  4. If we assume that parallel sentences are perfect translations, the two sentences in each pair should have the same polarity label, which gives us:
    Page 3, “A Joint Model with Unlabeled Parallel Text”

significantly improve

Appears in 4 sentences as: significantly improve (2) significantly improved (1) significantly improving (1)
In Joint Bilingual Sentiment Classification with Unlabeled Parallel Corpora
  1. Experiments on multiple data sets show that the proposed approach (1) outperforms the monolingual baselines, significantly improving the accuracy for both languages by 3.44%-8.12%; (2) outperforms two standard approaches for leveraging unlabeled data; and (3) produces (albeit smaller) performance gains when employing pseudo-parallel data from machine translation engines.
    Page 1, “Abstract”
  2. Accuracy is significantly improved for both languages, by 3.44%-8.12%.
    Page 2, “Introduction”
  3. Preliminary experiments showed that Equation 5 does not significantly improve the performance in our case, which is reasonable since we choose only sentence pairs with the highest translation probabilities to be our unlabeled data (see Section 4.1).
    Page 6, “Results and Analysis”
  4. Our experiments show that the proposed approach can significantly improve sentiment classification for both languages.
    Page 9, “Conclusion”

models trained

Appears in 3 sentences as: models Train (1) models trained (2)
In Joint Bilingual Sentiment Classification with Unlabeled Parallel Corpora
  1. When λ1 is 0, the algorithm ignores the unlabeled data and degenerates to two MaxEnt models trained on only the labeled data.
    Page 4, “A Joint Model with Unlabeled Parallel Text”
  2. Train two initial monolingual models: train and initialize θ1(0) and θ2(0) on the labeled data.
    Page 4, “A Joint Model with Unlabeled Parallel Text”
  3. When λ1 is set to 0, the joint model degenerates to two MaxEnt models trained with only the labeled data.
    Page 7, “Results and Analysis”

semi-supervised

Appears in 3 sentences as: Semi-supervised (1) semi-supervised (2)
In Joint Bilingual Sentiment Classification with Unlabeled Parallel Corpora
  1. Semi-supervised Learning.
    Page 2, “Related Work”
  2. Another line of related work is semi-supervised learning, which combines labeled and unlabeled data to improve the performance of the task of interest (Zhu and Goldberg, 2009).
    Page 2, “Related Work”
  3. Among the popular semi-supervised methods (e.g. EM on Naïve Bayes (Nigam et al., 2000), co-training (Blum and Mitchell, 1998), transductive SVMs (Joachims, 1999b), and co-regularization (Sindhwani et al., 2005; Amini et al., 2010)), our approach employs the EM algorithm, extending it to the bilingual case based on maximum entropy.
    Page 2, “Related Work”

maximum entropy

Appears in 3 sentences as: Maximum entropy (1) maximum entropy (2)
In Joint Bilingual Sentiment Classification with Unlabeled Parallel Corpora
  1. maximum entropy and SVM classifiers) as well as two alternative methods for leveraging unlabeled data (transductive SVMs (Joachims, 1999b) and co-training (Blum and Mitchell, 1998)).
    Page 2, “Introduction”
  2. Among the popular semi-supervised methods (e.g. EM on Naïve Bayes (Nigam et al., 2000), co-training (Blum and Mitchell, 1998), transductive SVMs (Joachims, 1999b), and co-regularization (Sindhwani et al., 2005; Amini et al., 2010)), our approach employs the EM algorithm, extending it to the bilingual case based on maximum entropy.
    Page 2, “Related Work”
  3. Maximum entropy (MaxEnt) models have been widely used in many NLP tasks (Berger et al., 1996; Ratnaparkhi, 1997; Smith, 2006).
    Page 3, “A Joint Model with Unlabeled Parallel Text”

sentence-level

Appears in 3 sentences as: sentence-level (3)
In Joint Bilingual Sentiment Classification with Unlabeled Parallel Corpora
  1. Not surprisingly, most methods for sentiment classification are supervised learning techniques, which require training data annotated with the appropriate sentiment labels (e.g. document-level or sentence-level positive vs. negative polarity).
    Page 1, “Introduction”
  2. Although our approach should be applicable at the document-level and for additional sentiment tasks, we focus on sentence-level polarity classification in this work.
    Page 2, “Introduction”
  3. In this study, we focus on sentence-level sentiment classification, i.e.
    Page 3, “A Joint Model with Unlabeled Parallel Text”

feature weights

Appears in 3 sentences as: feature weights (3)
In Joint Bilingual Sentiment Classification with Unlabeled Parallel Corpora
  1. where θ is a real-valued vector of feature weights and f is a feature function
    Page 3, “A Joint Model with Unlabeled Parallel Text”
  2. where θ⃗1 and θ⃗2 are the vectors of feature weights for L1 and L2, respectively (for brevity we denote them as θ1 and θ2 in the remaining sections).
    Page 3, “A Joint Model with Unlabeled Parallel Text”
  3. where the first term on the right-hand side is the log likelihood of the labeled data from both D1 and D2; the second is the log likelihood of the unlabeled parallel data U, multiplied by λ1 ≥ 0, a constant that controls the contribution of the unlabeled data; and λ2 ≥ 0 is a regularization constant that penalizes model complexity or large feature weights.
    Page 4, “A Joint Model with Unlabeled Parallel Text”
