Learning Sentiment-Specific Word Embedding for Twitter Sentiment Classification
Tang, Duyu and Wei, Furu and Yang, Nan and Zhou, Ming and Liu, Ting and Qin, Bing

Article Structure

Abstract

We present a method that learns word embedding for Twitter sentiment classification in this paper.

Introduction

Twitter sentiment classification has attracted increasing research interest in recent years (Jiang et al., 2011; Hu et al., 2013).

Related Work

In this section, we present a brief review of the related work from two perspectives, Twitter sentiment classification and learning continuous representations for sentiment classification.

Topics

sentiment classification

Appears in 52 sentences as: Sentiment Classification (6) sentiment classification (42) sentiment classifier (5) sentiment classifiers (3)
In Learning Sentiment-Specific Word Embedding for Twitter Sentiment Classification
  1. We present a method that learns word embedding for Twitter sentiment classification in this paper.
    Page 1, “Abstract”
  2. Experiments on applying SSWE to a benchmark Twitter sentiment classification dataset in SemEval 2013 show that (1) the SSWE feature performs comparably with handcrafted features in the top-performing system; (2) the performance is further improved by concatenating SSWE with the existing feature set.
    Page 1, “Abstract”
  3. Twitter sentiment classification has attracted increasing research interest in recent years (Jiang et al., 2011; Hu et al., 2013).
    Page 1, “Introduction”
  4. (2013) build the top-performing system in the Twitter sentiment classification track of SemEval 2013 (Nakov et al., 2013), using diverse sentiment lexicons and a variety of handcrafted features.
    Page 1, “Introduction”
  5. For the task of sentiment classification, an effective feature learning method is to compose the representation of a sentence (or document) from the representations of the words or phrases it contains (Socher et al., 2013b; Yessenalina and Cardie, 2011).
    Page 1, “Introduction”
  6. Although existing word embedding learning algorithms (Collobert et al., 2011; Mikolov et al., 2013) are intuitive choices, they are not effective enough if directly used for sentiment classification.
    Page 1, “Introduction”
  7. These automatically collected tweets contain noise, so they cannot be directly used as gold training data to build sentiment classifiers, but they are effective enough to provide weakly supervised signals for training the sentiment-specific word embedding.
    Page 2, “Introduction”
  8. We apply SSWE as features in a supervised learning framework for Twitter sentiment classification , and evaluate it on the benchmark dataset in SemEval 2013.
    Page 2, “Introduction”
  9. To our knowledge, this is the first work that exploits word embedding for Twitter sentiment classification .
    Page 2, “Introduction”
  10. In this section, we present a brief review of the related work from two perspectives, Twitter sentiment classification and learning continuous representations for sentiment classification .
    Page 2, “Related Work”
  11. 2.1 Twitter Sentiment Classification
    Page 2, “Related Work”

word embedding

Appears in 46 sentences as: Word Embedding (3) word embedding (39) word embeddings (6)
In Learning Sentiment-Specific Word Embedding for Twitter Sentiment Classification
  1. We present a method that learns word embedding for Twitter sentiment classification in this paper.
    Page 1, “Abstract”
  2. We address this issue by learning sentiment-specific word embedding (SSWE), which encodes sentiment information in the continuous representation of words.
    Page 1, “Abstract”
  3. To obtain large scale training corpora, we learn the sentiment-specific word embedding from massive distant-supervised tweets collected by positive and negative emoticons.
    Page 1, “Abstract”
  4. Accordingly, it is a crucial step to learn the word representation (or word embedding ), which is a dense, low-dimensional and real-valued vector for a word.
    Page 1, “Introduction”
  5. Although existing word embedding learning algorithms (Collobert et al., 2011; Mikolov et al., 2013) are intuitive choices, they are not effective enough if directly used for sentiment classification.
    Page 1, “Introduction”
  6. In this paper, we propose learning sentiment-specific word embedding (SSWE) for sentiment analysis.
    Page 1, “Introduction”
  7. To this end, we extend the existing word embedding learning algorithm (Collobert et al., 2011) and develop three neural networks to effectively incorporate the supervision from sentiment polarity of text (e.g.
    Page 2, “Introduction”
  8. We learn the sentiment-specific word embedding from tweets, leveraging massive tweets with emoticons as distant- supervised corpora without any manual annotations.
    Page 2, “Introduction”
  9. These automatically collected tweets contain noise, so they cannot be directly used as gold training data to build sentiment classifiers, but they are effective enough to provide weakly supervised signals for training the sentiment-specific word embedding.
    Page 2, “Introduction”
  10. In the accuracy of polarity consistency between each sentiment word and its top N closest words, SSWE outperforms existing word embedding learning algorithms.
    Page 2, “Introduction”
  11. • We develop three neural networks to learn sentiment-specific word embedding (SSWE) from massive distant-supervised tweets without any manual annotations;
    Page 2, “Introduction”
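
The distant-supervision step quoted above (collecting tweets by positive and negative emoticons) could be sketched roughly as follows. The emoticon sets and the function name `label_by_emoticon` are hypothetical and chosen only for illustration; this excerpt does not list the emoticons or filtering rules the authors used.

```python
# Illustrative only: these emoticon sets are assumptions, not the paper's lists.
POSITIVE_EMOTICONS = {":)", ":-)", ":D"}
NEGATIVE_EMOTICONS = {":(", ":-("}
ALL_EMOTICONS = POSITIVE_EMOTICONS | NEGATIVE_EMOTICONS


def label_by_emoticon(tweet):
    """Return (polarity, tokens) with polarity +1/-1, or None if there is no usable signal."""
    tokens = tweet.split()
    has_pos = any(tok in POSITIVE_EMOTICONS for tok in tokens)
    has_neg = any(tok in NEGATIVE_EMOTICONS for tok in tokens)
    if has_pos == has_neg:
        # Neither kind of emoticon, or both kinds: no reliable weak label.
        return None
    # Drop the emoticons so the weak label does not leak into the input text.
    kept = [tok for tok in tokens if tok not in ALL_EMOTICONS]
    return (1 if has_pos else -1), kept
```

Removing the emoticons from the token list is a design choice of this sketch, not something stated in the excerpt; the resulting labels are noisy and serve only as weak supervision.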

neural networks

Appears in 16 sentences as: Neural Network (1) neural network (7) neural networks (8)
In Learning Sentiment-Specific Word Embedding for Twitter Sentiment Classification
  1. Specifically, we develop three neural networks to effectively incorporate the supervision from sentiment polarity of text (e.g.
    Page 1, “Abstract”
  2. To this end, we extend the existing word embedding learning algorithm (Collobert et al., 2011) and develop three neural networks to effectively incorporate the supervision from sentiment polarity of text (e.g.
    Page 2, “Introduction”
  3. • We develop three neural networks to learn sentiment-specific word embedding (SSWE) from massive distant-supervised tweets without any manual annotations;
    Page 2, “Introduction”
  4. propose Recursive Neural Network (RNN) (2011b), matrix-vector RNN (2012) and Recursive Neural Tensor Network (RNTN) (2013b) to learn the compositionality of phrases of any length based on the representation of each pair of children recursively.
    Page 3, “Related Work”
  5. (2011) that follow the probabilistic document model (Blei et al., 2003) and give a sentiment predictor function to each word, we develop neural networks and map each ngram to the sentiment polarity of the sentence.
    Page 3, “Related Work”
  6. We extend the existing word embedding learning algorithm (Collobert et al., 2011) and develop three neural networks to learn SSWE.
    Page 3, “Related Work”
  7. The original and corrupted ngrams are treated as inputs of the feed-forward neural network, respectively.
    Page 3, “Related Work”
  8. Following the traditional C&W model (Collobert et al., 2011), we incorporate the sentiment information into the neural network to learn sentiment-specific word embedding.
    Page 4, “Related Work”
  9. We develop three neural networks with different strategies to integrate the sentiment information of tweets.
    Page 4, “Related Work”
  10. We therefore slide the window of ngram across a sentence, and then predict the sentiment polarity based on each ngram with a shared neural network .
    Page 4, “Related Work”
  11. In the neural network, the distributed representations of higher layers are interpreted as features describing the input.
    Page 4, “Related Work”
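
The sentences above mention sliding an ngram window across a sentence and feeding original and corrupted ngrams to the network. A minimal sketch of both steps is given below. Replacing the center word with a random vocabulary word follows the usual C&W setup; the function names and the toy vocabulary are assumptions made for this example.

```python
import random


def sliding_ngrams(tokens, n):
    """Yield every window of n consecutive tokens in the sentence."""
    for start in range(len(tokens) - n + 1):
        yield tokens[start:start + n]


def corrupt_ngram(ngram, vocabulary, rng):
    """Copy the ngram and replace its center word with a different vocabulary word."""
    center = len(ngram) // 2
    candidates = [word for word in vocabulary if word != ngram[center]]
    corrupted = list(ngram)
    corrupted[center] = rng.choice(candidates)
    return corrupted


# Hypothetical usage on a toy sentence.
rng = random.Random(0)
for window in sliding_ngrams(["i", "love", "this", "phone"], 3):
    print(window, corrupt_ngram(window, ["love", "hate", "this", "that"], rng))
```

Every window shares the same network parameters, so a sentence shorter than the window size simply yields no training ngrams in this sketch.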

sentiment lexicons

Appears in 15 sentences as: sentiment lexicon (2) sentiment lexicon, (1) Sentiment Lexicons (1) sentiment lexicons (11)
In Learning Sentiment-Specific Word Embedding for Twitter Sentiment Classification
  1. (2013) build the top-performing system in the Twitter sentiment classification track of SemEval 2013 (Nakov et al., 2013), using diverse sentiment lexicons and a variety of handcrafted features.
    Page 1, “Introduction”
  2. The quality of SSWE is also directly evaluated by measuring the word similarity in the embedding space for sentiment lexicons .
    Page 2, “Introduction”
  3. We also directly evaluate the effectiveness of the SSWE by measuring the word similarity in the embedding space for sentiment lexicons .
    Page 6, “Related Work”
  4. (5) NRC: NRC builds the top-performing system in the SemEval 2013 Twitter sentiment classification track, which incorporates diverse sentiment lexicons and many manually designed features.
    Page 6, “Related Work”
  5. We achieve 84.98% by using only SSWEu as features without borrowing any sentiment lexicons or handcrafted rules.
    Page 7, “Related Work”
  6. 4.2 Word Similarity of Sentiment Lexicons
    Page 8, “Related Work”
  7. bedding space for sentiment lexicons .
    Page 9, “Related Work”
  8. The evaluation metric is the accuracy of polarity consistency between each sentiment word and its top N closest words in the sentiment lexicon,
    Page 9, “Related Work”
  9. where $\#Lex$ is the number of words in the sentiment lexicon, $w_i$ is the $i$-th word in the lexicon, $c_{ij}$ is the $j$-th closest word to $w_i$ in the lexicon with cosine similarity, and $\beta(w_i, c_{ij})$ is an indicator function that is equal to 1 if $w_i$ and $c_{ij}$ have the same sentiment polarity and 0 for the opposite case.
    Page 9, “Related Work”
  10. A higher accuracy indicates a better polarity consistency of words in the sentiment lexicon.
    Page 9, “Related Work”
  11. Experiment Setup and Datasets: We utilize the widely used sentiment lexicons, namely MPQA (Wilson et al., 2005) and HL (Hu and Liu, 2004), to evaluate the quality of word embedding.
    Page 9, “Related Work”
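
The polarity-consistency metric described above (the share of each lexicon word's top N closest lexicon words, by cosine similarity, that have the same polarity) can be sketched as below. This is an illustrative reading of the description, not the authors' evaluation script. Excluding the word itself from its own neighbours and the toy vectors are assumptions, and `top_n` is assumed to be smaller than the lexicon size.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def polarity_consistency(embeddings, polarity, top_n):
    """embeddings: word -> vector; polarity: word -> +1 or -1 for the same words."""
    words = list(embeddings)
    agree = 0
    for word in words:
        others = [c for c in words if c != word]
        # Rank the remaining lexicon words by similarity to the current word.
        others.sort(key=lambda c: cosine(embeddings[word], embeddings[c]), reverse=True)
        agree += sum(1 for c in others[:top_n] if polarity[c] == polarity[word])
    return agree / (len(words) * top_n)


# Toy lexicon with made-up two-dimensional vectors.
toy_vectors = {"good": [1.0, 0.0], "great": [0.9, 0.1], "bad": [0.0, 1.0], "awful": [0.1, 0.9]}
toy_polarity = {"good": 1, "great": 1, "bad": -1, "awful": -1}
print(polarity_consistency(toy_vectors, toy_polarity, top_n=1))
```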

unigram

Appears in 13 sentences as: unigram (12) unigrams (4)
In Learning Sentiment-Specific Word Embedding for Twitter Sentiment Classification
  1. We learn embedding for unigrams , bigrams and trigrams separately with same neural network and same parameter setting.
    Page 5, “Related Work”
  2. The contexts of unigram (bigram/trigram) are the surrounding unigrams (bigrams/trigrams), respectively.
    Page 5, “Related Work”
  3. $z_x$ employs the embedding of unigrams, bigrams and trigrams separately and conducts the matrix-vector operation of $x$ on the sequence represented by columns in each lookup table.
    Page 6, “Related Work”
  4. $L_{uni}$, $L_{bi}$ and $L_{tri}$ are the lookup tables of the unigram, bigram and trigram embedding, respectively.
    Page 6, “Related Work”
  5. Method                        Macro-F1
     DistSuper + unigram           61.74
     DistSuper + uni/bi/tri-gram   63.84
     SVM + unigram                 74.50
     SVM + uni/bi/tri-gram         75.06
     NBSVM                         75.28
     RAE                           75.12
     NRC (Top System in SemEval)   84.73
     NRC-ngram                     84.17
     SSWEu                         84.98
     SSWEu+NRC                     86.58
     SSWEu+NRC-ngram               86.48
    Page 7, “Related Work”
  6. We use the embedding of unigrams , bigrams and trigrams in the experimen-
    Page 7, “Related Work”
  7. The column uni+bi denotes the use of unigram and bigram embedding, and the column uni+bi+tri indicates the use of unigram, bigram and trigram embedding.
    Page 7, “Related Work”
  8. Embedding   unigram   uni+bi   uni+bi+tri
     C&W         74.89     75.24    75.89
     Word2vec    73.21     75.07    76.31
    Page 7, “Related Work”
  9. From the first column of Table 3, we can see that the performance of C&W and word2vec is clearly lower than that of sentiment-specific word embeddings when only unigram embedding is used as features.
    Page 7, “Related Work”
  10. MVSA and ReEmb are not suitable for learning bigram and trigram embedding because their sentiment predictor functions only utilize the unigram embedding.
    Page 7, “Related Work”
  11. The underlying reason is that a phrase, which cannot be accurately represented by unigram embedding, is directly encoded into the ngram embedding as an idiomatic unit.
    Page 8, “Related Work”

syntactic context

Appears in 10 sentences as: syntactic context (5) syntactic contexts (5)
In Learning Sentiment-Specific Word Embedding for Twitter Sentiment Classification
  1. Most existing algorithms for learning continuous word representations typically only model the syntactic context of words but ignore the sentiment of text.
    Page 1, “Abstract”
  2. This is problematic for sentiment analysis as they usually map words with similar syntactic context but opposite sentiment polarity, such as good and bad, to neighboring word vectors.
    Page 1, “Abstract”
  3. The most serious problem is that traditional methods typically model the syntactic context of words but ignore the sentiment information of text.
    Page 1, “Introduction”
  4. (2011) introduce C&W model to learn word embedding based on the syntactic contexts of words.
    Page 3, “Related Work”
  5. The C&W model learns word embedding by modeling syntactic contexts of words but ignoring sentiment information.
    Page 5, “Related Work”
  6. By contrast, SSWEh and SSWEr learn sentiment-specific word embedding by integrating the sentiment polarity of sentences but leaving out the syntactic contexts of words.
    Page 5, “Related Work”
  7. We develop a unified model (SSWEu) in this part, which captures the sentiment information of sentences as well as the syntactic contexts of words.
    Page 5, “Related Work”
  8. SSWEu performs better when $\alpha$ is in the range of [0.5, 0.6], which balances the syntactic context and sentiment information.
    Page 8, “Related Work”
  9. The model with $\alpha$=1 stands for the C&W model, which only encodes the syntactic contexts of words.
    Page 8, “Related Work”
  10. Our unified model combining syntactic context of words and sentiment information of sentences yields the best performance in both experiments.
    Page 9, “Related Work”

development set

Appears in 9 sentences as: development set (7) development sets (2)
In Learning Sentiment-Specific Word Embedding for Twitter Sentiment Classification
  1. The training and development sets were released in full to task participants.
    Page 6, “Related Work”
  2. However, we were unable to download all the training and development sets because some tweets were deleted or not available due to modified authorization status.
    Page 6, “Related Work”
  3. The tradeoff parameter of ReEmb (Labutov and Lipson, 2013) is tuned on the development set of SemEval 2013.
    Page 7, “Related Work”
  4. Effect of $\alpha$ in SSWEu: We tune the hyper-parameter $\alpha$ of SSWEu on the development set by using unigram embedding as features.
    Page 8, “Related Work”
  5. Figure 2: Macro-F1 of SSWEu on the development set of SemEval 2013 with different $\alpha$.
    Page 8, “Related Work”
  6. Figure 2 shows the macro-F1 of SSWEu on positive/negative classification of tweets with different $\alpha$ on our development set.
    Page 8, “Related Work”
  7. Results of positive/negative classification of tweets on our development set are given in Figure 3.
    Page 8, “Related Work”
  8. Figure 3: Macro-F1 of SSWEu with different sizes of distant-supervised data on our development set.
    Page 8, “Related Work”
  9. When we have 10 million distant-supervised tweets, the SSWEu feature increases the macro-F1 of positive/negative classification of tweets to 82.94% on our development set.
    Page 8, “Related Work”
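
Several of the sentences above report macro-F1 for positive/negative classification. As a reminder of how that figure is formed, here is a small sketch that averages the per-class F1 of the two classes. The label strings and function names are assumptions for this example, not taken from the SemEval scorer.

```python
def f1_for_class(gold, predicted, label):
    """F1 of a single class, treating that class as the positive one."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold, predicted, labels=("positive", "negative")):
    """Unweighted mean of the per-class F1 scores."""
    return sum(f1_for_class(gold, predicted, label) for label in labels) / len(labels)


gold = ["positive", "positive", "negative", "negative"]
pred = ["positive", "negative", "negative", "negative"]
print(round(macro_f1(gold, pred), 4))
```

Because the mean is unweighted, a classifier that ignores the smaller class is penalized more than it would be under plain accuracy.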

embeddings

Appears in 9 sentences as: embeddings (9)
In Learning Sentiment-Specific Word Embedding for Twitter Sentiment Classification
  1. The embeddings of C&W (Collobert et al., 2011), word2vec, MVSA (Maas et al., 2011) and our models are trained with the same dataset and the same parameter setting.
    Page 7, “Related Work”
  2. ReEmb(C&W) and ReEmb(w2v) stand for the use of embeddings learned from 10 million distant-supervised tweets with C&W and word2vec, respectively.
    Page 7, “Related Work”
  3. Table 3: Macro-F1 on positive/negative classification of tweets with different word embeddings .
    Page 7, “Related Work”
  4. From the first column of Table 3, we can see that the performance of C&W and word2vec is clearly lower than that of sentiment-specific word embeddings when only unigram embedding is used as features.
    Page 7, “Related Work”
  5. When such word embeddings are fed as features to a Twitter sentiment classifier, the discriminative ability of sentiment words is weakened, and thus the classification performance is affected.
    Page 7, “Related Work”
  6. Among the three sentiment-specific word embeddings, SSWEu captures more context information and yields the best performance.
    Page 8, “Related Work”
  7. From each row of Table 3, we can see that the bigram and trigram embeddings consistently improve the performance of Twitter sentiment classification.
    Page 8, “Related Work”
  8. Sentiment-specific word embeddings (SSWEh, SSWEr, SSWEu) outperform existing neural models (C&W, word2vec) by large margins.
    Page 9, “Related Work”
  9. Experimental results further demonstrate that sentiment-specific word embeddings are able to capture the sentiment information of texts and distinguish words with opposite sentiment polarity, which are not well solved in traditional neural
    Page 9, “Related Work”

learning algorithms

Appears in 9 sentences as: learning algorithm (3) learning algorithms (6)
In Learning Sentiment-Specific Word Embedding for Twitter Sentiment Classification
  1. (2002) and employ machine learning algorithms to build classifiers from tweets with manually annotated sentiment polarity.
    Page 1, “Introduction”
  2. To this end, we extend the existing word embedding learning algorithm (Collobert et al., 2011) and develop three neural networks to effectively incorporate the supervision from sentiment polarity of text (e.g.
    Page 2, “Introduction”
  3. In the accuracy of polarity consistency between each sentiment word and its top N closest words, SSWE outperforms existing word embedding learning algorithms .
    Page 2, “Introduction”
  4. Under this assumption, many feature learning algorithms are proposed to obtain better classification performance (Pang and Lee, 2008; Liu, 2012; Feldman, 2013).
    Page 3, “Related Work”
  5. We extend the existing word embedding learning algorithm (Collobert et al., 2011) and develop three neural networks to learn SSWE.
    Page 3, “Related Work”
  6. In the following sections, we introduce the traditional method before presenting the details of SSWE learning algorithms .
    Page 3, “Related Work”
  7. We compare sentiment-specific word embedding (SSWEh, SSWEr, SSWEu) with baseline embedding learning algorithms by only using word embedding as features for Twitter sentiment classification.
    Page 7, “Related Work”
  8. Each row of Table 3 represents a word embedding learning algorithm .
    Page 7, “Related Work”
  9. Table 5 shows our results compared to other word embedding learning algorithms .
    Page 9, “Related Work”

sentiment analysis

Appears in 8 sentences as: sentiment analysis (8)
In Learning Sentiment-Specific Word Embedding for Twitter Sentiment Classification
  1. This is problematic for sentiment analysis as they usually map words with similar syntactic context but opposite sentiment polarity, such as good and bad, to neighboring word vectors.
    Page 1, “Abstract”
  2. It is meaningful for some tasks such as pos-tagging (Zheng et al., 2013) as the two words have similar usages and grammatical roles, but it becomes a disaster for sentiment analysis as they have the opposite sentiment polarity.
    Page 1, “Introduction”
  3. In this paper, we propose learning sentiment-specific word embedding (SSWE) for sentiment analysis .
    Page 1, “Introduction”
  4. • We release the sentiment-specific word embedding learned from 10 million tweets, which can be adopted off-the-shelf in other sentiment analysis tasks.
    Page 2, “Introduction”
  5. In the field of sentiment analysis , Bespalov et al.
    Page 3, “Related Work”
  6. This paper focuses on learning sentiment-specific word embedding, which is tailored for sentiment analysis .
    Page 3, “Related Work”
  7. (4) RAE: Recursive Autoencoder (Socher et al., 2011c) has been proven effective in many sentiment analysis tasks by learning compositionality automatically.
    Page 6, “Related Work”
  8. A typical case in sentiment analysis is that the composed phrase and multiword expression may have a different sentiment polarity than the individual words it contains, such as not [bad] and [great] deal of (the word in the bracket has different sentiment polarity with the ngram).
    Page 8, “Related Work”

bigram

Appears in 7 sentences as: bigram (5) bigrams (3)
In Learning Sentiment-Specific Word Embedding for Twitter Sentiment Classification
  1. We learn embedding for unigrams, bigrams and trigrams separately with same neural network and same parameter setting.
    Page 5, “Related Work”
  2. $z_x$ employs the embedding of unigrams, bigrams and trigrams separately and conducts the matrix-vector operation of $x$ on the sequence represented by columns in each lookup table.
    Page 6, “Related Work”
  3. $L_{uni}$, $L_{bi}$ and $L_{tri}$ are the lookup tables of the unigram, bigram and trigram embedding, respectively.
    Page 6, “Related Work”
  4. We use the embedding of unigrams, bigrams and trigrams in the experimen-
    Page 7, “Related Work”
  5. The column uni+bi denotes the use of unigram and bigram embedding, and the column uni+bi+tri indicates the use of unigram, bigram and trigram embedding.
    Page 7, “Related Work”
  6. MVSA and ReEmb are not suitable for learning bigram and trigram embedding because their sentiment predictor functions only utilize the unigram embedding.
    Page 7, “Related Work”
  7. From each row of Table 3, we can see that the bigram and trigram embeddings consistently improve the performance of Twitter sentiment classification.
    Page 8, “Related Work”

language model

Appears in 7 sentences as: (1) language model (5) language modeling (1)
In Learning Sentiment-Specific Word Embedding for Twitter Sentiment Classification
  1. (2012) adopt the tweets with emoticons to smooth the language model and Hu et al.
    Page 2, “Related Work”
  2. With the revival of interest in deep learning (Bengio et al., 2013), incorporating the continuous representation of a word as features has been proving effective in a variety of NLP tasks, such as parsing (Socher et al., 2013a), language modeling (Bengio et al., 2003; Mnih and Hinton, 2009) and NER (Turian et al., 2010).
    Page 3, “Related Work”
  3. The training objective is that the original ngram is expected to obtain a higher language model score than the corrupted ngram by a margin of 1.
    Page 3, “Related Work”
  4. where $t$ is the original ngram, $t^r$ is the corrupted ngram, and $f^{cw}(\cdot)$ is a one-dimensional scalar representing the language model score of the input ngram.
    Page 3, “Related Work”
  5. The output $f^{cw}$ is the language model score of the input, which is calculated as given in Equation 2, where $L$ is the lookup table of word embedding, and $w_1$, $w_2$, $b_1$, $b_2$ are the parameters of linear layers.
    Page 3, “Related Work”
  6. The two scalars ($f^u_0$, $f^u_1$) stand for language model score and sentiment score of the input ngram, respectively.
    Page 5, “Related Work”
  7. The training objectives of SSWEu are that (1) the original ngram should obtain a higher language model score $f^u_0(t)$ than the corrupted ngram $f^u_0(t^r)$, and (2) the sentiment score of the original ngram $f^u_1(t)$ should be more consistent with the gold polarity annotation of the sentence than that of the corrupted ngram $f^u_1(t^r)$.
    Page 5, “Related Work”
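
The C&W objective quoted above asks the original ngram to score higher than the corrupted one by a margin of 1. The following sketch shows one way to compute the score and the ranking loss in plain Python. The hard-tanh nonlinearity between the two linear layers, the function names and the toy parameters are assumptions of this sketch; the excerpt itself only names a lookup table and linear-layer parameters.

```python
def hard_tanh(x):
    """Clip x to the interval [-1, 1]."""
    return max(-1.0, min(1.0, x))


def cw_score(ngram, lookup, w1, b1, w2, b2):
    """Score an ngram: lookup -> concatenate -> linear -> hard tanh -> linear."""
    x = [value for word in ngram for value in lookup[word]]  # concatenated embeddings
    hidden = [hard_tanh(sum(wi * xi for wi, xi in zip(row, x)) + bi)
              for row, bi in zip(w1, b1)]
    return sum(wi * hi for wi, hi in zip(w2, hidden)) + b2


def cw_ranking_loss(score_original, score_corrupted, margin=1.0):
    """Hinge loss: zero once the original outscores the corrupted ngram by the margin."""
    return max(0.0, margin - score_original + score_corrupted)


# Toy parameters, chosen by hand for illustration only.
lookup = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
w1, b1 = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]], [0.0, 0.0]
w2, b2 = [0.5, 2.0], 0.1
original = cw_score(["a", "b"], lookup, w1, b1, w2, b2)
corrupted = cw_score(["a", "a"], lookup, w1, b1, w2, b2)
print(cw_ranking_loss(original, corrupted))
```

In training, the gradient of this loss would flow back into the lookup table, which is how the embeddings themselves are learned.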

feature set

Appears in 6 sentences as: feature set (6)
In Learning Sentiment-Specific Word Embedding for Twitter Sentiment Classification
  1. Experiments on applying SSWE to a benchmark Twitter sentiment classification dataset in SemEval 2013 show that (1) the SSWE feature performs comparably with handcrafted features in the top-performing system; (2) the performance is further improved by concatenating SSWE with the existing feature set.
    Page 1, “Abstract”
  2. After concatenating the SSWE feature with the existing feature set, we push the state-of-the-art to 86.58% in macro-F1.
    Page 2, “Introduction”
  3. NRC-ngram refers to the feature set of NRC leaving out ngram features.
    Page 6, “Related Work”
  4. After concatenating SSWEu with the feature set of NRC, the performance is further improved to 86.58%.
    Page 7, “Related Work”
  5. The concatenated features SSWEu+NRC-ngram (86.48%) outperform the original feature set of NRC (84.73%).
    Page 7, “Related Work”
  6. After combining SSWEu with the feature set of NRC, we improve NRC from 74.86% to 75.39% for subjective classification.
    Page 7, “Related Work”

word representation

Appears in 5 sentences as: word representation (3) word representations (2)
In Learning Sentiment-Specific Word Embedding for Twitter Sentiment Classification
  1. Most existing algorithms for learning continuous word representations typically only model the syntactic context of words but ignore the sentiment of text.
    Page 1, “Abstract”
  2. Accordingly, it is a crucial step to learn the word representation (or word embedding), which is a dense, low-dimensional and real-valued vector for a word.
    Page 1, “Introduction”
  3. However, the one-hot word representation cannot sufficiently capture the complex linguistic characteristics of words.
    Page 3, “Related Work”
  4. The results of bag-of-ngram (uni/bi/tri-gram) features are not satisfactory because the one-hot word representation cannot capture the latent connections between words.
    Page 6, “Related Work”
  5. In this paper, we propose learning continuous word representations as features for Twitter sentiment classification under a supervised learning framework.
    Page 9, “Related Work”

model score

Appears in 5 sentences as: (1) model score (4)
In Learning Sentiment-Specific Word Embedding for Twitter Sentiment Classification
  1. The training objective is that the original ngram is expected to obtain a higher language model score than the corrupted ngram by a margin of 1.
    Page 3, “Related Work”
  2. where $t$ is the original ngram, $t^r$ is the corrupted ngram, and $f^{cw}(\cdot)$ is a one-dimensional scalar representing the language model score of the input ngram.
    Page 3, “Related Work”
  3. The output $f^{cw}$ is the language model score of the input, which is calculated as given in Equation 2, where $L$ is the lookup table of word embedding, and $w_1$, $w_2$, $b_1$, $b_2$ are the parameters of linear layers.
    Page 3, “Related Work”
  4. The two scalars ($f^u_0$, $f^u_1$) stand for language model score and sentiment score of the input ngram, respectively.
    Page 5, “Related Work”
  5. The training objectives of SSWEu are that (1) the original ngram should obtain a higher language model score $f^u_0(t)$ than the corrupted ngram $f^u_0(t^r)$, and (2) the sentiment score of the original ngram $f^u_1(t)$ should be more consistent with the gold polarity annotation of the sentence than that of the corrupted ngram $f^u_1(t^r)$.
    Page 5, “Related Work”

manually annotated

Appears in 5 sentences as: manual annotations (2) manually annotated (3)
In Learning Sentiment-Specific Word Embedding for Twitter Sentiment Classification
  1. (2002) and employ machine learning algorithms to build classifiers from tweets with manually annotated sentiment polarity.
    Page 1, “Introduction”
  2. We learn the sentiment-specific word embedding from tweets, leveraging massive tweets with emoticons as distant- supervised corpora without any manual annotations .
    Page 2, “Introduction”
  3. • We develop three neural networks to learn sentiment-specific word embedding (SSWE) from massive distant-supervised tweets without any manual annotations;
    Page 2, “Introduction”
  4. The sentiment classifier is built from tweets with manually annotated sentiment polarity.
    Page 5, “Related Work”
  5. The reason is that RAE and NBSVM learn the representation of tweets from the small-scale manually annotated training set, which cannot capture the comprehensive linguistic phenomena of words well.
    Page 7, “Related Work”

SVM

Appears in 4 sentences as: SVM (5)
In Learning Sentiment-Specific Word Embedding for Twitter Sentiment Classification
  1. (2) SVM : The ngram features and Support Vector Machine are widely used baseline methods to build sentiment classifiers (Pang et al., 2002).
    Page 6, “Related Work”
  2. LibLinear is used to train the SVM classifier.
    Page 6, “Related Work”
  3. (3) NBSVM: NBSVM (Wang and Manning, 2012) is a state-of-the-art performer on many sentiment classification datasets, which trades off between Naive Bayes and NB-enhanced SVM.
    Page 6, “Related Work”
  4. Method                        Macro-F1
     DistSuper + unigram           61.74
     DistSuper + uni/bi/tri-gram   63.84
     SVM + unigram                 74.50
     SVM + uni/bi/tri-gram         75.06
     NBSVM                         75.28
     RAE                           75.12
     NRC (Top System in SemEval)   84.73
     NRC-ngram                     84.17
     SSWEu                         84.98
     SSWEu+NRC                     86.58
     SSWEu+NRC-ngram               86.48
    Page 7, “Related Work”

loss functions

Appears in 4 sentences as: loss function (1) loss functions (3)
In Learning Sentiment-Specific Word Embedding for Twitter Sentiment Classification
  1. sentences or tweets) in their loss functions .
    Page 1, “Abstract”
  2. sentences or tweets) in their loss functions .
    Page 2, “Introduction”
  3. The loss function of SSWEu is the linear combination of two hinge losses,
    Page 5, “Related Work”
  4. We learn sentiment-specific word embedding (SSWE) by integrating the sentiment information into the loss functions of three neural networks.
    Page 9, “Related Work”
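
The sentences above describe the SSWEu loss as a linear combination of two hinge losses. A hedged sketch of that combination is shown below, with a weight `alpha` on the language-model part and `1 - alpha` on the sentiment part. The exact form of the sentiment hinge (multiplying the score difference by a +1/-1 gold polarity) is an assumption of this sketch, as are all names, since this excerpt does not spell the formula out.

```python
def hinge(score_gap, margin=1.0):
    """Zero once the gap between original and corrupted reaches the margin."""
    return max(0.0, margin - score_gap)


def sswe_u_loss(lm_orig, lm_corr, sent_orig, sent_corr, gold_polarity, alpha):
    """Weighted sum of a language-model hinge and a sentiment hinge.

    gold_polarity is +1 for a positive sentence and -1 for a negative one.
    alpha = 1.0 leaves only the language-model term.
    """
    loss_lm = hinge(lm_orig - lm_corr)
    loss_sent = hinge(gold_polarity * (sent_orig - sent_corr))
    return alpha * loss_lm + (1.0 - alpha) * loss_sent


# Toy scores: the language-model part is satisfied, the sentiment part is not.
print(sswe_u_loss(2.0, 0.0, 0.5, 0.0, gold_polarity=-1, alpha=0.5))
```

With `alpha` set to 1 the sentiment term vanishes and only the syntactic ranking objective remains, while intermediate values trade the two signals off against each other.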

context information

Appears in 4 sentences as: context information (3) contextual information (1)
In Learning Sentiment-Specific Word Embedding for Twitter Sentiment Classification
  1. SSWE outperforms MVSA by exploiting more contextual information in the sentiment predictor function.
    Page 8, “Related Work”
  2. Among three sentiment-specific word embeddings, SSWEu captures more context information and yields best performance.
    Page 8, “Related Work”
  3. SSWE outperforms MVSA and ReEmb by exploiting more context information of words and sentiment information of sentences, respectively.
    Page 9, “Related Work”
  4. These methods typically only model the context information of words so that they cannot distinguish words with similar context but opposite sentiment polarity (e.g.
    Page 9, “Related Work”

Recursive

Appears in 3 sentences as: Recursive (4)
In Learning Sentiment-Specific Word Embedding for Twitter Sentiment Classification
  1. propose Recursive Neural Network (RNN) (2011b), matrix-vector RNN (2012) and Recursive Neural Tensor Network (RNTN) (2013b) to learn the compositionality of phrases of any length based on the representation of each pair of children recursively.
    Page 3, “Related Work”
  2. (2013) present Combinatory Categorial Autoencoders to learn the compositionality of sentences, which marries the Combinatory Categorial Grammar with Recursive Autoencoder.
    Page 3, “Related Work”
  3. (4) RAE: Recursive Autoencoder (Socher et al., 2011c) has been proven effective in many sentiment analysis tasks by learning compositionality automatically.
    Page 6, “Related Work”
