Opinion Mining on YouTube
Severyn, Aliaksei and Moschitti, Alessandro and Uryupina, Olga and Plank, Barbara and Filippova, Katja

Article Structure

Abstract

This paper defines a systematic approach to Opinion Mining (OM) on YouTube comments by (i) modeling classifiers for predicting the opinion polarity and the type of comment and (ii) proposing robust shallow syntactic structures for improving model adaptability.

Introduction

Social media such as Twitter, Facebook or YouTube contain rapidly changing information generated by millions of users that can dramatically affect the reputation of a person or an organization.

Related work

Most prior work on more general OM has been carried out on more standardized forms of text, such as consumer reviews or newswire.

Representations and models

Our approach to OM on YouTube relies on the design of classifiers to predict comment type and opinion polarity.

YouTube comments corpus

To build a corpus of YouTube comments, we focus on a particular set of videos (technical reviews and advertisements) featuring commercial products.

Experiments

This section reports: (i) experiments on the individual subtasks of opinion and type classification; (ii) the full task of predicting both type and sentiment; (iii) a study of the adaptability of our system, learning on one domain and testing on the other; (iv) learning curves that indicate the required amount and type of data and the scalability to other domains.

Conclusions and Future Work

We carried out a systematic study on OM from YouTube comments by training a set of supervised multi-class classifiers distinguishing between video and product related opinions.

Topics

tree kernel

Appears in 9 sentences as: Tree Kernel (3) tree kernel (5) Tree Kernels (1) tree kernels (3)
In Opinion Mining on YouTube
  1. We rely on the tree kernel technology to automatically extract and learn features with better generalization power than bag-of-words.
    Page 1, “Abstract”
  2. In particular, we define an efficient tree kernel derived from the Partial Tree Kernel (Moschitti, 2006a), suitable for encoding structural representations of comments into Support Vector Machines (SVMs).
    Page 2, “Introduction”
  3. These trees are input to tree kernel functions for generating structural features.
    Page 3, “Representations and models”
  4. The latter are automatically generated and learned by SVMs with expressive tree kernels.
    Page 4, “Representations and models”
  5. In other words, the tree fragment: [S [negative-VP [negative-V [destroy]] [PRODUCT-NP [PRODUCT-N [xoom]]]]] is a strong feature (induced by tree kernels) to help the classifier discriminate such hard cases.
    Page 4, “Representations and models”
  6. Moreover, tree kernels generate all possible subtrees, thus producing generalized (back-off) features, e.g., [S [negative-VP [negative-V [destroy]] [PRODUCT-NP]]] or [S [negative-VP [PRODUCT-NP]]].
    Page 4, “Representations and models”
  7. We define a novel and efficient tree kernel function, namely, the Shallow syntactic Tree Kernel (SHTK), which is as expressive as the Partial Tree Kernel (PTK) (Moschitti, 2006a) to handle feature engineering over the structural representations of the STRUCT model.
    Page 4, “Representations and models”
  8. Shallow syntactic tree kernel.
    Page 4, “Representations and models”
  9. The general equation for Convolution Tree Kernels is:
    Page 5, “Representations and models”
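The index truncates this sentence before the formula itself. For reference, the general convolution tree kernel, commonly attributed to Collins and Duffy, of which PTK and SHTK are instances (they differ in which fragments are counted), is usually written as:

```latex
K(T_1, T_2) = \sum_{n_1 \in N_{T_1}} \sum_{n_2 \in N_{T_2}} \Delta(n_1, n_2)
```

where N_T is the node set of tree T and Δ(n1, n2) counts the tree fragments shared by the subtrees rooted at n1 and n2; the paper's exact notation may differ.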

See all papers in Proc. ACL 2014 that mention tree kernel.

feature vectors

Appears in 8 sentences as: feature vector (2) feature vectors (6)
In Opinion Mining on YouTube
  1. Most of the previous work on supervised sentiment analysis uses feature vectors to encode documents.
    Page 2, “Related work”
  2. In the next sections, we define a baseline feature vector model and a novel structural model based on kernel methods.
    Page 3, “Representations and models”
  3. We go beyond traditional feature vectors by employing structural models (STRUCT), which encode each comment into a shallow syntactic tree.
    Page 3, “Representations and models”
  4. A polynomial kernel of degree 3 is applied to feature vectors (FVEC).
    Page 4, “Representations and models”
  5. The STRUCT model treats each comment as a tuple x = (T, v) composed of a shallow syntactic tree T and a feature vector v.
    Page 4, “Representations and models”
  6. where KTK computes SHTK (defined next), and KV is a kernel over feature vectors, e.g., linear, polynomial, Gaussian, etc.
    Page 4, “Representations and models”
  7. We use standard feature vectors augmented by shallow syntactic trees enriched with additional conceptual information.
    Page 9, “Conclusions and Future Work”
  8. This paper makes several contributions: (i) it shows that effective OM can be carried out with supervised models trained on high quality annotations; (ii) it introduces a novel annotated corpus of YouTube comments, which we make available for the research community; (iii) it defines novel structural models and kernels, which can improve on feature vectors, e.g., up to 30% relative improvement in type classification when little data is available, and demonstrates that the structural model scales well to other domains.
    Page 9, “Conclusions and Future Work”
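Excerpts 5 and 6 describe each comment as a tuple x = (T, v) and a kernel that adds a tree kernel KTK over the trees to a vector kernel KV over the feature vectors. The following is a purely illustrative sketch, not the authors' code: a toy identical-subtree count stands in for SHTK, and a linear kernel stands in for KV (the paper applies a degree-3 polynomial kernel to FVEC).

```python
def linear_kernel(v1, v2):
    """Stand-in for K_V: a linear kernel over feature vectors."""
    return sum(a * b for a, b in zip(v1, v2))

def subtree_overlap_kernel(t1, t2):
    """Toy stand-in for K_TK. Trees are nested tuples (label, child1, ...).
    A real SHTK/PTK counts all shared partial tree fragments; this toy
    version only counts identical whole subtrees."""
    def subtrees(t):
        yield t
        for child in t[1:]:
            yield from subtrees(child)
    s2 = list(subtrees(t2))
    return sum(1 for s in subtrees(t1) if s in s2)

def combined_kernel(x1, x2):
    """K(x1, x2) = K_TK(T1, T2) + K_V(v1, v2) for x = (T, v)."""
    (t1, v1), (t2, v2) = x1, x2
    return subtree_overlap_kernel(t1, t2) + linear_kernel(v1, v2)
```

The additive combination keeps the two representations independent: a kernel machine trained with `combined_kernel` can exploit either the structural or the vector similarity, whichever is informative.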

sentiment lexicon

Appears in 7 sentences as: sentiment lexicon (5) sentiment lexicons (2)
In Opinion Mining on YouTube
  1. We enrich the traditional bag-of-word representation with features from a sentiment lexicon and features quantifying the negation present in the comment.
    Page 3, “Representations and models”
  2. - lexicon: a sentiment lexicon is a collection of words associated with a positive or negative sentiment.
    Page 3, “Representations and models”
  3. We use two manually constructed sentiment lexicons that are freely available: the MPQA Lexicon (Wilson et al., 2005) and the lexicon of Hu and Liu (2004).
    Page 3, “Representations and models”
  4. Our structures are specifically adapted to the noisy user-generated texts and encode important aspects of the comments, e.g., words from the sentiment lexicons, product concepts and negation words, which specifically target the sentiment and comment type classification tasks.
    Page 3, “Representations and models”
  5. Words from the sentiment lexicon are enriched with a polarity tag (either positive or negative), while negation words are labeled with the NEG tag.
    Page 4, “Representations and models”
  6. For example, the comment in Figure 1 shows two positive and one negative word from the sentiment lexicon.
    Page 4, “Representations and models”
  7. We conjecture that sentiment prediction for AUTO category is largely driven by one-shot phrases and statements where it is hard to improve upon the bag-of-words and sentiment lexicon features.
    Page 7, “Experiments”
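Excerpts 1 and 2 describe lexicon and negation features added to the bag-of-words representation. A minimal sketch with a tiny hypothetical word list (the actual system uses the MPQA lexicon and the lexicon of Hu and Liu, which are far larger):

```python
# Illustrative only: a tiny hand-made lexicon, not MPQA or Hu & Liu.
POSITIVE = {"great", "love", "nice"}
NEGATIVE = {"destroy", "bad", "hate"}
NEGATION = {"not", "never", "no"}

def lexicon_features(comment):
    """Counts of positive, negative, and negation words in a comment."""
    tokens = comment.lower().split()
    return {
        "pos_count": sum(t in POSITIVE for t in tokens),
        "neg_count": sum(t in NEGATIVE for t in tokens),
        "negation_count": sum(t in NEGATION for t in tokens),
    }
```

In practice such counts are appended to the bag-of-words vector; real tokenization of YouTube comments would also need to handle punctuation and emoticons.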

bag-of-words

Appears in 6 sentences as: bag-of-words (6)
In Opinion Mining on YouTube
  1. We rely on the tree kernel technology to automatically extract and learn features with better generalization power than bag-of-words .
    Page 1, “Abstract”
  2. The comment contains a product name xoom and some negative expressions, thus, a bag-of-words model would derive a negative polarity for this product.
    Page 1, “Introduction”
  3. Clearly, the bag-of-words lacks the structural information linking the sentiment with the target product.
    Page 1, “Introduction”
  4. Such classifiers are traditionally based on bag-of-words and more advanced features.
    Page 3, “Representations and models”
  5. We conjecture that sentiment prediction for AUTO category is largely driven by one-shot phrases and statements where it is hard to improve upon the bag-of-words and sentiment lexicon features.
    Page 7, “Experiments”
  6. The bag-of-words model seems to be affected by the data sparsity problem which becomes a crucial issue when only a small training set is available.
    Page 9, “Experiments”

classification task

Appears in 6 sentences as: classification task (3) classification tasks (3)
In Opinion Mining on YouTube
  1. Our structures are specifically adapted to the noisy user-generated texts and encode important aspects of the comments, e.g., words from the sentiment lexicons, product concepts and negation words, which specifically target the sentiment and comment type classification tasks.
    Page 3, “Representations and models”
  2. The resulting annotator agreement α value (Krippendorff, 2004; Artstein and Poesio, 2008) scores are 60.6 (AUTO), 72.1 (TABLETS) for the sentiment task and 64.1 (AUTO), 79.3 (TABLETS) for the type classification task.
    Page 6, “YouTube comments corpus”
  3. Similar to the opinion classification task, comment type classification is a multi-class classification with three classes: video, product and uninform.
    Page 6, “Experiments”
  4. Table 1: Summary of YouTube comments data used in the sentiment, type and full classification tasks.
    Page 6, “Experiments”
  5. We cast this problem as a single multi-class classification task with seven classes: the Cartesian product between {product, video} type labels and {positive, neutral, negative} sentiment labels plus the uninformative category (spam and off-topic).
    Page 6, “Experiments”
  6. Nevertheless, as we see in Figure 2, the learning curves for sentiment and type classification tasks across both product categories do not confirm this intuition.
    Page 7, “Experiments”
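The seven-class label set of the full task described in excerpt 5 can be sketched directly (label strings here are illustrative; the paper does not specify its internal label encoding):

```python
# Seven classes: {product, video} x {positive, neutral, negative}
# plus a single uninformative class covering spam and off-topic comments.
from itertools import product

TYPES = ["product", "video"]
SENTIMENTS = ["positive", "neutral", "negative"]
LABELS = ["-".join(pair) for pair in product(TYPES, SENTIMENTS)] + ["uninformative"]
```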

part-of-speech

Appears in 5 sentences as: part-of-speech (5)
In Opinion Mining on YouTube
  1. In particular, our shallow tree structure is a two-level syntactic hierarchy built from word lemmas (leaves) and part-of-speech tags that are further grouped into chunks (Fig.
    Page 3, “Representations and models”
  2. As full syntactic parsers such as constituency or dependency tree parsers would significantly degrade in performance on noisy texts, e.g., Twitter or YouTube comments, we opted for shallow structures, which rely on simpler and more robust components: a part-of-speech tagger and a chunker.
    Page 3, “Representations and models”
  3. Hence, we use the CMU Twitter pos-tagger (Gimpel et al., 2011; Owoputi et al., 2013) to obtain the part-of-speech tags.
    Page 3, “Representations and models”
  4. To automatically identify concept words of the video we use context words (tokens detected as nouns by the part-of-speech tagger) from the video title and video description and match them in the tree.
    Page 3, “Representations and models”
  5. For the matched words, we enrich labels of their parent nodes (part-of-speech and chunk) with the PRODUCT tag.
    Page 3, “Representations and models”
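The concept-matching step in excerpts 4 and 5 can be sketched as follows (hypothetical data structures; the actual system operates on shallow syntactic trees, not flat token lists):

```python
# Each token node carries part-of-speech and chunk labels; nodes whose
# word matches a concept noun drawn from the video title or description
# get the PRODUCT prefix on both labels.
def enrich_with_product_tags(nodes, concept_words):
    for node in nodes:
        if node["word"].lower() in concept_words:
            node["pos"] = "PRODUCT-" + node["pos"]
            node["chunk"] = "PRODUCT-" + node["chunk"]
    return nodes
```

This enrichment is what lets tree fragments such as [PRODUCT-NP [PRODUCT-N [xoom]]] surface as features linking sentiment expressions to the product.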

In-domain

Appears in 4 sentences as: In-domain (2) in-domain (2)
In Opinion Mining on YouTube
  1. We start off by presenting the results for the traditional in-domain setting, where both TRAIN and TEST come from the same domain, e.g., AUTO or TABLETS.
    Page 7, “Experiments”
  2. 5.3.1 In-domain experiments
    Page 7, “Experiments”
  3. Figure 2: In-domain learning curves.
    Page 8, “Experiments”
  4. Similar to the in-domain experiments, we studied the effect of the source domain size on the target test performance.
    Page 8, “Experiments”

manually annotated

Appears in 4 sentences as: manual annotations (1) manually annotated (3)
In Opinion Mining on YouTube
  1. An extensive empirical evaluation on our manually annotated YouTube comments corpus shows a high classification accuracy and highlights the benefits of structural models in a cross-domain setting.
    Page 1, “Abstract”
  2. The second contribution of the paper is the creation and annotation (by an expert coder) of a comment corpus containing 35k manually labeled comments for two product YouTube domains: tablets and automobiles. It is the first manually annotated corpus that enables researchers to use supervised methods on YouTube for comment classification and opinion analysis.
    Page 2, “Introduction”
  3. For each video, we extracted all available comments (limited to a maximum of 1k comments per video) and manually annotated each comment with its type and polarity.
    Page 5, “YouTube comments corpus”
  4. This is an important advantage of our structural approach, since we cannot realistically expect to obtain manual annotations for 10k+ comments for each (of many thousands) product domains present on YouTube.
    Page 9, “Experiments”

part-of-speech tagger

Appears in 4 sentences as: part-of-speech tagger (2) part-of-speech tags (2)
In Opinion Mining on YouTube
  1. In particular, our shallow tree structure is a two-level syntactic hierarchy built from word lemmas (leaves) and part-of-speech tags that are further grouped into chunks (Fig.
    Page 3, “Representations and models”
  2. As full syntactic parsers such as constituency or dependency tree parsers would significantly degrade in performance on noisy texts, e.g., Twitter or YouTube comments, we opted for shallow structures, which rely on simpler and more robust components: a part-of-speech tagger and a chunker.
    Page 3, “Representations and models”
  3. Hence, we use the CMU Twitter pos-tagger (Gimpel et al., 2011; Owoputi et al., 2013) to obtain the part-of-speech tags.
    Page 3, “Representations and models”
  4. To automatically identify concept words of the video we use context words (tokens detected as nouns by the part-of-speech tagger) from the video title and video description and match them in the tree.
    Page 3, “Representations and models”

sentiment analysis

Appears in 4 sentences as: sentiment analysis (4)
In Opinion Mining on YouTube
  1. A recent study focuses on sentiment analysis for Twitter (Pak and Paroubek, 2010); however, their corpus was compiled automatically by searching for emoticons expressing positive and negative sentiment only.
    Page 2, “Related work”
  2. Most of the previous work on supervised sentiment analysis uses feature vectors to encode documents.
    Page 2, “Related work”
  3. One of the challenging aspects of sentiment analysis of YouTube data is that the comments may express the sentiment not only towards the product shown in the video, but also the video itself, i.e., users may post positive comments to the video while being generally negative about the product and vice versa.
    Page 6, “Experiments”
  4. Given that the main goal of sentiment analysis is to select sentiment-bearing comments and identify their polarity, distinguishing between off-topic and spam categories is not critical.
    Page 6, “Experiments”

Sentiment classification

Appears in 4 sentences as: Sentiment classification (3) sentiment classifier (1)
In Opinion Mining on YouTube
  1. This would strongly bias the FVEC sentiment classifier to assign a positive label to the comment.
    Page 4, “Representations and models”
  2. Sentiment classification .
    Page 6, “Experiments”
  3. (a) Sentiment classification
    Page 8, “Experiments”
  4. (a) Sentiment classification
    Page 9, “Experiments”

domain adaptation

Appears in 3 sentences as: domain adaptability (1) domain adaptation (3)
In Opinion Mining on YouTube
  1. 5.2), which give the possibility to study the domain adaptability of the supervised models by training on one category and testing on the other (and vice versa).
    Page 2, “Introduction”
  2. Therefore, rather than trying to build a specialized system for every new target domain, as has been done in most prior work on domain adaptation (Blitzer et al., 2007; Daumé III, 2007), the domain adaptation problem boils down to finding a more robust system (Søgaard and Johannsen, 2012; Plank and Moschitti, 2013).
    Page 2, “Related work”
  3. Additionally, we considered a setting including a small amount of training data from the target data (i.e., supervised domain adaptation).
    Page 8, “Experiments”

social media

Appears in 3 sentences as: Social media (1) social media (2)
In Opinion Mining on YouTube
  1. Social media such as Twitter, Facebook or YouTube contain rapidly changing information generated by millions of users that can dramatically affect the reputation of a person or an organization.
    Page 1, “Introduction”
  2. This raises the importance of automatic extraction of sentiments and opinions expressed in social media.
    Page 1, “Introduction”
  3. The aforementioned corpora are, however, only partially suitable for developing models on social media, since the informal text poses additional challenges for Information Extraction and Natural Language Processing.
    Page 2, “Related work”

structural features

Appears in 3 sentences as: structural feature (1) structural features (2)
In Opinion Mining on YouTube
  1. In contrast, we show that adding structural features from syntactic trees is particularly useful for the cross-domain setting.
    Page 2, “Related work”
  2. These trees are input to tree kernel functions for generating structural features.
    Page 3, “Representations and models”
  3. The structural model, in contrast, is able to identify the product of interest (xoom) and associate it with the negative expression through a structural feature and thus correctly classify the comment as negative.
    Page 9, “Experiments”
