New Word Detection for Sentiment Analysis
Huang, Minlie and Ye, Borui and Wang, Yichen and Chen, Haiqiang and Cheng, Junjun and Zhu, Xiaoyan

Article Structure

Abstract

Automatic extraction of new words is an indispensable precursor to many NLP tasks such as Chinese word segmentation, named entity extraction, and sentiment analysis.

Introduction

New words are constantly emerging on the Internet, particularly in user-generated content.

Related Work

New word detection has usually been interwoven with word segmentation, particularly in Chinese NLP.

Methodology

3.1 Definitions

Experiment

In this section, we will conduct the following experiments: first, we will compare our method to several baselines, and perform parameter tuning with extensive experiments; second, we will classify polarity of new sentiment words using two methods; third, we will demonstrate how new sentiment words will benefit sentiment classification.

Conclusion

In order to extract new sentiment words from large-scale user-generated content, this paper proposes a fully unsupervised, purely data-driven, and

Topics

POS tags

Appears in 8 sentences as: POS tagged (2) POS tags (7)
In New Word Detection for Sentiment Analysis
  1. The method is almost free of linguistic resources (except POS tags), and requires no elaborated linguistic rules.
    Page 1, “Abstract”
  2. This framework is fully unsupervised and purely data-driven, and requires very lightweight linguistic resources (i.e., only POS tags).
    Page 2, “Introduction”
  3. In order to obtain lexical patterns, we can define regular expressions with POS tags[2] and apply the regular expressions on POS tagged texts.
    Page 3, “Methodology”
  4. [2] Such expressions are very simple and easy to write because we only need to consider POS tags of adverbial and auxiliary words.
    Page 3, “Methodology”
  5. Our algorithm is similar in spirit to double propagation (Qiu et al., 2011); however, the differences are apparent: first, we use very lightweight linguistic information (only POS tags); second, our major contribution is to propose statistical measures that address two key issues: measuring the utility of lexical patterns, and measuring the possibility of a candidate word being a new word.
    Page 3, “Methodology”
  6. Algorithm 1: New word detection algorithm
     Input:
       D: a large set of POS tagged posts
       W_s: a set of seed words
       k_p: the number of patterns chosen at each iteration
       k_c: the number of patterns in the candidate pattern set
       k_w: the number of words added at each iteration
       K: the number of words returned
     Output: A list of ranked new words W
     1. Obtain all lexical patterns using regular expressions on D;
     2. Count the frequency of each lexical pattern and extract words matched by each pattern;
     3. Obtain top k_c frequent patterns as candidate pattern set P_c and top 5,000 frequent words as candidate word set W_c;
    Page 4, “Methodology”
  7. almost knowledge-free (except POS tags) framework.
    Page 9, “Conclusion”
  8. The method is almost free of linguistic resources (except POS tags), and does not rely on elaborated linguistic rules.
    Page 9, “Conclusion”
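
The regular-expression step quoted above (patterns defined over POS tags and applied to POS tagged text) can be illustrated with a minimal sketch. Everything concrete here is an assumption rather than the authors' implementation: the input is taken to be whitespace-separated "word/TAG" tokens, the tags "d" (adverb) and "u" (auxiliary word) are assumed tag names, and the pattern shape "adverb, candidate word, auxiliary word" is only one plausible reading of the footnote about adverbial and auxiliary words.

```python
import re

# Assumed input format: whitespace-separated "word/TAG" tokens.
# Assumed tag names: "d" = adverb, "u" = auxiliary word (hypothetical choice).
# The regex matches: <adverb> <any single token> <auxiliary word>.
PATTERN_RE = re.compile(
    r"(?<!\S)([^\s/]+)/d\s+([^\s/]+)/[^\s/]+\s+([^\s/]+)/u(?!\S)"
)

def extract_matches(tagged_post):
    """Return (lexical_pattern, candidate_word) pairs found in one tagged post.

    The lexical pattern keeps the surrounding adverb and auxiliary word and
    replaces the middle token with a "*" wildcard.
    """
    matches = []
    for m in PATTERN_RE.finditer(tagged_post):
        left, word, right = m.groups()
        matches.append((f"{left} * {right}", word))
    return matches

if __name__ == "__main__":
    post = "这/r 部/q 电影/n 很/d 给力/a 的/u"
    print(extract_matches(post))
```

Because `finditer` returns non-overlapping matches, a production version would need to decide how overlapping contexts should be handled; this sketch ignores that case.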

See all papers in Proc. ACL 2014 that mention POS tags.

word segmentation

Appears in 8 sentences as: word segmentation (8)
In New Word Detection for Sentiment Analysis
  1. Automatic extraction of new words is an indispensable precursor to many NLP tasks such as Chinese word segmentation, named entity extraction, and sentiment analysis.
    Page 1, “Abstract”
  2. Automatic extraction of new words is indispensable to many tasks such as Chinese word segmentation, machine translation, named entity extraction, question answering, and sentiment analysis.
    Page 1, “Introduction”
  3. New word detection is one of the most critical issues in Chinese word segmentation.
    Page 1, “Introduction”
  4. Recent studies (Sproat and Emerson, 2003; Chen, 2003) have shown that more than 60% of word segmentation errors result from new words.
    Page 1, “Introduction”
  5. New word detection has usually been interwoven with word segmentation, particularly in Chinese NLP.
    Page 2, “Related Work”
  6. In these works, new word detection is considered an integral part of segmentation, where new words are identified as the most probable segments inferred by the probabilistic models; the detected new words can then be used to improve word segmentation.
    Page 2, “Related Work”
  7. Obviously, in order to obtain the value of s(w_i), some particular Chinese word segmentation tool is required.
    Page 5, “Methodology”
  8. The posts were then part-of-speech tagged using a Chinese word segmentation tool named ICTCLAS (Zhang et al., 2003).
    Page 6, “Experiment”

sentiment analysis

Appears in 7 sentences as: Sentiment Analysis (1) sentiment analysis (6)
In New Word Detection for Sentiment Analysis
  1. Automatic extraction of new words is an indispensable precursor to many NLP tasks such as Chinese word segmentation, named entity extraction, and sentiment analysis.
    Page 1, “Abstract”
  2. We also demonstrate how new sentiment words will benefit sentiment analysis.
    Page 1, “Abstract”
  3. Automatic extraction of new words is indispensable to many tasks such as Chinese word segmentation, machine translation, named entity extraction, question answering, and sentiment analysis.
    Page 1, “Introduction”
  4. for Sentiment Analysis
    Page 1, “Introduction”
  5. New word detection is also important for sentiment analysis tasks such as opinionated phrase extraction and polarity classification.
    Page 1, “Introduction”
  6. This paper aims to detect new words for sentiment analysis.
    Page 1, “Introduction”
  7. We are particularly interested in extracting new sentiment words that can express opinions or sentiment, which are of high value to sentiment analysis.
    Page 1, “Introduction”

Chinese word

Appears in 6 sentences as: Chinese word (5) Chinese words (1)
In New Word Detection for Sentiment Analysis
  1. Automatic extraction of new words is an indispensable precursor to many NLP tasks such as Chinese word segmentation, named entity extraction, and sentiment analysis.
    Page 1, “Abstract”
  2. Automatic extraction of new words is indispensable to many tasks such as Chinese word segmentation, machine translation, named entity extraction, question answering, and sentiment analysis.
    Page 1, “Introduction”
  3. New word detection is one of the most critical issues in Chinese word segmentation.
    Page 1, “Introduction”
  4. Statistics show that more than 1000 new Chinese words appear every
    Page 1, “Introduction”
  5. Obviously, in order to obtain the value of s(w_i), some particular Chinese word segmentation tool is required.
    Page 5, “Methodology”
  6. The posts were then part-of-speech tagged using a Chinese word segmentation tool named ICTCLAS (Zhang et al., 2003).
    Page 6, “Experiment”

Chinese word segmentation

Appears in 5 sentences as: Chinese word segmentation (5)
In New Word Detection for Sentiment Analysis
  1. Automatic extraction of new words is an indispensable precursor to many NLP tasks such as Chinese word segmentation, named entity extraction, and sentiment analysis.
    Page 1, “Abstract”
  2. Automatic extraction of new words is indispensable to many tasks such as Chinese word segmentation, machine translation, named entity extraction, question answering, and sentiment analysis.
    Page 1, “Introduction”
  3. New word detection is one of the most critical issues in Chinese word segmentation.
    Page 1, “Introduction”
  4. Obviously, in order to obtain the value of s(w_i), some particular Chinese word segmentation tool is required.
    Page 5, “Methodology”
  5. The posts were then part-of-speech tagged using a Chinese word segmentation tool named ICTCLAS (Zhang et al., 2003).
    Page 6, “Experiment”

sentiment classification

Appears in 5 sentences as: Sentiment Classification (1) sentiment classification (4)
In New Word Detection for Sentiment Analysis
  1. We investigate the problem of polarity prediction of new sentiment words and demonstrate that inclusion of new sentiment words benefits sentiment classification tasks.
    Page 2, “Introduction”
  2. In this section, we will conduct the following experiments: first, we will compare our method to several baselines, and perform parameter tuning with extensive experiments; second, we will classify polarity of new sentiment words using two methods; third, we will demonstrate how new sentiment words will benefit sentiment classification.
    Page 6, “Experiment”
  3. 4.6 Application of New Sentiment Words to Sentiment Classification
    Page 8, “Experiment”
  4. In this section, we examine whether inclusion of new sentiment words would benefit sentiment classification.
    Page 8, “Experiment”
  5. Experiments also demonstrate that inclusion of new sentiment words clearly benefits sentiment classification.
    Page 9, “Conclusion”

regular expressions

Appears in 3 sentences as: regular expression (1) regular expressions (3)
In New Word Detection for Sentiment Analysis
  1. For example, Justeson and Katz (1995) extracted technical terminologies from documents using a regular expression.
    Page 2, “Related Work”
  2. In order to obtain lexical patterns, we can define regular expressions with POS tags[2] and apply the regular expressions on POS tagged texts.
    Page 3, “Methodology”
  3. Algorithm 1: New word detection algorithm
     Input:
       D: a large set of POS tagged posts
       W_s: a set of seed words
       k_p: the number of patterns chosen at each iteration
       k_c: the number of patterns in the candidate pattern set
       k_w: the number of words added at each iteration
       K: the number of words returned
     Output: A list of ranked new words W
     1. Obtain all lexical patterns using regular expressions on D;
     2. Count the frequency of each lexical pattern and extract words matched by each pattern;
     3. Obtain top k_c frequent patterns as candidate pattern set P_c and top 5,000 frequent words as candidate word set W_c;
    Page 4, “Methodology”
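
The counting and selection steps of Algorithm 1 (count pattern frequencies, collect the words each pattern matches, keep the most frequent patterns and words as candidates) might be sketched as follows. This is a hedged illustration, not the authors' code: the function name `build_candidates`, the `(pattern, word)` pair representation, and counting word frequency over pattern matches rather than over the whole corpus are all assumptions.

```python
from collections import Counter, defaultdict

def build_candidates(matches, k_c, n_words=5000):
    """Build candidate pattern and word sets from (pattern, word) pairs.

    matches: iterable of (lexical_pattern, matched_word) pairs (assumed format)
    k_c:     number of patterns kept in the candidate pattern set
    n_words: number of words kept in the candidate word set
    """
    matches = list(matches)
    pattern_freq = Counter(p for p, _ in matches)
    # Assumption: word frequency is counted over pattern matches only.
    word_freq = Counter(w for _, w in matches)

    # Remember which words each pattern matched.
    words_by_pattern = defaultdict(set)
    for p, w in matches:
        words_by_pattern[p].add(w)

    candidate_patterns = [p for p, _ in pattern_freq.most_common(k_c)]
    candidate_words = [w for w, _ in word_freq.most_common(n_words)]
    return candidate_patterns, candidate_words, dict(words_by_pattern)

if __name__ == "__main__":
    toy = [("很 * 的", "给力"), ("很 * 的", "坑爹"),
           ("太 * 了", "给力"), ("很 * 的", "给力")]
    print(build_candidates(toy, k_c=1, n_words=1))
```

The later iterative steps of the algorithm (choosing k_p patterns and adding k_w words per iteration) depend on the paper's statistical measures, which are not reproduced in this excerpt, so they are left out of the sketch.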

SVM

Appears in 3 sentences as: SVM (3)
In New Word Detection for Sentiment Analysis
  1. The second model is an SVM model in which opinion words are used as features, and 5-fold cross validation is conducted.
    Page 8, “Experiment”
  2. [5] This is not necessary for the SVM model.
    Page 8, “Experiment”
  3.                  # Pos/Neg    Lexicon  SVM
     Hownet           627/1,038    0.737    0.756
     Hownet+NW        743/1,150    0.770    0.779
     Hownet+T100      679/1,172    0.761    0.774
     cptHownet        138/125      0.738    0.758
     cptHownet+NW     254/237      0.774    0.782
     cptHownet+T100   190/159      0.764    0.775
    Page 9, “Conclusion”
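
To make the comparison in the results table easier to read, the gains of each extended lexicon over its base can be computed directly from the reported numbers. This is only arithmetic on the table's values; it assumes the "Lexicon" and "SVM" columns are scores of a lexicon-based model and the SVM model respectively, and the helper name `gain` is hypothetical.

```python
# Scores copied from the results table: (Lexicon column, SVM column).
results = {
    "Hownet": (0.737, 0.756),
    "Hownet+NW": (0.770, 0.779),
    "Hownet+T100": (0.761, 0.774),
    "cptHownet": (0.738, 0.758),
    "cptHownet+NW": (0.774, 0.782),
    "cptHownet+T100": (0.764, 0.775),
}

def gain(base, extended):
    """Absolute gain (Lexicon, SVM) of the `extended` row over the `base` row."""
    return tuple(
        round(e - b, 3) for b, e in zip(results[base], results[extended])
    )

if __name__ == "__main__":
    for base in ("Hownet", "cptHownet"):
        for suffix in ("+NW", "+T100"):
            print(base + suffix, gain(base, base + suffix))
```

In every pair, the extended lexicon scores higher than its base in both columns, which matches the paper's statement that including new sentiment words benefits sentiment classification.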
