Building Sentiment Lexicons for All Major Languages
Chen, Yanqing and Skiena, Steven

Article Structure

Abstract

Sentiment analysis in a multilingual world remains a challenging problem, because developing language-specific sentiment lexicons is an extremely resource-intensive process.

Introduction

Sentiment analysis of English texts has become a large and active research area, with many commercial applications, but the barrier of language limits the ability to assess the sentiment of most of the world’s population.

Related Work

Sentiment analysis is an important area of NLP with a large and growing literature.

Knowledge Graph Construction

In this section we describe how we leverage a variety of NLP resources to construct the semantic connection graph we will use to propagate sentiment lexicons.
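A minimal sketch of what such a semantic connection graph could look like, assuming vertices are (language, word) pairs and edges are undirected links between words with related meaning. The function name `build_graph` and the sample links are hypothetical illustrations, not the authors' data or code.

```python
from collections import defaultdict


def build_graph(links):
    """Build an undirected graph whose vertices are (language, word) pairs."""
    graph = defaultdict(set)
    for source, target in links:
        graph[source].add(target)
        graph[target].add(source)
    return graph


# Hypothetical links; in practice they would come from resources such as
# dictionaries or machine translation.
links = [
    (("en", "good"), ("de", "gut")),
    (("en", "good"), ("es", "bueno")),
    (("en", "bad"), ("de", "schlecht")),
]
graph = build_graph(links)
```

Storing neighbours as sets keeps duplicate links from different resources from being counted twice.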

Graph Propagation

Sentiment propagation starts from English sentiment lexicons.
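The idea of propagating from seed words can be sketched as follows. This is a simplified majority-vote scheme chosen for illustration, not necessarily the propagation rule used in the paper; the function `propagate`, the `rounds` parameter, and the toy graph are all assumptions.

```python
def propagate(graph, seeds, rounds=3):
    """Spread +1/-1 seed polarities to unlabelled neighbours by majority vote."""
    scores = dict(seeds)
    for _ in range(rounds):
        updates = {}
        for vertex, neighbours in graph.items():
            if vertex in scores:
                continue  # words that already have a polarity keep it
            total = sum(scores.get(n, 0) for n in neighbours)
            if total != 0:
                updates[vertex] = 1 if total > 0 else -1
        if not updates:
            break  # nothing new was labelled in this round
        scores.update(updates)
    return scores


# Hypothetical toy graph: vertices are (language, word) pairs.
graph = {
    ("en", "good"): {("de", "gut"), ("es", "bueno")},
    ("en", "bad"): {("de", "schlecht")},
    ("de", "gut"): {("en", "good"), ("nl", "goed")},
    ("es", "bueno"): {("en", "good")},
    ("de", "schlecht"): {("en", "bad")},
    ("nl", "goed"): {("de", "gut")},
}
seeds = {("en", "good"): 1, ("en", "bad"): -1}  # English seed polarities
lexicon = propagate(graph, seeds)
```

In this sketch a word with no direct English link, such as the Dutch entry, only receives a polarity in a later round, once one of its neighbours has been labelled.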

Lexicon Evaluation

We collected all available published sentiment lexicons for non-English languages, including Arabic, Italian, German and Chinese, to serve as the standard for our evaluation.

Extrinsic Evaluation: Consistency of Wikipedia Sentiment

We evaluate our lexicons by measuring how consistent the sentiment of Wikipedia pages about a particular person is across languages.
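The paper measures this consistency with the Spearman correlation coefficient over entities that have pages in both languages of a pair. A minimal pure-Python sketch of that coefficient follows; the helper names and the sample scores are hypothetical, and how the per-page sentiment scores are computed is not shown here.

```python
def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    norm_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (norm_x * norm_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


# Hypothetical sentiment scores for the same four people in two languages.
english = [0.8, 0.1, -0.4, 0.3]
german = [0.5, 0.2, -0.1, 0.6]
rho = spearman(english, german)
```

Because only ranks matter, the two languages do not need to share a score scale, which suits lexicons built independently per language.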

Conclusions

Our knowledge graph propagation is generally effective at producing useful sentiment lexicons.

Topics

sentiment lexicons

Appears in 17 sentences as: sentiment lexicon (1) sentiment lexicons (16)
In Building Sentiment Lexicons for All Major Languages
  1. Sentiment analysis in a multilingual world remains a challenging problem, because developing language-specific sentiment lexicons is an extremely resource-intensive process.
    Page 1, “Abstract”
  2. In this paper, we address this lexicon gap by building high-quality sentiment lexicons for 136 major languages.
    Page 1, “Abstract”
  3. By appropriately propagating from seed words, we construct sentiment lexicons for each component language of our graph.
    Page 1, “Abstract”
  4. Although several well-regarded sentiment lexicons are available in English (Esuli and Sebastiani, 2006; Liu, 2010), the same is not true for most of the world’s languages.
    Page 1, “Introduction”
  5. Indeed, our literature search identified only 12 publicly available sentiment lexicons for only 5 non-English languages (Mandarin Chinese, German, Arabic, Japanese and
    Page 1, “Introduction”
  6. In this paper, we strive to produce a comprehensive set of sentiment lexicons for the world’s major languages.
    Page 1, “Introduction”
  7. New Sentiment Analysis Resources: We have generated sentiment lexicons for 136 major languages via graph propagation, which are now publicly available.
    Page 1, “Introduction”
  8. We validate our own work through other publicly available, human-annotated sentiment lexicons.
    Page 1, “Introduction”
  9. (2010) focus on generating topic-specific sentiment lexicons.
    Page 2, “Related Work”
  10. In this section we describe how we leverage a variety of NLP resources to construct the semantic connection graph we will use to propagate sentiment lexicons.
    Page 2, “Knowledge Graph Construction”
  11. Sentiment propagation starts from English sentiment lexicons.
    Page 3, “Graph Propagation”

See all papers in Proc. ACL 2014 that mention sentiment lexicons.

sentiment analysis

Appears in 15 sentences as: Sentiment Analysis (1) Sentiment analysis (3) sentiment analysis (11)
In Building Sentiment Lexicons for All Major Languages
  1. Sentiment analysis in a multilingual world remains a challenging problem, because developing language-specific sentiment lexicons is an extremely resource-intensive process.
    Page 1, “Abstract”
  2. Sentiment analysis of English texts has become a large and active research area, with many commercial applications, but the barrier of language limits the ability to assess the sentiment of most of the world’s population.
    Page 1, “Introduction”
  3. New Sentiment Analysis Resources: We have generated sentiment lexicons for 136 major languages via graph propagation, which are now publicly available.
    Page 1, “Introduction”
  4. Sentiment analysis is an important area of NLP with a large and growing literature.
    Page 2, “Related Work”
  5. Excellent surveys of the field include (Liu, 2013; Pang and Lee, 2008), establishing that rich online resources have greatly expanded opportunities for opinion mining and sentiment analysis.
    Page 2, “Related Work”
  6. (2007) build an English lexicon-based sentiment analysis system to evaluate the general reputation of entities.
    Page 2, “Related Work”
  7. Liu (2010) introduces an efficient, state-of-the-art method for sentiment analysis and subjectivity analysis in English.
    Page 2, “Related Work”
  8. (2010) perform sentiment analysis according to cross-domain contextualization, and Pak and Paroubek (2010) focus on Twitter, studying the colloquial form of English used there.
    Page 2, “Related Work”
  9. Work has been done to generalize sentiment analysis to other languages.
    Page 2, “Related Work”
  10. Denecke (2008) performs multilingual sentiment analysis using SentiWordNet.
    Page 2, “Related Work”
  11. (2011) work on a better sentiment analysis system for Romanian.
    Page 2, “Related Work”

machine translation

Appears in 5 sentences as: Machine Translation (1) machine translation (4)
In Building Sentiment Lexicons for All Major Languages
  1. The ready availability of machine translation to and from English has prompted efforts to employ translation for sentiment analysis (Bautin et al., 2008).
    Page 2, “Related Work”
  2. (2008) demonstrate that machine translation can perform quite well when extending subjectivity analysis to a multilingual environment, which motivates replicating their work on lexicon-based sentiment analysis.
    Page 2, “Related Work”
  3. (2013) combine machine translation and word representation to generate bilingual language resources.
    Page 2, “Related Work”
  4. Machine Translation: We script the Google translation API to get even more semantic links.
    Page 3, “Knowledge Graph Construction”
  5. In total, machine translation provides 53.2% of the total links and establishes connections between 3.5 million vertices.
    Page 3, “Knowledge Graph Construction”

language pair

Appears in 4 sentences as: language pair (2) language pairs (2)
In Building Sentiment Lexicons for All Major Languages
  1. Despite cultural differences and the intended neutrality of Wikipedia articles, our lexicons show an average sentiment correlation of 0.28 across all language pairs.
    Page 1, “Abstract”
  2. Each language pair exhibits a Spearman sentiment correlation of at least 0.14, with an average correlation of 0.28 over all pairs.
    Page 1, “Introduction”
  3. Closely related language pairs (i.e.
    Page 3, “Knowledge Graph Construction”
  4. We use the Spearman correlation coefficient to measure the consistency of sentiment distribution across all entities with pages in a particular language pair.
    Page 5, “Extrinsic Evaluation: Consistency of Wikipedia Sentiment”

word embedding

Appears in 3 sentences as: word embedding (3)
In Building Sentiment Lexicons for All Major Languages
  1. approaches draft off of distributed word embedding which offer concise features reflecting the semantics of the underlying vocabulary.
    Page 2, “Related Work”
  2. (2010) create powerful word embedding by training on real and corrupted phrases, optimizing for the replaceability of words.
    Page 2, “Related Work”
  3. (2012) demonstrates a powerful approach to English sentiment using word embedding, which can easily be extended to other languages by training on appropriate text corpora.
    Page 2, “Related Work”
