SciSurf: Index of 'Evaluating Multilanguage-Comparability of Subjectivity Analysis Systems'

Evaluating Multilanguage-Comparability of Subjectivity Analysis Systems

Kim, Jungi and Li, Jin-Ji and Lee, Jong-Hyeok

Published in Proc. ACL, 2010

Article Structure

Abstract

Subjectivity analysis is a rapidly growing field of study.

Introduction

The field of NLP has seen a recent surge in the amount of research on subjectivity analysis.

Related Work

Much research has recently been put into developing methods for multilingual subjectivity analysis.

Multilanguage-Comparability 3.1 Motivation

The quality of a subjectivity analysis tool is measured by its ability to distinguish subjectivity from objectivity and/or positive sentiments from negative sentiments.

Multilingual Subjectivity System

We create a number of multilingual systems consisting of multiple subsystems each processing a language, where one system analyzes English, and the other systems analyze the Korean, Chinese, and Japanese languages.

Experiment

5.1 Experimental Setup

Discussion

Which approach is most suitable for multilingual subjectivity analysis?

Conclusion

Multilanguage-comparability is an analysis system’s ability to retain its decision criteria across different languages.

Topics

manually annotated

Appears in 7 sentences as: Manual Annotation (1) manual annotation (2) manually annotated (4)

In Evaluating Multilanguage-Comparability of Subjectivity Analysis Systems

(2008) and Boiy and Moens (2009) have created manually annotated gold standards in target languages and studied various feature selection and learning techniques in machine learning approaches to analyze sentiments in multilingual web documents.
Page 2, “Related Work”
Evaluating with intensity is not easy for the latter approach; if test corpora already exist with intensity annotations for both languages, normalizing the intensity scores to a comparable scale is necessary (yet is uncertain unless every pair is checked manually), otherwise every pair of multilingual texts needs a manual annotation with its relative order of intensity.
Page 3, “Multilanguage-Comparability 3.1 Motivation”
Three human annotators who are fluent in the two languages manually annotated N-to-N sentence alignments for each language pair (KR-EN, KR-CH, KR-JP).
Page 5, “Experiment”
Manual Annotation and Agreement Study
Page 5, “Experiment”
To assess the performance of our subjectivity analysis systems, the Korean sentence chunks were manually annotated by two native speakers of Korean with Subjective and Objective labels (Table 1).
Page 5, “Experiment”
In addition, to verify how consistently the subjectivity of the original texts is projected to the translated, we carried out another manual annotation and agreement study with Korean and English sentence chunks (Table 2).
Page 5, “Experiment”
To exclude the role that annotators’ private views play in disagreements, the subjectivity of sentence chunks in English was manually annotated by one of the annotators for the Korean text.
Page 5, “Experiment”


language pairs

Appears in 4 sentences as: language pairs (5)

In Evaluating Multilanguage-Comparability of Subjectivity Analysis Systems

Three human annotators who are fluent in the two languages manually annotated N-to-N sentence alignments for each language pair (KR-EN, KR-CH, KR-JP).
Page 5, “Experiment”
By keeping only the sentence chunks whose Korean chunk appears in all language pairs, we were left with 859 sentence chunk pairs.
Page 5, “Experiment”
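The filtering step in the excerpt above can be read as a set intersection over the Korean side of each alignment. The following is a minimal sketch under that reading, not the authors' procedure: the function name `keep_common_korean_chunks`, the chunk identifiers, and the dictionary layout are all hypothetical and chosen only for illustration.

```python
def keep_common_korean_chunks(alignments):
    """Keep only alignments whose Korean chunk occurs in every language pair.

    `alignments` is an assumed layout:
        {pair_name: {korean_chunk_id: aligned_chunk_in_other_language}}
    """
    if not alignments:
        return {}
    # Korean chunk ids that are present in every language pair.
    common = set.intersection(*(set(m) for m in alignments.values()))
    # Rebuild each pair's mapping restricted to the shared Korean chunks.
    return {
        pair: {kid: mapping[kid] for kid in sorted(common)}
        for pair, mapping in alignments.items()
    }


# Toy data; the ids and texts are invented for the example.
toy = {
    "KR-EN": {1: "en-1", 2: "en-2", 3: "en-3"},
    "KR-CH": {2: "ch-2", 3: "ch-3"},
    "KR-JP": {2: "jp-2", 3: "jp-3", 4: "jp-4"},
}
filtered = keep_common_korean_chunks(toy)
```

In the toy data only Korean chunks 2 and 3 appear in all three pairs, so every pair keeps exactly those two alignments.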
The subjectivity analysis systems are evaluated with all language pairs with kappa and Pearson’s correlation coefficients.
Page 6, “Experiment”
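As an illustration of the two measures named in the excerpt above, the sketch below computes agreement on labels and correlation on scores for one language pair. It assumes that kappa means Cohen's kappa over the two systems' Subjective/Objective labels and that Pearson's coefficient is taken over their numeric scores; the excerpt does not spell out either detail, and the function names and toy inputs are hypothetical.

```python
from collections import Counter
from math import sqrt


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long label sequences."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("label sequences must be non-empty and equally long")
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each side's marginal label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both sides always emit the same single label
    return (observed - expected) / (1.0 - expected)


def pearson(xs, ys):
    """Pearson's correlation coefficient for two equally long score lists."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("score lists must be non-empty and equally long")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


# Toy outputs of two hypothetical systems on four aligned sentence chunks.
kappa = cohen_kappa(["S", "S", "O", "O"], ["S", "S", "O", "S"])
corr = pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

Kappa looks only at whether the two systems pick the same label, while Pearson's coefficient also reflects whether their scores rise and fall together across aligned chunks.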
Within corpus-based systems, S-CB performs better with language pairs that include English, and T-CB performs better with language pairs of the target languages.
Page 6, “Experiment”


sentiment analysis

Appears in 4 sentences as: sentiment analysis (3) sentiment analyzers (1)

In Evaluating Multilanguage-Comparability of Subjectivity Analysis Systems

There are multilingual subjectivity analysis systems available that have been built to monitor and analyze various concerns and opinions on the Internet; among the better known are OASYS from the University of Maryland that analyzes opinions on topics from news article searches in multiple languages (Cesarano et al., 2007) and TextMap, an entity search engine developed by Stony Brook University for sentiment analysis along with other functionalities (Bautin et al., 2008). Though these systems currently rely on English analysis tools and a machine translation (MT) technology to
Page 1, “Introduction”
Given sentiment analysis systems in different languages, there are many situations when the analysis outcomes need to be multilanguage-comparable.
Page 1, “Introduction”
To overcome the shortcomings of available resources and to take advantage of ensemble systems, Wan (2008) and Wan (2009) explored methods for developing a hybrid system for Chinese using English and Chinese sentiment analyzers.
Page 2, “Related Work”
For future work, we aim to extend this work to constructing a multilingual sentiment analysis system and evaluate it with multilingual datasets such as product reviews collected from different countries.
Page 8, “Conclusion”


SVM

Appears in 3 sentences as: SVM (3)

In Evaluating Multilanguage-Comparability of Subjectivity Analysis Systems

Previous studies have found that, among several ML-based approaches, the SVM classifier generally performs well in many subjectivity analysis tasks (Pang et al., 2002; Banea et al., 2008).
Page 4, “Multilingual Subjectivity System”
An SVM score (a margin or the distance from a learned decision boundary) with a positive value predicts the input as being subjective, and negative value as objective.
Page 4, “Multilingual Subjectivity System”
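The decision rule in the excerpt above (positive score means subjective, negative means objective) can be sketched as follows. The function name `label_from_score` is hypothetical, and the handling of a score of exactly zero is an assumption, since the excerpt does not say which label that case receives.

```python
def label_from_score(score):
    """Map an SVM score (signed distance from the decision boundary) to a label.

    Positive scores are read as Subjective and negative scores as Objective,
    following the excerpt. Treating a score of exactly 0 as Objective is an
    assumption made here only so that the function is total.
    """
    return "Subjective" if score > 0 else "Objective"


# Invented scores for three sentence chunks.
labels = [label_from_score(s) for s in (1.3, -0.4, 0.02)]
```

The sign alone fixes the label; the magnitude of the score is what a correlation measure across languages would compare.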
The second and the third approaches are carried out as follows: Corpus-based (T-CB): We translate the MPQA corpus into the target languages sentence by sentence using a web-based service. Using the same method as for S-CB, we train an SVM model for each language with the translated training corpora.
Page 4, “Multilingual Subjectivity System”
