Evaluating the Impact of Coder Errors on Active Learning
Rehbein, Ines and Ruppenhofer, Josef

Article Structure

Abstract

Active Learning (AL) has been proposed as a technique to reduce the amount of annotated data needed in the context of supervised classification.

Introduction

Josef Ruppenhofer, Computational Linguistics, Saarland University, josefr@coli.uni-sb.de

Related Work

We are interested in the question whether or not AL can be successfully applied to a supervised classification task where we have to deal with a considerable amount of inconsistencies and noise in the data, which is the case for many NLP tasks (e.g. sentiment analysis, the detection of metaphors, WSD with fine-grained word senses, to name but a few).

Topics

random sampling

Appears in 9 sentences as: random sampling (9)
  1. Schein and Ungar (2007) observe that none of the 8 sampling methods investigated in their experiment achieved a significant improvement over the random sampling baseline on type b) errors.
    Page 3, “Related Work”
  2. In fact, entropy sampling and margin sampling even showed a decrease in performance compared to random sampling (both strategies are sketched in code after this list).
    Page 3, “Related Work”
  3. In the first setting, we randomly select new instances from the pool (random sampling; rand).
    Page 4, “Related Work”
  4. For all degrees of randomly inserted noise, active learning (ALrand) outperforms random sampling (rand) at an …
    Page 4, “Related Work”
  5. When inserting more noise, performance for ALbias decreases, and with around 20% of biased errors in the pool, AL performs worse than our random sampling baseline.
    Page 4, “Related Work”
  6. In the random noise setting (ALrand), even after inserting 30% of errors AL clearly outperforms random sampling.
    Page 4, “Related Work”
  7. This confirms the findings that under certain circumstances AL performs worse than random sampling (Dang, 2004; Schein and Ungar, 2007; Rehbein et al., 2010).
    Page 4, “Related Work”
  8. Even for the noisier data sets with up to 26% of errors, ALbias with filtering performs at least as well as random sampling.
    Page 7, “Related Work”
  9. We showed how biased coder decisions can result in an accuracy for AL approaches that is below that of random sampling.
    Page 7, “Related Work”
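
The entropy and margin strategies contrasted with the random baseline above can be stated compactly. Below is a minimal sketch in Python, not the authors' implementation; all names (pool_probs, select_entropy, k) are illustrative, and pool_probs is assumed to hold the current classifier's class probabilities for every unlabeled pool instance.

    import numpy as np

    def select_random(pool_probs, k, rng):
        """Random sampling baseline: draw k pool instances uniformly."""
        return rng.choice(len(pool_probs), size=k, replace=False)

    def select_entropy(pool_probs, k):
        """Entropy sampling: query the k instances whose predicted
        class distribution has the highest entropy."""
        entropy = -(pool_probs * np.log(pool_probs + 1e-12)).sum(axis=1)
        return np.argsort(entropy)[-k:]

    def select_margin(pool_probs, k):
        """Margin sampling: query the k instances with the smallest
        gap between the two most probable classes."""
        top2 = np.sort(pool_probs, axis=1)[:, -2:]
        return np.argsort(top2[:, 1] - top2[:, 0])[:k]

    # Toy usage: 100 pool instances, 3 classes.
    rng = np.random.default_rng(0)
    pool_probs = rng.dirichlet(np.ones(3), size=100)
    queried = select_margin(pool_probs, k=5)  # indices to hand to the coder

Both uncertainty strategies deliberately query the instances the classifier finds hardest, which helps explain why they can fall below the random baseline when coder errors concentrate on exactly those hard instances.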


feature vectors

Appears in 5 sentences as: feature vector (1) feature vectors (7)
  1. Instead, we create new feature vectors F_gen on the basis of the feature vectors F_seed in S. For each class in S, we extract all attribute-value pairs from the feature vectors for this particular class.
    Page 6, “Related Work”
  2. For each class, we randomly select features (with replacement) from F_seed and combine them into a new feature vector F_gen, retaining the distribution of the different classes in the data (this procedure is sketched in code after this list).
    Page 6, “Related Work”
  3. As a result, we obtain a more general set of feature vectors F_gen with characteristic features being distributed more evenly over the different feature vectors.
    Page 6, “Related Work”
  4. • create new, more general feature vectors F_gen from this set (with replacement)
    Page 6, “Related Work”
  5. We presented an approach based on a resampling of the features in the seed data and guided by an ensemble of classifiers trained on the resampled feature vectors.
    Page 7, “Related Work”
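
Sentences 1-4 describe a concrete generation procedure. The following is a minimal sketch under the assumption that each seed instance is a dict of attribute-value pairs paired with a class label; the function name resample_seed and all parameter names are illustrative, not taken from the paper.

    import random
    from collections import defaultdict

    def resample_seed(seed, n_new, rng):
        """seed: list of (features, label) pairs, where features is a
        dict of attribute-value pairs. Returns n_new generated
        (F_gen, label) pairs, retaining the seed's class distribution."""
        pairs_by_class = defaultdict(list)  # all attribute-value pairs per class
        vecs_by_class = defaultdict(list)   # original vectors per class
        labels = []
        for features, label in seed:
            pairs_by_class[label].extend(features.items())
            vecs_by_class[label].append(features)
            labels.append(label)
        generated = []
        for _ in range(n_new):
            label = rng.choice(labels)  # keeps the seed class distribution
            # Sample attribute-value pairs with replacement; mimic the
            # length of a randomly chosen seed vector of that class.
            size = len(rng.choice(vecs_by_class[label]))
            # dict() collapses duplicate attributes, keeping the last value.
            f_gen = dict(rng.choices(pairs_by_class[label], k=size))
            generated.append((f_gen, label))
        return generated

    seed = [({"pos": "VVFIN", "lemma": "drohen"}, "Run_risk"),
            ({"pos": "VVINF", "subj": "es"}, "Commitment")]
    new_vectors = resample_seed(seed, n_new=10, rng=random.Random(0))

Training one classifier per resampled set then yields the ensemble mentioned in sentence 5, whose disagreement can be used to filter suspicious labels.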


word senses

Appears in 5 sentences as: Word Sense (1) word sense (1) word senses (3)
  1. Active learning has been applied to several NLP tasks like part-of-speech tagging (Ringger et al., 2007), chunking (Ngai and Yarowsky, 2000), syntactic parsing (Osborne and Baldridge, 2004; Hwa, 2004), Named Entity Recognition (Shen et al., 2004; Laws and Schütze, 2008; Tomanek and Hahn, 2009), Word Sense Disambiguation (Chen et al., 2006; Zhu and Hovy, 2007; Chan and Ng, 2007), text classification (Tong and Koller, 1998) or statistical machine translation (Haffari and Sarkar, 2009), and has been shown to reduce the amount of annotated data needed to achieve a certain classifier performance, sometimes by as much as half.
    Page 1, “Introduction”
  2. … sentiment analysis, the detection of metaphors, WSD with fine-grained word senses, to name but a few).
    Page 2, “Related Work”
  3. Table 1: Distribution of word senses in pool and test sets
    Page 4, “Related Work”
  4. The different word senses are evenly distributed over the rejected instances (H1: Commitment 30, drohen1-salsa 38, Run_risk 36; H2: Commitment 3, drohen1-salsa 4, Run_risk 4).
    Page 7, “Related Work”
  5. This shows that there is less uncertainty about the majority word sense, Run_risk (one way to quantify this is sketched below).
    Page 7, “Related Work”
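
One way to make the uncertainty claim in sentence 5 measurable is to average the model's prediction entropy per predicted sense: a lower mean for Run_risk would indicate less uncertainty about the majority sense. This is an illustrative check, not the paper's procedure; all names here are assumptions.

    import numpy as np

    def mean_entropy_per_sense(pool_probs, senses):
        """pool_probs: (n, n_senses) class probabilities; senses: the
        sense names in column order. Returns the mean prediction
        entropy grouped by the predicted sense."""
        entropy = -(pool_probs * np.log(pool_probs + 1e-12)).sum(axis=1)
        predicted = pool_probs.argmax(axis=1)
        return {sense: float(entropy[predicted == i].mean())
                for i, sense in enumerate(senses)
                if (predicted == i).any()}

    senses = ["Commitment", "drohen1-salsa", "Run_risk"]
    probs = np.random.default_rng(1).dirichlet(np.ones(3), size=50)
    print(mean_entropy_per_sense(probs, senses))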
