The Impact of Topic Bias on Quality Flaw Prediction in Wikipedia
Ferschke, Oliver and Gurevych, Iryna and Rittberger, Marc

Article Structure

Abstract

With the increasing amount of user-generated reference texts on the web, automatic quality assessment has become a key challenge.

Introduction

User generated content is the main driving force of the increasingly social web.

Related Work

Topic bias is a known problem in text classification.

Quality Flaws and Flaw Recognition in Wikipedia

Quality standards in Wikipedia are mainly defined by the featured article criteria and the Wikipedia Manual of Style.

Data Selection and Corpus Creation

For creating our corpora, we start with selecting all cleanup templates listed under the categories neutrality and style in the typology of cleanup templates provided by Anderka and Stein (2012a).

Selection of Reliable Training Instances

Independent from the classification approach used to identify flawed articles, reliable training data is the most important prerequisite for good predictions.

Experiments

In the following, we describe our system architecture and the setup of our experiments.

Evaluation and Discussion

The SVMs achieve a similar cross-validated performance on all feature sets containing ngrams, showing only minor improvements for individual flaws when adding non-lexical features.

Conclusions

We showed that text classification based on Wikipedia cleanup templates is prone to a topic bias which causes skewed classifiers and overly optimistic cross-validated evaluation results.

Topics

feature set

Appears in 7 sentences as: feature set (4) Feature sets (1) feature sets (2)
In The Impact of Topic Bias on Quality Flaw Prediction in Wikipedia
  1. We selected a subset of these features for our experiments and grouped them into four feature sets in order to determine how well different combinations of features perform in the task.
    Page 7, “Experiments”
  2. Table 4: Feature sets used in the experiments
    Page 7, “Experiments”
  3. The SVMs achieve a similar cross-validated performance on all feature sets containing ngrams, showing only minor improvements for individual flaws when adding non-lexical features.
    Page 8, “Evaluation and Discussion”
  4. Table 6 shows the performance of the SVMs with RBF kernel on each dataset using the NGRAM feature set.
    Page 8, “Evaluation and Discussion”
  5. Classifiers using the NONGRAM feature set achieved average F1-scores below 0.50 on all datasets.
    Page 8, “Evaluation and Discussion”
  6. With the NGRAM feature set, the reliable classifiers outperformed the unreliable classifiers on all flaws that can be well identified with lexical cues, such as Advert or Technical.
    Page 9, “Evaluation and Discussion”
  7. We tested this hypothesis by using the full feature set ALL and saw a substantial improvement on the side of the unbiased classifier, while the performance of the biased classifier remained unchanged.
    Page 9, “Evaluation and Discussion”
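
The NGRAM feature set above consists of lexical n-gram features. As a minimal sketch of what such features look like (assuming simple lowercased whitespace tokenization, which this page does not specify), word bigrams can be extracted as follows:

```python
def word_ngrams(text, n=2):
    """Extract word n-grams from a text; these serve as lexical features."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# A bigram like "reads like" is the kind of lexical cue that flags an Advert flaw.
features = word_ngrams("This article reads like an advertisement", n=2)
```

In a real pipeline each article would be mapped to a sparse vector of n-gram counts before being passed to the SVM.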

See all papers in Proc. ACL 2013 that mention feature set.


text classification

Appears in 4 sentences as: text classification (4)
  1. However, quality flaw detection based on cleanup template recognition suffers from a topic bias that is well known from other text classification applications such as authorship attribution or genre identification.
    Page 1, “Introduction”
  2. Topic bias is a known problem in text classification .
    Page 2, “Related Work”
  3. We showed that text classification based on Wikipedia cleanup templates is prone to a topic bias which causes skewed classifiers and overly optimistic cross-validated evaluation results.
    Page 9, “Conclusions”
  4. This bias is known from other text classification applications, such as authorship attribution, genre detection and native language detection.
    Page 9, “Conclusions”


Cosine Similarity

Appears in 3 sentences as: Cosine Similarity (1) Cosine similarity (1) cosine similarity (1)
  1. We can then estimate the topical similarity of two article sets by calculating the cosine similarity of their category frequency vectors c_{T1=A} and c_{T2=B} as
    Page 6, “Selection of Reliable Training Instances”
  2. Cosine Similarity
    Page 7, “Selection of Reliable Training Instances”
  3. Table 3: Cosine similarity scores between the category frequency vectors of the flawed article sets and the respective random or reliable negatives
    Page 7, “Selection of Reliable Training Instances”
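
A minimal, self-contained sketch of this topical-similarity estimate (the category names and article sets are toy data, not the paper's):

```python
from math import sqrt

def category_frequency_vector(article_category_lists, categories):
    """Count how often each Wikipedia category occurs across an article set."""
    counts = {c: 0 for c in categories}
    for cats in article_category_lists:
        for c in cats:
            counts[c] += 1
    return [counts[c] for c in categories]

def cosine_similarity(a, b):
    """cos(a, b) = a.b / (|a||b|); 1.0 means identical topic profiles."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

# Toy data: flawed articles vs. candidate negatives with overlapping categories
categories = ["History", "Science", "Sports"]
flawed = [["History", "Science"], ["History"]]
negatives = [["History"], ["History", "Sports"]]
sim = cosine_similarity(category_frequency_vector(flawed, categories),
                        category_frequency_vector(negatives, categories))
```

A score near 1.0 indicates that the negative set matches the topic profile of the flawed set, which is exactly what the reliable-negatives selection aims for.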


cross validation

Appears in 3 sentences as: cross validation (3)
  1. The performance has been evaluated with 10-fold cross validation on 2,000 documents split equally into positive and negative instances.
    Page 8, “Experiments”
  2. The results have been obtained by 10-fold cross validation on 2,000 documents per flaw.
    Page 8, “Evaluation and Discussion”
  3. Table 6: F1 scores for the 10-fold cross validation of the SVMs with RBF kernel on all datasets using NGRAM features
    Page 9, “Conclusions”
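
The paper runs its experiments in Weka; as a library-neutral sketch, 10-fold cross validation with a mean F1 score can be written as follows (the threshold classifier at the bottom is a toy stand-in, not the paper's SVM):

```python
import random
from statistics import mean

def kfold_f1(xs, ys, train_fn, predict_fn, k=10, seed=0):
    """Shuffle, split into k folds, train on k-1 folds, score F1 on the held-out fold."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train_idx = [i for i in idx if i not in held_out]
        model = train_fn([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        tp = fp = fn = 0
        for i in fold:
            pred, gold = predict_fn(model, xs[i]), ys[i]
            tp += pred and gold
            fp += pred and not gold
            fn += (not pred) and gold
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return mean(scores)

# Toy task: numbers >= 50 are "flawed"; the model learns a midpoint threshold.
xs = list(range(100))
ys = [x >= 50 for x in xs]
train = lambda txs, tys: (max(x for x, y in zip(txs, tys) if not y)
                          + min(x for x, y in zip(txs, tys) if y)) / 2
predict = lambda threshold, x: x > threshold
score = kfold_f1(xs, ys, train, predict)
```

Averaging F1 over the folds is what produces the cross-validated scores reported in Table 6, and it is exactly this averaging that a topic bias can inflate when positives and negatives differ topically.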


human annotator

Appears in 3 sentences as: human annotations (1) human annotator (2)
  1. Table 2: Agreement of human annotator with gold standard
    Page 4, “Data Selection and Corpus Creation”
  2. In order to test the reliability of these user-assigned templates as quality flaw markers, we carried out an annotation study in which a human annotator was asked to perform the binary flaw detection task manually.
    Page 4, “Data Selection and Corpus Creation”
  3. Table 2 lists the chance-corrected agreement (Cohen’s κ) along with the F1 performance of the human annotations against the gold standard corpus.
    Page 4, “Data Selection and Corpus Creation”
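
Cohen's κ corrects raw agreement for the agreement expected by chance from the two annotators' label distributions. A self-contained sketch of the computation (the labels below are illustrative, not the study's data):

```python
from collections import Counter

def cohens_kappa(gold, pred):
    """Chance-corrected agreement: kappa = (p_o - p_e) / (1 - p_e)."""
    n = len(gold)
    observed = sum(g == p for g, p in zip(gold, pred)) / n
    # Expected agreement if both annotators labeled independently at their marginal rates
    g_counts, p_counts = Counter(gold), Counter(pred)
    expected = sum(g_counts[c] * p_counts[c] for c in set(gold) | set(pred)) / n ** 2
    return (observed - expected) / (1 - expected)

# Toy binary annotations: 1 = flawed, 0 = not flawed
gold = [1, 1, 1, 0, 0, 0, 0, 0]
pred = [1, 1, 0, 0, 0, 0, 0, 1]
kappa = cohens_kappa(gold, pred)
```

Here 6 of 8 labels agree (raw agreement 0.75), but κ is considerably lower because most of that agreement is expected by chance given the skew toward the negative class.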


learning algorithms

Appears in 3 sentences as: Learning Algorithms (1) learning algorithms (2)
  1. Instead of using Mallet (McCallum, 2002) as a machine learning toolkit, we employ the Weka Data Mining Software (Hall et al., 2009) for classification, since it offers a wider range of state-of-the-art machine learning algorithms.
    Page 7, “Experiments”
  2. Learning Algorithms
    Page 7, “Experiments”
  3. We evaluated several learning algorithms from the Weka toolkit with respect to their performance on
    Page 7, “Experiments”


topic distribution

Appears in 3 sentences as: topic distribution (3)
  1. We factor out the topic bias by extracting reliable training instances from the revision history which have a topic distribution similar to the labeled articles.
    Page 1, “Abstract”
  2. It is rather necessary to determine reliable negative instances with a similar topic distribution as the set of positive instances in order to factor out the sampling bias.
    Page 1, “Introduction”
  3. In order to factor out the article topics as a major characteristic for distinguishing flawed articles from the set of outliers, reliable negative instances A_rel have to be sampled from the restricted topic set A_topic that contains articles with a topic distribution similar to the flawed articles in A_f (see Fig.
    Page 5, “Selection of Reliable Training Instances”
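
A minimal sketch of restricting the candidate pool by topic before sampling negatives. The category-overlap criterion used here is a deliberate simplification: the paper matches full topic distributions via category frequency vectors, whereas this sketch only requires a shared category.

```python
import random

def sample_reliable_negatives(flawed_cats, candidate_cats, n, seed=0):
    """Restrict candidates to those sharing at least one category with the
    flawed articles, then sample negatives from this topic-matched pool."""
    topics = set().union(*flawed_cats)
    pool = [i for i, cats in enumerate(candidate_cats) if topics & set(cats)]
    return random.Random(seed).sample(pool, n)

# Toy data: flawed articles are about companies/software, so the physics and
# biology candidates are excluded from the negative pool.
flawed = [{"Companies"}, {"Companies", "Software"}]
candidates = [{"Physics"}, {"Software"}, {"Companies", "History"}, {"Biology"}]
picked = sample_reliable_negatives(flawed, candidates, n=2)
```

Sampling from the topic-matched pool rather than from all untagged articles is what prevents the classifier from learning topic cues instead of flaw cues.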
