Linguistic Models for Analyzing and Detecting Biased Language
Marta Recasens, Cristian Danescu-Niculescu-Mizil, and Dan Jurafsky

Article Structure

Abstract

Unbiased language is a requirement for reference sources like encyclopedias and scientific texts.

Introduction

Writers and editors of reference works such as encyclopedias, textbooks, and scientific articles strive to keep their language unbiased.

Analyzing a Dataset of Biased Language

We begin with an empirical analysis based on Wikipedia’s bias-driven edits.

Automatically Identifying Biased Language

We now show how the bias cues identified in Section 2.3 can help solve a new task.

Human Perception of Biased Language

Is it difficult for humans to find the word in a sentence that induces bias, given the subtle, often implicit biases in Wikipedia?

Related Work

The work in this paper builds upon prior work on subjectivity detection (Wiebe et al., 2004; Lin et al., 2011; Conrad et al., 2012) and stance recognition (Yano et al., 2010; Somasundaran and Wiebe, 2010; Park et al., 2011), but applied to the genre of reference works such as Wikipedia.

Conclusions

Our study of bias in Wikipedia has implications for linguistic theory and computational linguistics.

Topics

turkers

Appears in 8 sentences as: Turkers (1) turkers (7) turkers’ (1)
In Linguistic Models for Analyzing and Detecting Biased Language
  1. Turkers were shown Wikipedia’s definition of a “biased statement” and two example sentences that illustrated the two types of bias, framing and epistemological.
    Page 7, “Human Perception of Biased Language”
  2. Before the 10 sentences, turkers were asked to list the languages they spoke as well as their primary language in primary school.
    Page 7, “Human Perception of Biased Language”
  3. On average, it took turkers about four minutes to complete each HIT.
    Page 7, “Human Perception of Biased Language”
  4. To ensure an answer quality as high as possible, we only kept those turkers who answered attentively by applying two filters: we only accepted answers that matched a valid word from the sentence, and we discarded answers from participants who did not […]
    Page 7, “Human Perception of Biased Language”
  5. Figure 1: Distribution of the number of turkers who selected the top word (i.e., the word selected by the majority of turkers).
    Page 8, “Human Perception of Biased Language”
  6. These filters provided us with confidence in the turkers’ answers as a fair standard of comparison.
    Page 8, “Human Perception of Biased Language”
  7. For each sentence, we ranked the words according to the number of turkers (out of 10) who selected them and, like we did for the automated system, we assessed performance when considering only the top word (TOP1), the top 2 words (TOP2), and the top 3 words (TOP3).
    Page 8, “Human Perception of Biased Language”
  8. The turkers agreed 40.73% of the time, compared to the 5.1% chance agreement that would be achieved if raters had randomly selected a word for each sentence.
    Page 8, “Human Perception of Biased Language”
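
Sentence 7 above describes ranking each sentence's words by turker votes, and sentence 8 compares the observed 40.73% agreement with a 5.1% chance baseline. The excerpts do not spell out the agreement statistic used, so the Python sketch below shows one plausible reading: pairwise percent agreement over per-sentence vote rankings. The answers and sentence length are hypothetical, purely for illustration.

    from collections import Counter
    from itertools import combinations

    def rank_words_by_votes(choices):
        """Rank a sentence's words by how many turkers selected them."""
        return Counter(choices).most_common()

    def pairwise_agreement(choices):
        """Fraction of turker pairs that picked the same word for a sentence."""
        pairs = list(combinations(choices, 2))
        return sum(a == b for a, b in pairs) / len(pairs)

    # Hypothetical answers from 10 turkers for one sentence.
    answers = ["claimed", "claimed", "claimed", "reform", "claimed",
               "claimed", "reform", "stated", "claimed", "claimed"]
    ranking = rank_words_by_votes(answers)   # [('claimed', 7), ('reform', 2), ('stated', 1)]
    top1 = [ranking[0][0]]                   # TOP1
    top3 = [w for w, _ in ranking[:3]]       # TOP3
    print(pairwise_agreement(answers))       # observed agreement for this sentence

    # Chance agreement if each turker picked a word uniformly at random:
    sentence_length = 20                     # hypothetical number of candidate words
    print(1 / sentence_length)               # about 0.05, in line with the 5.1% reported above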

logistic regression

Appears in 5 sentences as: Logistic regression (2) logistic regression (3)
In Linguistic Models for Analyzing and Detecting Biased Language
  1. We trained a logistic regression model on a feature vector for every word that appears in the NPOV sentences from the training set, with the bias-inducing words as the positive class, and all the other words as the negative class.
    Page 5, “Automatically Identifying Biased Language” (see the training sketch after this list)
  2. The types of features used in the logistic regression model are listed in Table 3, together with their value space.
    Page 5, “Automatically Identifying Biased Language”
  3. Logistic regression model that only uses the features based on Liu et al.’s (2005) lexicons of positive and negative words (i.e., features 26–29).
    Page 6, “Automatically Identifying Biased Language”
  4. Logistic regression model that only uses the features based on Riloff and Wiebe’s (2003) lexicon of subjective words (i.e., features 21–25).
    Page 6, “Automatically Identifying Biased Language”
  5. However, our logistic regression model reveals that epistemological and other features can usefully augment the traditional sentiment and subjectivity features for addressing the difficult task of identifying the bias-inducing word in a biased sentence.
    Page 9, “Conclusions”
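
Sentences 1 and 2 above describe the word-level training setup: one feature vector per word in the NPOV sentences, with bias-inducing words as the positive class. The sketch below is a minimal, hypothetical version of that setup assuming scikit-learn; the paper's actual feature set lives in its Table 3 and is not reproduced in this excerpt, so featurize is only an illustrative stand-in.

    from sklearn.feature_extraction import DictVectorizer
    from sklearn.linear_model import LogisticRegression

    def featurize(word, tokens):
        # Placeholder features; the real model uses the feature types in the
        # paper's Table 3 (e.g., lemma, POS, lexicon memberships, context).
        return {"lower=" + word.lower(): 1, "is_title": word.istitle()}

    def build_examples(npov_sentences):
        """npov_sentences: list of (tokens, index_of_bias_inducing_word) pairs."""
        feats, labels = [], []
        for tokens, bias_idx in npov_sentences:
            for i, tok in enumerate(tokens):
                feats.append(featurize(tok, tokens))
                labels.append(1 if i == bias_idx else 0)  # bias-inducing word = positive class
        return feats, labels

    train = [("Critics claimed the reform was a disaster .".split(), 1)]  # toy data
    X_dicts, y = build_examples(train)
    vec = DictVectorizer()
    X = vec.fit_transform(X_dicts)
    clf = LogisticRegression(max_iter=1000).fit(X, y)

    # At test time, score every word in a sentence and rank words by the
    # predicted probability of being bias-inducing (TOP1, TOP2, TOP3 candidates).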

sentiment lexicons

Appears in 3 sentences as: sentiment lexicon (1) sentiment lexicons (2)
In Linguistic Models for Analyzing and Detecting Biased Language
  1. The features used by these models include subjectivity and sentiment lexicons, counts of unigrams and bigrams, distributional similarity, discourse relationships, and so on.
    Page 4, “Analyzing a Dataset of Biased Language”
  2. Of the 654 words included in this lexicon, 433 were unique to this lexicon (i.e., recorded in neither Riloff and Wiebe’s (2003) subjectivity lexicon nor Liu et al.’s (2005) sentiment lexicon) and represented many one-sided or controversial terms, e.g., abortion, same-sex, execute.
    Page 5, “Automatically Identifying Biased Language”
  3. To this end, the features based on subjectivity and sentiment lexicons turn out to be helpful, and incorporating more features for stance detection is an important direction for future work.
    Page 8, “Related Work”
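
Sentences 1 and 3 above refer to features drawn from subjectivity and sentiment lexicons, and sentence 2 to a bias lexicon derived from the NPOV edits. The sketch below shows one plausible way to turn such lexicons into word-level boolean features; the file names and feature names are hypothetical, and the paper's exact lexicon features (21–29) are defined in its Table 3, not here.

    def load_lexicon(path):
        """Read one lowercased word per line into a set."""
        with open(path, encoding="utf-8") as f:
            return {line.strip().lower() for line in f if line.strip()}

    # Hypothetical file names; substitute the actual lexicon files.
    positive   = load_lexicon("liu_positive.txt")                # Liu et al. (2005)
    negative   = load_lexicon("liu_negative.txt")                # Liu et al. (2005)
    subjective = load_lexicon("riloff_wiebe_subjectivity.txt")   # Riloff and Wiebe (2003)
    bias_lex   = load_lexicon("npov_bias_lexicon.txt")           # lexicon built from NPOV edits

    def lexicon_features(tokens, i):
        """Boolean lexicon-membership features for the i-th word and its neighbors."""
        w = tokens[i].lower()
        prev_w = tokens[i - 1].lower() if i > 0 else ""
        next_w = tokens[i + 1].lower() if i + 1 < len(tokens) else ""
        return {
            "in_positive": w in positive,
            "in_negative": w in negative,
            "in_subjective": w in subjective,
            "in_bias_lexicon": w in bias_lex,
            "prev_in_subjective": prev_w in subjective,
            "next_in_subjective": next_w in subjective,
        }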
