Robust Logistic Regression using Shift Parameters
Tibshirani, Julie and Manning, Christopher D.

Article Structure

Abstract

Annotation errors can significantly hurt classifier performance, yet datasets are only growing noisier with the increased use of Amazon Mechanical Turk and techniques like distant supervision that automatically generate labels.

Introduction

Almost any large dataset has annotation errors, especially those complex, nuanced datasets commonly used in natural language processing.

Related Work

Much of the previous work on dealing with annotation errors centers around filtering the data before training.

Model

Recall that in binary logistic regression, the probability of an example x_i being positive is modeled as
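The excerpt stops at the formula. For reference, the standard binary logistic regression model it leads into (writing theta for the weight vector, our notation rather than necessarily the paper's) is

    P(y_i = 1 \mid x_i) = \sigma(\theta^\top x_i) = \frac{1}{1 + \exp(-\theta^\top x_i)},

and, per the sentences quoted under the logistic regression topic below, the paper's robust extension adds per-example shift parameters to this objective (a sketch appears after that list).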

Experiments

We conduct two sets of experiments to assess the effectiveness of the approach, in terms of both identifying mislabelled examples and producing accurate predictions.

Future Work

A natural direction for future work is to extend the model to a multi-class setting.

Topics

logistic regression

Appears in 17 sentences as: logistic regression (17)
In Robust Logistic Regression using Shift Parameters
  1. In this paper, we present a robust extension of logistic regression that incorporates the possibility of mislabelling directly into the objective.
    Page 1, “Abstract”
  2. This model can be trained through nearly the same means as logistic regression, and retains its efficiency on high-dimensional datasets.
    Page 1, “Abstract”
  3. In this work we argue that incorrect examples should be explicitly modelled during training, and present a simple extension of logistic regression that incorporates the possibility of mislabelling directly into the objective.
    Page 1, “Introduction”
  4. It has a convex objective, is well-suited to high-dimensional data, and can be efficiently trained with minimal changes to the logistic regression pipeline.
    Page 1, “Introduction”
  5. In experiments on a large, noisy NER dataset, we find that this method can provide an improvement over standard logistic regression when annotation errors are present.
    Page 1, “Introduction”
  6. This robust extension of logistic regression shows particular promise for NLP applications: it helps account for incorrect labels, while remaining efficient on large, high-dimensional datasets.
    Page 1, “Introduction”
  7. Bootkrajang and Kaban (2012) present an extension of logistic regression that models annotation errors through flipping probabilities.
    Page 2, “Related Work”
  8. In this work we adapt the technique to logistic regression.
    Page 2, “Related Work”
  9. To the best of our knowledge, we are the first to experiment with adding ‘shift parameters’ to logistic regression and demonstrate that the model is especially well-suited to the type of high-dimensional, noisy datasets commonly used in NLP.
    Page 2, “Related Work”
  10. Recall that in binary logistic regression, the probability of an example x_i being positive is modeled as
    Page 2, “Model”
  11. Upon writing the objective in this way, we immediately see that it is convex, just as standard L1-penalized logistic regression is convex.
    Page 3, “Model”

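Items 3, 9, and 11 above outline the technique: per-example "shift parameters" are added to the logistic regression objective together with an L1 penalty, and the resulting objective stays convex. Below is a minimal sketch of that kind of model, not the authors' implementation: the additive form sigma(theta . x_i + gamma_i), the penalty (lambda/n) * sum_i |gamma_i|, and the proximal-gradient training loop are illustrative assumptions chosen to be consistent with the quoted sentences.

import numpy as np

def sigmoid(z):
    # Numerically stable logistic function.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def fit_shifted_lr(X, y, lam=0.5, lr=0.1, n_iter=2000):
    """Logistic regression with one shift parameter per training example.

    Minimizes (1/n) * NLL(theta, gamma) + (lam/n) * sum_i |gamma_i|
    by proximal gradient descent (soft-thresholding the gammas).
    X: (n, d) feature matrix, y: (n,) labels in {0, 1}.
    """
    n, d = X.shape
    theta = np.zeros(d)
    gamma = np.zeros(n)  # shift parameters, one per example
    for _ in range(n_iter):
        p = sigmoid(X @ theta + gamma)
        resid = p - y                      # d(NLL)/d(logit) for each example
        theta -= lr * (X.T @ resid) / n    # gradient step on the weights
        gamma -= lr * resid / n            # gradient step on the shifts
        # Proximal (soft-thresholding) step for the L1 penalty on gamma.
        thresh = lr * lam / n
        gamma = np.sign(gamma) * np.maximum(np.abs(gamma) - thresh, 0.0)
    return theta, gamma

# Toy usage: flip a few labels and see which examples pick up large shifts.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
true_w = np.array([2.0, -1.0, 0.5, 0.0, 0.0])
y = (X @ true_w > 0).astype(float)
y[:10] = 1.0 - y[:10]                      # simulate annotation errors
theta, gamma = fit_shifted_lr(X, y)
flagged = np.argsort(-np.abs(gamma))[:10]  # largest |gamma_i| = most suspect
print("examples flagged as possibly mislabelled:", sorted(flagged.tolist()))

In this formulation, training examples whose learned shift |gamma_i| is pushed away from zero are the ones the model has to explain away, which is how such a model can be used to flag possibly mislabelled examples, the first of the two evaluations described in the Experiments section.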

NER

Appears in 6 sentences as: NER (6)
In Robust Logistic Regression using Shift Parameters
  1. In experiments on a large, noisy NER dataset, we find that this method can provide an improvement over standard logistic regression when annotation errors are present.
    Page 1, “Introduction”
  2. We now consider the problem of named entity recognition (NER) to evaluate how our model performs in a large-scale prediction task.
    Page 4, “Experiments”
  3. In traditional NER, the goal is to determine whether each word is a person, organization, location, or not a named entity (‘other’).
    Page 4, “Experiments”
  4. For training, we use a large, noisy NER dataset collected by Jenny Finkel.
    Page 4, “Experiments”
  5. We extract a set of features using Stanford’s NER pipeline (Finkel et al., 2005).
    Page 4, “Experiments”
  6. Table 2: Performance of standard vs. robust logistic regression in the Wikipedia NER experiment.
    Page 5, “Experiments”

named entity

Appears in 4 sentences as: named entities (1) named entity (3)
In Robust Logistic Regression using Shift Parameters
  1. We conduct experiments on named entity recognition data and find that our approach can provide a significant improvement over the standard model when annotation errors are present.
    Page 1, “Abstract”
  2. We now consider the problem of named entity recognition (NER) to evaluate how our model performs in a large-scale prediction task.
    Page 4, “Experiments”
  3. In traditional NER, the goal is to determine whether each word is a person, organization, location, or not a named entity (‘other’).
    Page 4, “Experiments”
  4. (This task does not trivially reduce to finding the capitalized words, as the model must distinguish between people and other named entities like organizations).
    Page 4, “Experiments”

Amazon Mechanical Turk

Appears in 3 sentences as: Amazon Mechanical Turk (2) Amazon Mechanical Turkers (1)
In Robust Logistic Regression using Shift Parameters
  1. Annotation errors can significantly hurt classifier performance, yet datasets are only growing noisier with the increased use of Amazon Mechanical Turk and techniques like distant supervision that automatically generate labels.
    Page 1, “Abstract”
  2. Low-quality annotations have become even more common in recent years with the rise of Amazon Mechanical Turk, as well as methods like distant supervision and co-training that involve automatically generating training data.
    Page 1, “Introduction”
  3. The data was created by taking various Wikipedia articles and giving them to five Amazon Mechanical Turkers to annotate.
    Page 4, “Experiments”

distant supervision

Appears in 3 sentences as: distant supervision (3)
In Robust Logistic Regression using Shift Parameters
  1. Annotation errors can significantly hurt classifier performance, yet datasets are only growing noisier with the increased use of Amazon Mechanical Turk and techniques like distant supervision that automatically generate labels.
    Page 1, “Abstract”
  2. Low-quality annotations have become even more common in recent years with the rise of Amazon Mechanical Turk, as well as methods like distant supervision and co-training that involve automatically generating training data.
    Page 1, “Introduction”
  3. Although small amounts of noise may not be detrimental, in some applications the level can be high: upon manually inspecting a relation extraction corpus commonly used in distant supervision, Riedel et al.
    Page 1, “Introduction”

Mechanical Turk

Appears in 3 sentences as: Mechanical Turk (2) Mechanical Turkers (1)
In Robust Logistic Regression using Shift Parameters
  1. Annotation errors can significantly hurt classifier performance, yet datasets are only growing noisier with the increased use of Amazon Mechanical Turk and techniques like distant supervision that automatically generate labels.
    Page 1, “Abstract”
  2. Low-quality annotations have become even more common in recent years with the rise of Amazon Mechanical Turk, as well as methods like distant supervision and co-training that involve automatically generating training data.
    Page 1, “Introduction”
  3. The data was created by taking various Wikipedia articles and giving them to five Amazon Mechanical Turkers to annotate.
    Page 4, “Experiments”
