Part-of-speech tagging with antagonistic adversaries
Søgaard, Anders

Article Structure

Abstract

Supervised NLP tools and online services are often used on data that is very different from the manually annotated data used during development.

Introduction

Most learning algorithms assume that training and test data are governed by identical distributions; and more specifically, in the case of part-of-speech (POS) tagging, that training and test sentences were sampled at random and that they are identically and independently distributed.

Robust perceptron learning

Our framework will be averaged perceptron learning (Freund and Schapire, 1999; Collins, 2002).

Learning with antagonistic adversaries

The intuition behind learning with antagonistic adversaries is that the adversary should focus on the most predictive features.

Experiments

We consider part-of-speech (POS) tagging, i.e.

Conclusion

We presented a discriminative learning algorithm for cross-domain structured prediction that seems more robust to covariate shifts than previous approaches.

Topics

perceptron

Appears in 10 sentences as: perceptron (10)
In Part-of-speech tagging with antagonistic adversaries
  1. We present a new perceptron learning algorithm using antagonistic adversaries and compare it to previous proposals on 12 multilingual cross-domain part-of-speech tagging datasets.
    Page 1, “Abstract”
  2. Section 2 introduces previous work on robust perceptron learning, as well as the methods discussed in the paper.
    Page 2, “Introduction”
  3. Our framework will be averaged perceptron learning (Freund and Schapire, 1999; Collins, 2002).
    Page 2, “Robust perceptron learning”
  4. Learning with antagonistic adversaries performs significantly better than structured perceptron (SP) learning, L∞-regularization and LRA across the board.
    Page 4, “Experiments”
  5. We note that on the in-domain dataset (PTB-biomedical), L∞-regularization performs best, but our approach also performs better than the structured perceptron baseline on this dataset.
    Page 4, “Experiments”
  6. The number of zero weights or very small weights is significantly lower for learning with antagonistic adversaries than for the baseline structured perceptron .
    Page 4, “Experiments”
  7. In our PTB experiments, for example, the mean weight is 14.2 in structured perceptron learning, but 14.5 with antagonistic adversaries.
    Page 4, “Experiments”
  8. A last observation is that the structured perceptron baseline model expectedly fits the training data better than the robust models.
    Page 4, “Experiments”
  9. On CDT, the structured perceptron has an accuracy of 98.26% on held-out training data, whereas our model has an accuracy of only 97.85%.
    Page 4, “Experiments”
  10. Our approach was superior to previous approaches across 12 multilingual cross-domain POS tagging datasets, with an average error reduction of 4% over a structured perceptron baseline.
    Page 4, “Conclusion”
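The adversarial training loop these sentences describe can be sketched in a few lines. This is a minimal binary-classification simplification, not the paper's structured POS tagger: the adversary inspects the current weights and deletes the k active features with the largest absolute weight before each update, and the final weights are averaged as in Freund and Schapire (1999). All names and the value of k are illustrative.

```python
import numpy as np

def adversarial_perceptron(X, y, epochs=5, k=1):
    """Perceptron where an antagonistic adversary, seeing the current
    weights w, deletes the k active features that w relies on most
    before each update (a binary-classification simplification of the
    paper's structured setting)."""
    n_feats = X.shape[1]
    w = np.zeros(n_feats)
    w_sum = np.zeros(n_feats)  # running sum for weight averaging
    steps = 0
    for _ in range(epochs):
        for x, label in zip(X, y):
            # Adversary: zero out the k active features with largest |w|.
            active = np.nonzero(x)[0]
            strongest = active[np.argsort(-np.abs(w[active]))[:k]]
            x_adv = x.copy()
            x_adv[strongest] = 0
            # Standard perceptron update, but on the corrupted input.
            pred = 1 if w @ x_adv > 0 else 0
            if pred != label:
                w += (label - pred) * x_adv
            w_sum += w
            steps += 1
    return w_sum / steps  # averaged weights (Freund and Schapire, 1999)
```

With k = 0 the adversary does nothing and this reduces to an ordinary averaged perceptron, which makes the comparison in sentences 4 and 6 easy to reproduce on toy data: because the strongest active feature is repeatedly deleted, weight is forced onto redundant features instead of concentrating on one.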

See all papers in Proc. ACL 2013 that mention perceptron.

See all papers in Proc. ACL that mention perceptron.


learning algorithms

Appears in 7 sentences as: learning algorithm (1) learning algorithms (6)
In Part-of-speech tagging with antagonistic adversaries
  1. Many discriminative learning algorithms are sensitive to such shifts because highly indicative features may swamp other indicative features.
    Page 1, “Abstract”
  2. Regularized and adversarial learning algorithms have been proposed to be more robust against covariate shifts.
    Page 1, “Abstract”
  3. We present a new perceptron learning algorithm using antagonistic adversaries and compare it to previous proposals on 12 multilingual cross-domain part-of-speech tagging datasets.
    Page 1, “Abstract”
  4. Most learning algorithms assume that training and test data are governed by identical distributions; and more specifically, in the case of part-of-speech (POS) tagging, that training and test sentences were sampled at random and that they are identically and independently distributed.
    Page 1, “Introduction”
  5. Most discriminative learning algorithms only update parameters when training examples are misclassified.
    Page 2, “Introduction”
  6. All learning algorithms do the same number of passes over each training data set.
    Page 4, “Experiments”
  7. We presented a discriminative learning algorithm for cross-domain structured prediction that seems more robust to covariate shifts than previous approaches.
    Page 4, “Conclusion”


POS tagging

Appears in 7 sentences as: POS taggers (1) POS tagging (6)
In Part-of-speech tagging with antagonistic adversaries
  1. This paper considers the POS tagging problem, i.e.
    Page 1, “Introduction”
  2. Several authors have noted how POS tagging performance is sensitive to cross-domain shifts (Blitzer et al., 2006; Daume III, 2007; Jiang and Zhai, 2007), and while most authors have assumed known target distributions and pool unlabeled target data in order to automatically correct cross-domain bias (Jiang and Zhai, 2007; Foster et al., 2010), methods such as feature bagging (Sutton et al., 2006), learning with random adversaries (Globerson and Roweis, 2006) and L∞-regularization (Dekel and Shamir, 2008) have been proposed to improve performance on unknown target distributions.
    Page 1, “Introduction”
  3. Section 4 presents experiments on POS tagging and discusses how to evaluate cross-domain performance.
    Page 2, “Introduction”
  4. POS tagging accuracy is known to be very sensitive to domain shifts.
    Page 3, “Experiments”
  5. (2011) report a POS tagging accuracy on social media data of 84% using a tagger that achieves an accuracy of about 97% on newspaper data.
    Page 3, “Experiments”
  6. While POS taggers can often recover the part of speech of a previously unseen word from the context it occurs in, this is harder than for previously seen words.
    Page 3, “Experiments”
  7. Our approach was superior to previous approaches across 12 multilingual cross-domain POS tagging datasets, with an average error reduction of 4% over a structured perceptron baseline.
    Page 4, “Conclusion”


model parameters

Appears in 4 sentences as: model parameters (4)
In Part-of-speech tagging with antagonistic adversaries
  1. Antagonistic adversaries choose transformations informed by the current model parameters w, but random adversaries randomly select transformations from a predefined set of possible transformations, e.g.
    Page 2, “Robust perceptron learning”
  2. In an online setting feature bagging can be modelled as a game between a learner and an adversary, in which (a) the adversary can only choose between deleting transformations, (b) the adversary cannot see model parameters when choosing a transformation, and in which (c) the adversary only moves in between passes over the data.1
    Page 2, “Robust perceptron learning”
  3. LRA is an adversarial game in which the two players are unaware of the other player’s current move, and in particular, where the adversary does not see model parameters and only randomly corrupts the data points.
    Page 2, “Robust perceptron learning”
  4. This is related to regularization in the following way: If model parameters are chosen to minimize expected error in the absence of any k features, we explicitly prevent under-weighting more than n − k features, i.e.
    Page 2, “Robust perceptron learning”
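The distinction drawn in sentences 1 and 3, a random adversary that cannot see w versus an antagonistic adversary that can, can be sketched as two corruption functions. This is an illustrative simplification assuming binary feature vectors; both functions delete k active features, and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_adversary(x, w, k=1):
    """LRA-style adversary: deletes k active features chosen uniformly
    at random, without looking at the model parameters w."""
    active = np.nonzero(x)[0]
    if len(active) == 0:
        return x.copy()
    drop = rng.choice(active, size=min(k, len(active)), replace=False)
    x_adv = x.copy()
    x_adv[drop] = 0
    return x_adv

def antagonistic_adversary(x, w, k=1):
    """Antagonistic adversary: sees the current model parameters w and
    deletes the k active features with the largest absolute weight."""
    active = np.nonzero(x)[0]
    drop = active[np.argsort(-np.abs(w[active]))[:k]]
    x_adv = x.copy()
    x_adv[drop] = 0
    return x_adv
```

Under the feature-bagging game of sentence 2, the adversary would additionally be restricted to deleting transformations chosen only in between passes over the data, rather than per example.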


Treebank

Appears in 4 sentences as: Treebank (3) treebank (1)
In Part-of-speech tagging with antagonistic adversaries
  1. The first group comes from the English Web Treebank (EWT),4 also used in the Parsing the Web shared task (Petrov and McDonald, 2012).
    Page 3, “Experiments”
  2. We train our tagger on Sections 2—21 of the WSJ data in the Penn-III Treebank (PTB), Ontonotes 4.0 release.
    Page 3, “Experiments”
  3. Finally we do experiments with the Danish section of the Copenhagen Dependency Treebank (CDT).
    Page 3, “Experiments”
  4. For CDT we rely on the treebank meta-data and sin-
    Page 3, “Experiments”


labeled data

Appears in 3 sentences as: labeled data (3)
In Part-of-speech tagging with antagonistic adversaries
  1. The problem with out-of-vocabulary effects can be illustrated using a small labeled data set: {X1 = ⟨1, ⟨0,1,0⟩⟩, X2 = ⟨1, ⟨0,1,1⟩⟩, X3 = ⟨0, ⟨0,0,0⟩⟩, X4 = ⟨1, ⟨0,0,1⟩⟩}. Say we train our model on X1–X3 and evaluate it on the fourth data point.
    Page 2, “Introduction”
  2. Globerson and Roweis (2006) let an adversary corrupt labeled data during training to learn better models of test data with missing features.
    Page 2, “Robust perceptron learning”
  3. Letting an antagonistic adversary corrupt our labeled data leads, somewhat surprisingly, to better cross-domain performance.
    Page 4, “Experiments”
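The out-of-vocabulary illustration in sentence 1 can be reproduced with a short plain-perceptron sketch. The fourth data point is taken here to be X4 = ⟨1, ⟨0,0,1⟩⟩, an assumption where the source snippet is truncated.

```python
import numpy as np

# Training points X1..X3 from the example (label, feature vector);
# X4's vector is assumed to be <0,0,1>.
train = [(1, np.array([0., 1., 0.])),
         (1, np.array([0., 1., 1.])),
         (0, np.array([0., 0., 0.]))]
x4_label, x4 = 1, np.array([0., 0., 1.])

w = np.zeros(3)
for _ in range(5):  # a few perceptron passes over X1..X3
    for label, x in train:
        pred = 1 if w @ x > 0 else 0
        if pred != label:
            w += (label - pred) * x

# Feature 2 alone explains the training labels, so feature 3 never
# receives weight, and X4, which relies only on feature 3, is missed.
pred4 = 1 if w @ x4 > 0 else 0
```

Because feature 2 already separates X1–X3, the perceptron never updates on feature 3; the swamping effect named in the abstract ("highly indicative features may swamp other indicative features") shows up here as a zero weight on feature 3 and a misclassified X4.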


part-of-speech

Appears in 3 sentences as: part-of-speech (3)
In Part-of-speech tagging with antagonistic adversaries
  1. We present a new perceptron learning algorithm using antagonistic adversaries and compare it to previous proposals on 12 multilingual cross-domain part-of-speech tagging datasets.
    Page 1, “Abstract”
  2. Most learning algorithms assume that training and test data are governed by identical distributions; and more specifically, in the case of part-of-speech (POS) tagging, that training and test sentences were sampled at random and that they are identically and independently distributed.
    Page 1, “Introduction”
  3. We consider part-of-speech (POS) tagging, i.e.
    Page 3, “Experiments”
