Semi-Supervised Active Learning for Sequence Labeling
Tomanek, Katrin and Hahn, Udo

Article Structure

Abstract

While Active Learning (AL) has already been shown to markedly reduce the annotation efforts for many sequence labeling tasks compared to random selection, AL remains unconcerned about the internal structure of the selected sequences (typically, sentences).

Introduction

Supervised machine learning (ML) approaches are currently the methodological backbone for many NLP tasks.

Conditional Random Fields for Sequence Labeling

Many NLP tasks, such as POS tagging, chunking, or NER, are sequence labeling problems where a sequence of class labels $\vec{y} = (y_1, \ldots, y_n)$ is assigned to a sequence of input units $\vec{x} = (x_1, \ldots, x_n)$.
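
For orientation, the linear-chain CRF underlying this section can be written in its standard textbook form. This is a hedged reconstruction for the reader, not the paper's Equations (1) and (6) verbatim; the paper's exact notation may differ:

```latex
P_{\vec{x}}(\vec{y}) = \frac{1}{Z_{\vec{x}}}
  \exp\Big( \sum_{j=1}^{n} \sum_{k} \lambda_k \, f_k(y_{j-1}, y_j, \vec{x}, j) \Big),
\qquad
Z_{\vec{x}} = \sum_{\vec{y}\,' \in \mathcal{Y}^n}
  \exp\Big( \sum_{j=1}^{n} \sum_{k} \lambda_k \, f_k(y'_{j-1}, y'_j, \vec{x}, j) \Big)
```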

Active Learning for Sequence Labeling

AL is a selective sampling technique where the learning protocol is in control of the data to be used for training.

Related Work

Common approaches to AL are variants of the Query-By-Committee approach (Seung et al., 1992) or based on uncertainty sampling (Lewis and Catlett, 1994).

Experiments and Results

In this section, we turn to the empirical assessment of semi-supervised AL (SeSAL) for sequence labeling on the NLP task of named entity recognition.

Summary and Discussion

Our experiments in the NER scenario support the hypothesis that the proposed approach to semi-supervised AL (SeSAL) for sequence labeling strongly reduces the number of tokens to be manually annotated: about 60% fewer than its fully supervised counterpart (FuSAL), and over 80% fewer than a totally passive learning scheme based on random selection.

Topics

semi-supervised

Appears in 16 sentences as: Semi-Supervised (2) Semi-supervised (1) semi-supervised (13)
In Semi-Supervised Active Learning for Sequence Labeling
  1. We propose a semi-supervised AL approach for sequence labeling where only highly uncertain subsequences are presented to human annotators, while all others in the selected sequences are automatically labeled.
    Page 1, “Abstract”
  2. Accordingly, our approach is a combination of AL and self-training to which we will refer as semi-supervised Active Learning (SeSAL) for sequence labeling.
    Page 1, “Introduction”
  3. After a brief overview of the formal underpinnings of Conditional Random Fields, our base classifier for sequence labeling tasks (Section 2), a fully supervised approach to AL for sequence labeling is introduced and complemented by our semi-supervised approach in Section 3.
    Page 1, “Introduction”
  4. Our experiments are laid out in Section 5 where we compare fully and semi-supervised AL for NER on two corpora, the newspaper selection of MUC7 and PENNBIOIE, a biological abstracts corpus.
    Page 1, “Introduction”
  5. This section, first, describes a common approach to AL for sequential data, and then presents our approach to semi-supervised AL.
    Page 2, “Active Learning for Sequence Labeling”
  6. 3.2 Semi-Supervised Active Learning
    Page 3, “Active Learning for Sequence Labeling”
  7. We call this semi-supervised Active Learning (SeSAL) for sequence labeling.
    Page 3, “Active Learning for Sequence Labeling”
  8. Self-training (Yarowsky, 1995) is a form of semi-supervised learning.
    Page 4, “Related Work”
  9. A combination of active and semi-supervised learning was first proposed by McCallum and Nigam (1998) for text classification.
    Page 4, “Related Work”
  10. Similarly, co-testing (Muslea et al., 2002), a multi-view AL algorithm, selects examples for the multi-view, semi-supervised Co-EM algorithm.
    Page 4, “Related Work”
  11. Our approach to semi-supervised AL is different as, firstly, we augment the training data using a self-tagging mechanism (McCallum and Nigam (1998) and Muslea et al.
    Page 4, “Related Work”


sequence labeling

Appears in 16 sentences as: sequence labeling (17)
In Semi-Supervised Active Learning for Sequence Labeling
  1. While Active Learning (AL) has already been shown to markedly reduce the annotation efforts for many sequence labeling tasks compared to random selection, AL remains unconcerned about the internal structure of the selected sequences (typically, sentences).
    Page 1, “Abstract”
  2. We propose a semi-supervised AL approach for sequence labeling where only highly uncertain subsequences are presented to human annotators, while all others in the selected sequences are automatically labeled.
    Page 1, “Abstract”
  3. When used for sequence labeling tasks such as POS tagging, chunking, or named entity recognition (NER), the examples selected by AL are sequences of text, typically sentences.
    Page 1, “Introduction”
  4. Approaches to AL for sequence labeling are usually unconcerned about the internal structure of the selected sequences.
    Page 1, “Introduction”
  5. Accordingly, our approach is a combination of AL and self-training to which we will refer as semi-supervised Active Learning (SeSAL) for sequence labeling.
    Page 1, “Introduction”
  6. After a brief overview of the formal underpinnings of Conditional Random Fields, our base classifier for sequence labeling tasks (Section 2), a fully supervised approach to AL for sequence labeling is introduced and complemented by our semi-supervised approach in Section 3.
    Page 1, “Introduction”
  7. Many NLP tasks, such as POS tagging, chunking, or NER, are sequence labeling problems where a sequence of class labels $\vec{y} = (y_1, \ldots, y_n)$ is assigned to a sequence of input units $\vec{x} = (x_1, \ldots, x_n)$.
    Page 2, “Conditional Random Fields for Sequence Labeling”
  8. In the sequence labeling scenario, such an example is a stream of linguistic items — a sentence is usually considered as proper sequence unit.
    Page 3, “Active Learning for Sequence Labeling”
  9. In the sequence labeling scenario, an example which, as a whole, has a high utility $U_M(\vec{x})$ can still exhibit subsequences which do not add much to the overall utility and thus are fairly easy for the current model to label correctly.
    Page 3, “Active Learning for Sequence Labeling”
  10. There are many more sophisticated utility functions for sequence labeling.
    Page 3, “Active Learning for Sequence Labeling”
  11. For the sequence labeling scenario, we accordingly modify the fully supervised AL approach from Section 3.1.
    Page 3, “Active Learning for Sequence Labeling”


NER

Appears in 9 sentences as: NER (9)
In Semi-Supervised Active Learning for Sequence Labeling
  1. When used for sequence labeling tasks such as POS tagging, chunking, or named entity recognition (NER), the examples selected by AL are sequences of text, typically sentences.
    Page 1, “Introduction”
  2. In the NER scenario, e.g., large portions of the text do not contain any target entity mention at all.
    Page 1, “Introduction”
  3. Our experiments are laid out in Section 5 where we compare fully and semi-supervised AL for NER on two corpora, the newspaper selection of MUC7 and PENNBIOIE, a biological abstracts corpus.
    Page 1, “Introduction”
  4. Many NLP tasks, such as POS tagging, chunking, or NER, are sequence labeling problems where a sequence of class labels $\vec{y} = (y_1, \ldots, y_n)$ is assigned to a sequence of input units $\vec{x} = (x_1, \ldots, x_n)$.
    Page 2, “Conditional Random Fields for Sequence Labeling”
  5. This might, in particular, apply to NER where larger stretches of sentences do not contain any entity mention at all, or merely trivial instances of an entity class easily predictable by the current model.
    Page 3, “Active Learning for Sequence Labeling”
  6. This coincides with the assumption that SeSAL works especially well for labeling tasks where some classes occur predominantly and can, in most cases, easily be discriminated from the other classes, as is the case in the NER scenario.
    Page 7, “Experiments and Results”
  7. Our experiments in the NER scenario support the hypothesis that the proposed approach to semi-supervised AL (SeSAL) for sequence labeling strongly reduces the number of tokens to be manually annotated: about 60% fewer than its fully supervised counterpart (FuSAL), and over 80% fewer than a totally passive learning scheme based on random selection.
    Page 8, “Summary and Discussion”
  8. In our experiments on the NER scenario, those regions were mentions of entity names or linguistic units which had a surface appearance similar to entity mentions but could not yet be correctly distinguished by the model.
    Page 8, “Summary and Discussion”
  9. Future research is needed to investigate this area empirically and to quantify the time savings achievable with SeSAL in the NER scenario.
    Page 8, “Summary and Discussion”


human annotators

Appears in 7 sentences as: human annotator (2) human annotators (5)
In Semi-Supervised Active Learning for Sequence Labeling
  1. We propose a semi-supervised AL approach for sequence labeling where only highly uncertain subsequences are presented to human annotators , while all others in the selected sequences are automatically labeled.
    Page 1, “Abstract”
  2. the need for human annotators to supply large amounts of “golden” annotation data on which ML systems can be trained.
    Page 1, “Introduction”
  3. To further exploit this observation for annotation purposes, we here propose an approach to AL where human annotators are required to label only uncertain subsequences within the selected sentences, while the remaining subsequences are labeled automatically based on the model available from the previous AL iteration round.
    Page 1, “Introduction”
  4. loop until stopping criterion is met:
       1. learn model $M$ from $L$
       2. for all $p_i \in P$: $u_{p_i} \leftarrow U_M(p_i)$
       3. select $B$ examples $p_i \in P$ with highest utility $u_{p_i}$
       4. query human annotator for labels of all $B$ examples
       5. move newly labeled examples from $P$ to $L$
     (A runnable sketch of this loop follows this list.)
    Page 3, “Active Learning for Sequence Labeling”
  5. Sequences of consecutive tokens $x_j$ for which $C_{\vec{x}}(y_j^*) \leq t$ are presented to the human annotator instead of single, isolated tokens.
    Page 3, “Active Learning for Sequence Labeling”
  6. In contrast, our SeSAL approach, which also applies bootstrapping, aims to avoid deteriorating data quality by explicitly pointing human annotators to classification-critical regions.
    Page 4, “Related Work”
  7. We here hypothesize that human annotators work much more efficiently when pointed to the regions of immediate interest rather than having to skim, self-paced, through larger passages of (probably) semantically irrelevant but syntactically complex utterances, a tiring and error-prone task.
    Page 8, “Summary and Discussion”
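
The loop quoted in sentence 4 above translates directly into code. The sketch below is a minimal illustration, not the paper's implementation: `train`, `utility`, and `query_annotator` are hypothetical placeholders for a CRF trainer, the utility function $U_M$, and the human labeling step.

```python
from typing import Callable, List, Tuple

def fusal_loop(
    labeled: List[Tuple[List[str], List[str]]],  # L: (tokens, gold labels) pairs
    pool: List[List[str]],                       # P: unlabeled token sequences
    train: Callable,             # hypothetical: fits a model M on L
    utility: Callable,           # hypothetical: U_M(p), higher = more informative
    query_annotator: Callable,   # hypothetical: returns gold labels for a sequence
    batch_size: int = 20,
    max_rounds: int = 100,
) -> List[Tuple[List[str], List[str]]]:
    """Fully supervised active learning (FuSAL): repeatedly pick the
    B highest-utility sequences from the pool and have a human label them."""
    for _ in range(max_rounds):                  # loop until stopping criterion is met
        if not pool:
            break
        model = train(labeled)                   # 1. learn model M from L
        scored = [(utility(model, p), p) for p in pool]  # 2. u_p <- U_M(p) for all p in P
        scored.sort(key=lambda pair: pair[0], reverse=True)
        batch = [p for _, p in scored[:batch_size]]      # 3. B examples with highest utility
        for seq in batch:
            gold = query_annotator(seq)          # 4. query human annotator for labels
            labeled.append((seq, gold))          # 5. move newly labeled examples P -> L
            pool.remove(seq)
    return labeled
```

Any sequence learner with a probabilistic output can be plugged in; the paper's instantiation uses CRFs with a least-confidence utility.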


F-score

Appears in 5 sentences as: F-score (6)
In Semi-Supervised Active Learning for Sequence Labeling
  1. Table 2 depicts the exact numbers of manually labeled tokens to reach the maximal (supervised) F-score on both corpora.
    Page 6, “Experiments and Results”
  2. On the MUC7 corpus, FuSAL requires 7,374 annotated NPs to yield an F-score of 87%, while SeSAL hits the same F-score with only 4,017 NPs.
    Page 6, “Experiments and Results”
  3. On PENNBIOIE, SeSAL also saves about 45% compared to FuSAL to achieve an F-score of 81%.
    Page 6, “Experiments and Results”
  4. The supervised F-score of 87.7% is only reached by the highest and most restrictive threshold of $t = 0.99$.
    Page 7, “Experiments and Results”
  5. However, in combination with lower thresholds, the delay rates show positive effects as SeSAL yields F-scores closer to the maximal F-score of 87.7%, thus clearly outperforming undelayed SeSAL.
    Page 8, “Experiments and Results”


manual annotation

Appears in 5 sentences as: manual annotation (3) manually annotated (2)
In Semi-Supervised Active Learning for Sequence Labeling
  1. In most annotation campaigns, the language material chosen for manual annotation is selected randomly from some reference corpus.
    Page 1, “Introduction”
  2. In the AL paradigm, only examples of high training utility are selected for manual annotation in an iterative manner.
    Page 1, “Introduction”
  3. If $C_{\vec{x}}(y_j^*)$ exceeds a certain confidence threshold $t$, $y_j^*$ is assumed to be the correct label for this token and assigned to it. Otherwise, manual annotation of this token is required. (A code sketch of this rule follows this list.)
    Page 3, “Active Learning for Sequence Labeling”
  4. So, using SeSAL, the complete corpus can be labeled with only a small fraction of it actually being manually annotated (MUC7: about 18%, PENNBIOIE: about 13%).
    Page 6, “Experiments and Results”
  5. Our experiments in the NER scenario support the hypothesis that the proposed approach to semi-supervised AL (SeSAL) for sequence labeling strongly reduces the number of tokens to be manually annotated: about 60% fewer than its fully supervised counterpart (FuSAL), and over 80% fewer than a totally passive learning scheme based on random selection.
    Page 8, “Summary and Discussion”
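
Sentence 3 above states SeSAL's token-level decision rule, sketched minimally below. `model.predict` and `marginal_confidence` are hypothetical placeholders for the CRF's Viterbi decoding and the per-token confidence $C_{\vec{x}}(y_j^*)$; they are not the paper's API.

```python
def sesal_annotate(model, sequence, marginal_confidence, query_annotator, t=0.99):
    """SeSAL labeling of one selected sequence: tokens whose confidence
    C_x(y_j*) exceeds the threshold t keep the model's label (self-training);
    only the remaining, uncertain tokens are sent to the human annotator."""
    y_star = model.predict(sequence)          # hypothetical: most likely labeling y*
    labels = []
    uncertain = []                            # positions needing manual annotation
    for j in range(len(sequence)):
        conf = marginal_confidence(model, sequence, j)  # hypothetical: C_x(y_j*)
        if conf > t:
            labels.append(y_star[j])          # confident: assign the model's label
        else:
            labels.append(None)
            uncertain.append(j)
    # Note: the paper presents runs of consecutive uncertain tokens to the
    # annotator as one stretch (cf. footnote 2); this sketch queries token by token.
    for j in uncertain:
        labels[j] = query_annotator(sequence, j)
    return labels
```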


confidence score

Appears in 4 sentences as: confidence score (2) Confidence Scores (1) confidence scores (1)
In Semi-Supervised Active Learning for Sequence Labeling
  1. Distribution of Confidence Scores.
    Page 5, “Experiments and Results”
  2. The vast majority of tokens has a confidence score close to 1; the median lies at 0.9966.
    Page 5, “Experiments and Results”
  3. confidence score
    Page 6, “Experiments and Results”
  4. Figure 1: Distribution of token-level confidence scores in the 5th iteration of FuSAL on MUC7 (number of tokens: 1,843)
    Page 6, “Experiments and Results”


entity mention

Appears in 4 sentences as: entity mention (2) entity mentions (2)
In Semi-Supervised Active Learning for Sequence Labeling
  1. In the NER scenario, e.g., large portions of the text do not contain any target entity mention at all.
    Page 1, “Introduction”
  2. This might, in particular, apply to NER where larger stretches of sentences do not contain any entity mention at all, or merely trivial instances of an entity class easily predictable by the current model.
    Page 3, “Active Learning for Sequence Labeling”
  3. By the nature of this task, the sequences (in this case, sentences) are only sparsely populated with entity mentions and most of the tokens belong to the OUTSIDE class so that SeSAL can be expected to be very beneficial.
    Page 5, “Experiments and Results”
  4. In our experiments on the NER scenario, those regions were mentions of entity names or linguistic units which had a surface appearance similar to entity mentions but could not yet be correctly distinguished by the model.
    Page 8, “Summary and Discussion”


conditional probability

Appears in 3 sentences as: conditional probabilities (1) conditional probability (2)
In Semi-Supervised Active Learning for Sequence Labeling
  1. See Equation (1) for the conditional probability $P_{\vec{x}}(\vec{y})$, with $Z_{\vec{x}}$ calculated as in Equation (6).
    Page 2, “Conditional Random Fields for Sequence Labeling”
  2. The marginal and conditional probabilities are used by our AL approaches as confidence estimators.
    Page 2, “Conditional Random Fields for Sequence Labeling”
  3. We apply CRFs as our base learner throughout this paper and employ a utility function which is based on the conditional probability of the most likely label sequence $\vec{y}^{\,*}$ for an observation sequence $\vec{x}$ (cf.
    Page 3, “Active Learning for Sequence Labeling”
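
The excerpt above is truncated in this index, but the least-confidence utility it describes has a standard form. As a hedged reconstruction (the paper's own equation may differ in notation):

```latex
U_M(\vec{x}) = 1 - P_M\big(\vec{y}^{\,*} \mid \vec{x}\big),
\qquad
\vec{y}^{\,*} = \operatorname*{argmax}_{\vec{y}} \, P_M\big(\vec{y} \mid \vec{x}\big)
```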


POS tagging

Appears in 3 sentences as: POS tagging (2) POS tags (1)
In Semi-Supervised Active Learning for Sequence Labeling
  1. When used for sequence labeling tasks such as POS tagging, chunking, or named entity recognition (NER), the examples selected by AL are sequences of text, typically sentences.
    Page 1, “Introduction”
  2. Many NLP tasks, such as POS tagging, chunking, or NER, are sequence labeling problems where a sequence of class labels $\vec{y} = (y_1, \ldots, y_n)$ is assigned to a sequence of input units $\vec{x} = (x_1, \ldots, x_n)$.
    Page 2, “Conditional Random Fields for Sequence Labeling”
  3. Input units $x_j$ are usually tokens; class labels $y_j$ can be POS tags or entity classes.
    Page 2, “Conditional Random Fields for Sequence Labeling”
