Abstract | We propose a semi-supervised AL approach for sequence labeling where only highly uncertain subsequences are presented to human annotators, while all others in the selected sequences are automatically labeled. |
Active Learning for Sequence Labeling | loop until stopping criterion is met: 1. learn model M from L; 2. for all p_i ∈ P: u_{p_i} ← U_M(p_i); 3. select B examples p_i ∈ P with highest utility u_{p_i}; 4. query human annotator for labels of all B examples; 5. move newly labeled examples from P to L |
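The five-step loop above can be sketched as a generic pool-based active learner. This is a minimal illustration, not the paper's implementation: the `train`, `utility`, and `oracle` callables (and the toy threshold model below) are hypothetical stand-ins for the model M, the utility function U_M, and the human annotator.

```python
def active_learning_loop(train, utility, oracle, L, P, B, rounds):
    """Generic pool-based active learning loop.
    train(L)      -> model M learned from labeled set L      (step 1)
    utility(M, p) -> u_p, higher = more informative          (step 2)
    oracle(p)     -> label, stands in for the annotator      (step 4)
    """
    P = list(P)
    for _ in range(rounds):
        if not P:
            break
        model = train(L)                                    # 1. learn M from L
        scores = {p: utility(model, p) for p in P}          # 2. u_p <- U_M(p)
        batch = sorted(P, key=lambda p: scores[p], reverse=True)[:B]  # 3. top-B
        for p in batch:
            L.append((p, oracle(p)))                        # 4. query annotator
        P = [p for p in P if p not in batch]                # 5. move P -> L
    return L, P

# Toy instantiation (uncertainty sampling): the "model" is a decision
# threshold at the mean of the labeled points, and utility is closeness
# to that boundary; the oracle labels by sign.
train = lambda L: sum(x for x, _ in L) / len(L)
utility = lambda m, p: -abs(p - m)
oracle = lambda p: int(p > 0)
L, P = active_learning_loop(train, utility, oracle,
                            [(-5, 0), (5, 1)], [-3, -1, 2, 4], B=2, rounds=2)
```

The key design point is that only the utility function changes between query strategies; the loop itself is strategy-agnostic.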
Active Learning for Sequence Labeling | Sequences of consecutive tokens x_j for which C_j ≤ t are presented to the human annotator instead of single, isolated tokens. |
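Grouping low-confidence tokens into maximal runs can be sketched as follows. This is an illustrative reading of the rule above, assuming per-token confidences C_j are given as a list; the function name is hypothetical.

```python
def uncertain_spans(confidences, t):
    """Group consecutive token positions j with confidence C_j <= t into
    (start, end) spans to show the annotator, rather than isolated tokens."""
    spans, start = [], None
    for j, c in enumerate(confidences):
        if c <= t:
            if start is None:
                start = j          # open a new uncertain span
        elif start is not None:
            spans.append((start, j - 1))  # close the span at the last low-C token
            start = None
    if start is not None:
        spans.append((start, len(confidences) - 1))
    return spans

# Tokens 1-2 and token 5 fall below the threshold, giving two query spans.
uncertain_spans([0.9, 0.4, 0.3, 0.8, 0.95, 0.2], t=0.5)  # [(1, 2), (5, 5)]
```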
Introduction | the need for human annotators to supply large amounts of “golden” annotation data on which ML systems can be trained. |
Introduction | To further exploit this observation for annotation purposes, we here propose an approach to AL where human annotators are required to label only uncertain subsequences within the selected sentences, while the remaining subsequences are labeled automatically based on the model available from the previous AL iteration round. |
Related Work | In contrast, our SeSAL approach, which also applies bootstrapping, aims to avoid degrading data quality by explicitly pointing human annotators to classification-critical regions. |
Summary and Discussion | We here hypothesize that human annotators work much more efficiently when pointed to the regions of immediate interest instead of making them skim in a self-paced way through larger passages of (probably) semantically irrelevant but syntactically complex utterances —a tiring and error-prone task. |
Conclusions and Future Work | Human annotations on 5000 automatically annotated queries showed that our data generation method is highly accurate, achieving 84.3% accuracy on average for Category-1 queries, and 93.7% accuracy for Category-1 and Category-2 queries combined. |
Introduction | Furthermore, creating such a data set is expensive as it requires an extensive amount of work by human annotators. |
Language Identification | We tested all the systems in this section on a test set of 3500 human-annotated queries, formed by taking 350 Category-1 queries from each language. |
Language Identification | This query is labelled as Category-2 by the human annotator. |
Abstract | It is often desirable to reduce the quantity of training data — and hence human annotation — that is needed to train an L2P classifier for a new language. |
Active learning | Words for which the prediction confidence is above a certain threshold are immediately added to the lexicon, while the remaining words must be verified (and corrected, if necessary) by a human annotator. |
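The threshold-based triage described above can be sketched in a few lines. This is a hypothetical illustration of the idea, not the paper's code; the `triage` function, the input format `{word: (phonemes, confidence)}`, and the ARPAbet-style phoneme strings are all assumptions.

```python
def triage(predictions, threshold):
    """Split L2P model predictions into auto-accepted lexicon entries and
    words routed to a human annotator for verification/correction."""
    auto, to_verify = {}, []
    for word, (phonemes, conf) in predictions.items():
        if conf >= threshold:
            auto[word] = phonemes        # confident: add to lexicon directly
        else:
            to_verify.append(word)       # uncertain: needs human verification
    return auto, to_verify

preds = {"cat": ("K AE T", 0.97), "gnocchi": ("N OW K IY", 0.55)}
auto, to_verify = triage(preds, threshold=0.9)
# "cat" is accepted automatically; "gnocchi" is sent to the annotator.
```

The threshold trades off annotation cost against lexicon quality: raising it sends more words to the annotator but admits fewer model errors.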
Active learning | In the L2P domain, we assume that a human annotator specifies the phonemes for an entire word, and that the active learner cannot query individual letters. |