Semi-Supervised Cause Identification from Aviation Safety Reports
Persing, Isaac and Ng, Vincent

Article Structure

Abstract

We introduce cause identification, a new problem involving classification of incident reports in the aviation domain.

Introduction

Automatic text classification is one of the most important applications in natural language processing (NLP).

Shaping Factors

As mentioned in the introduction, the task of cause identification involves labeling an incident report with all the shaping factors that contributed to the occurrence of the incident.

Dataset

We downloaded our corpus from the ASRS website.

Baseline Approaches

In this section, we describe two baseline approaches to cause identification.

Our Bootstrapping Algorithm

One of the potential weaknesses of the two baselines described in the previous section is that the classifiers are trained on only a small amount of labeled data.

Evaluation

6.1 Baseline Systems

Related Work

Since we recast cause identification as a text classification task and proposed a bootstrapping approach that aims at improving minority class prediction, the work most related to ours involves one or both of these topics.

Conclusions

We have introduced a new problem, cause identification from aviation safety reports, to the NLP community.

Topics

F-measure

Appears in 16 sentences as: F-measure (16)
In Semi-Supervised Cause Identification from Aviation Safety Reports
  1. Experimental results show that our algorithm yields a relative error reduction of 6.3% in F-measure for the minority classes in comparison to a baseline that learns solely from the labeled data.
    Page 1, “Abstract”
  2. In comparison to a supervised baseline approach where a classifier is acquired solely based on the training set, our bootstrapping approach yields a relative error reduction of 6.3% in F-measure for the minority classes.
    Page 2, “Introduction”
  3. Results are reported in terms of precision (P), recall (R), and F-measure (F), which are computed by aggregating over the 14 shapers as follows.
    Page 4, “Baseline Approaches”
  4. Our second baseline is similar to the first, except that we tune the classification threshold (CT) to optimize F-measure.
    Page 4, “Baseline Approaches”
  5. Using the development data, we tune the 14 CTs jointly to optimize overall F-measure.
    Page 5, “Baseline Approaches”
  6. Consequently, we find a local maximum by employing a local search algorithm, which alters one parameter at a time, holding the remaining parameters fixed, to optimize F-measure.
    Page 5, “Baseline Approaches”
  7. In particular, if the second baseline is used, we will tune CT and k jointly on the development data using the local search algorithm described previously, where we adjust the values of both CT and k for one of the 14 classifiers in each step of the search process to optimize the overall F-measure score.
    Page 6, “Our Bootstrapping Algorithm”
  8. Micro-averaged 5-fold cross validation results of this baseline for all 14 shapers and for just 10 minority classes (due to our focus on improving minority class prediction) are expressed as percentages in terms of precision (P), recall (R), and F-measure (F) in the first row of Table 4.
    Page 6, “Evaluation”
  9. As we can see, the baseline achieves an F-measure of 45.4 (14 shapers) and 35.4 (10 shapers).
    Page 6, “Evaluation”
  10. Comparing these two results, the higher F-measure achieved using all 14 shapers can be attributed primarily to improvements in recall.
    Page 6, “Evaluation”
  11. In comparison to the first baseline, we see that F-measure improves considerably by 7.4% and 4.5% for 14 shapers and 10 shapers respectively, which illus-
    Page 6, “Evaluation”
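
Items 3-7 above describe how precision, recall, and F-measure are micro-averaged over the 14 shapers, and how the classification thresholds (CTs) are tuned by a local search that adjusts one parameter at a time while holding the others fixed. The following is a minimal sketch of both steps, assuming per-shaper classifier scores are already available; the names (micro_prf, tune_thresholds) and the candidate-threshold grid are illustrative, not taken from the paper.

```python
from typing import Dict, List, Set, Tuple

def micro_prf(gold: List[Set[str]], pred: List[Set[str]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F-measure, aggregated over all
    shaper labels of all reports (gold/pred: one label set per report)."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    fp = sum(len(p - g) for g, p in zip(gold, pred))
    fn = sum(len(g - p) for g, p in zip(gold, pred))
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f

def tune_thresholds(scores: List[Dict[str, float]], gold: List[Set[str]],
                    shapers: List[str], candidates: List[float]) -> Dict[str, float]:
    """Greedy local search over classification thresholds: adjust one shaper's
    threshold at a time, keeping the rest fixed, and keep a change only if the
    overall F-measure improves."""
    ct = {s: 0.0 for s in shapers}
    def predict() -> List[Set[str]]:
        return [{s for s in shapers if doc_scores[s] >= ct[s]} for doc_scores in scores]
    best_f = micro_prf(gold, predict())[2]
    improved = True
    while improved:
        improved = False
        for s in shapers:
            for c in candidates:
                old, ct[s] = ct[s], c
                f = micro_prf(gold, predict())[2]
                if f > best_f:
                    best_f, improved = f, True
                else:
                    ct[s] = old
    return ct
```

Because the search only accepts changes that raise the overall F-measure, it converges to a local maximum, which matches the behaviour described in item 6.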


labeled data

Appears in 12 sentences as: labeled data (13)
In Semi-Supervised Cause Identification from Aviation Safety Reports
  1. To improve the performance of a cause identification system for the minority classes, we present a bootstrapping algorithm that automatically augments a training set by learning from a small amount of labeled data and a large amount of unlabeled data.
    Page 1, “Abstract”
  2. Experimental results show that our algorithm yields a relative error reduction of 6.3% in F-measure for the minority classes in comparison to a baseline that learns solely from the labeled data.
    Page 1, “Abstract”
  3. The difficulty of a text classification task depends on various factors, but typically, the task can be difficult if (1) the amount of labeled data available for learning the task is small; (2) it involves multiple classes; (3) it involves multi-label categorization, where more than one label can be assigned to each document; (4) the class distributions are skewed, with some categories significantly outnumbering the others; and (5) the documents belong to the same domain (e.g., movie review classification).
    Page 1, “Introduction”
  4. Such methods, however, are unlikely to perform equally well for our cause identification task given our small labeled set, as the minority class prediction problem is complicated by the scarcity of labeled data.
    Page 2, “Introduction”
  5. More specifically, given the scarcity of labeled data, many words that are potentially correlated with a shaper (especially a minority shaper) may not appear in the training set, and the lack of such useful indicators could hamper the acquisition of an accurate classifier via supervised learning techniques.
    Page 2, “Introduction”
  6. sideration, and (2) augment the labeled data by using the resulting words to annotate those unlabeled reports that can be confidently labeled.
    Page 2, “Introduction”
  7. mate goal is to evaluate the effectiveness of our bootstrapping algorithm, the baseline approaches only make use of small amounts of labeled data for acquiring classifiers.
    Page 4, “Baseline Approaches”
  8. To ensure a fair comparison with the first baseline, we do not employ additional labeled data for parameter tuning; rather, we reserve 25% of the available training data for tuning, and use the remaining 75% for classifier
    Page 4, “Baseline Approaches”
  9. One of the potential weaknesses of the two baselines described in the previous section is that the classifiers are trained on only a small amount of labeled data.
    Page 5, “Our Bootstrapping Algorithm”
  10. The situation is somewhat aggravated by the fact that we are adopting a one-versus-all scheme for generating training instances for a particular shaper, which, together with the small amount of labeled data, implies that only a couple of positive instances may be available for training the classifier for a minority class.
    Page 5, “Our Bootstrapping Algorithm”
  11. The reason we impose the “at least three” requirement is precision: we want to ensure, with a reasonable level of confidence, that the unlabeled documents chosen to augment P should indeed be labeled with the shaper under consideration, as incorrectly labeled documents would contaminate the labeled data, thus accelerating the deterioration of the quality of the automatically labeled data in subsequent bootstrapping iterations and adversely affecting the accuracy of the classifier trained on it (Pierce and Cardie, 2001).
    Page 6, “Our Bootstrapping Algorithm”
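
Items 6 and 11 above outline the core of the bootstrapping loop for one shaper: words correlated with the shaper are collected, and an unlabeled report is added to the positive set P only if it contains at least three of those words. The sketch below follows that outline, but the word-correlation score used here (a simple document-frequency ratio) and the names correlated_words / bootstrap_shaper are assumptions for illustration, not the paper's exact procedure.

```python
from collections import Counter
from typing import List, Set

def correlated_words(pos_docs: List[Set[str]], all_docs: List[Set[str]],
                     top_n: int = 20) -> Set[str]:
    """Rank words by how strongly they co-occur with the positive class.
    (Illustrative score: document frequency in P divided by overall document
    frequency; the paper's actual correlation measure may differ.)"""
    pos_df = Counter(w for d in pos_docs for w in d)
    all_df = Counter(w for d in all_docs for w in d)
    score = {w: pos_df[w] / all_df[w] for w in pos_df if all_df[w] >= 3}
    return set(sorted(score, key=score.get, reverse=True)[:top_n])

def bootstrap_shaper(pos: List[Set[str]], labeled: List[Set[str]],
                     unlabeled: List[Set[str]], iterations: int = 5) -> List[Set[str]]:
    """Iteratively augment the positive set P for one shaper.  A report is moved
    from the unlabeled pool into P only if it contains at least three correlated
    words -- the precision-motivated "at least three" requirement of item 11."""
    pos, remaining = list(pos), list(unlabeled)
    for _ in range(iterations):
        words = correlated_words(pos, labeled + remaining)
        confident = [d for d in remaining if len(d & words) >= 3]
        if not confident:
            break
        pos.extend(confident)
        remaining = [d for d in remaining if len(d & words) < 3]
    return pos
```

Requiring three correlated words trades recall for precision: a single spurious word match is not enough to contaminate P, which is exactly the concern raised in item 11.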


text classification

Appears in 12 sentences as: text classification (12)
In Semi-Supervised Cause Identification from Aviation Safety Reports
  1. Automatic text classification is one of the most important applications in natural language processing (NLP).
    Page 1, “Introduction”
  2. The difficulty of a text classification task depends on various factors, but typically, the task can be difficult if (1) the amount of labeled data available for learning the task is small; (2) it involves multiple classes; (3) it involves multi-label categorization, where more than one label can be assigned to each document; (4) the class distributions are skewed, with some categories significantly outnumbering the others; and (5) the documents belong to the same domain (e.g., movie review classification).
    Page 1, “Introduction”
  3. In this paper, we introduce a new text classification problem involving the Aviation Safety Reporting System (ASRS) that can be viewed as a difficult task along each of the five dimensions discussed above.
    Page 1, “Introduction”
  4. Hence, cause identification can be naturally recast as a text classification task: given an incident report, determine which of a set of 14 shapers contributed to the occurrence of the incident described in the report.
    Page 1, “Introduction”
  5. Of course, the fact that sentiment classification requires a deeper understanding of a text also makes it more difficult than topic-based text classification (Pang et al., 2002).
    Page 1, “Introduction”
  6. First, we introduce a new, challenging text classification problem, cause identification from aviation safety reports, to the NLP community.
    Page 2, “Introduction”
  7. Unlike newswire articles, at which many topic-based text classification tasks are targeted, the ASRS reports are informally written using various domain-specific abbreviations and acronyms, tend to contain poor grammar, and have capitalization information removed, as illustrated in the following sentence taken from one of the reports.
    Page 3, “Dataset”
  8. The SVM learning algorithm as implemented in the LIBSVM software package (Chang and Lin, 2001) is used for classifier training, owing to its robust performance on many text classification tasks.
    Page 4, “Baseline Approaches”
  9. Since we recast cause identification as a text classification task and proposed a bootstrapping approach that aims at improving minority class prediction, the work most related to ours involves one or both of these topics.
    Page 8, “Related Work”
  10. (2007) address the problem of class skewness in text classification.
    Page 8, “Related Work”
  11. Similar bootstrapping methods are applicable outside text classification as well.
    Page 8, “Related Work”
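
Item 8 above states that classifiers are trained with the SVM implementation in LIBSVM (Chang and Lin, 2001). A rough Python equivalent, assuming a binary bag-of-unigrams representation (the feature construction and kernel choice here are assumptions, not details from the paper), can use scikit-learn's SVC, which wraps LIBSVM:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import SVC

# Toy stand-ins for incident report narratives and one shaper's binary labels.
reports = [
    "acft deviated from assigned altitude; capt reported fatigue",
    "tower issued go around after communication breakdown with first officer",
    "checklist interrupted and flaps not set before takeoff",
]
labels = [1, 0, 1]  # 1 = report carries the shaper, 0 = it does not

vectorizer = CountVectorizer(binary=True)   # binary unigram presence features
X = vectorizer.fit_transform(reports)

clf = SVC(kernel="linear")                  # SVC is backed by LIBSVM
clf.fit(X, labels)

print(clf.predict(vectorizer.transform(["fatigue led to an altitude deviation"])))
```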


classification task

Appears in 6 sentences as: classification task (4) classification tasks (2)
In Semi-Supervised Cause Identification from Aviation Safety Reports
  1. The difficulty of a text classification task depends on various factors, but typically, the task can be difficult if (1) the amount of labeled data available for learning the task is small; (2) it involves multiple classes; (3) it involves multi-label categorization, where more than one label can be assigned to each document; (4) the class distributions are skewed, with some categories significantly outnumbering the others; and (5) the documents belong to the same domain (e.g., movie review classification).
    Page 1, “Introduction”
  2. Hence, cause identification can be naturally recast as a text classification task: given an incident report, determine which of a set of 14 shapers contributed to the occurrence of the incident described in the report.
    Page 1, “Introduction”
  3. Unlike newswire articles, at which many topic-based text classification tasks are targeted, the ASRS reports are informally written using various domain-specific abbreviations and acronyms, tend to contain poor grammar, and have capitalization information removed, as illustrated in the following sentence taken from one of the reports.
    Page 3, “Dataset”
  4. The SVM learning algorithm as implemented in the LIBSVM software package (Chang and Lin, 2001) is used for classifier training, owing to its robust performance on many text classification tasks.
    Page 4, “Baseline Approaches”
  5. Since we recast cause identification as a text classification task and proposed a bootstrapping approach that aims at improving minority class prediction, the work most related to ours involves one or both of these topics.
    Page 8, “Related Work”
  6. We recast it as a multi-class, multi-label text classification task, and presented a bootstrapping algorithm for improving the prediction of minority classes in the presence of a small training set.
    Page 8, “Conclusions”


cross validation

Appears in 5 sentences as: cross validation (5)
In Semi-Supervised Cause Identification from Aviation Safety Reports
  1. Since we will perform 5-fold cross validation in our experiments, we also show the number of reports labeled with each shaper under the “F” columns for each fold.
    Page 4, “Dataset”
  2. This amounts to using three folds for training and one fold for development in each cross validation experiment.
    Page 5, “Baseline Approaches”
  3. Whichever baseline is used, we need to reserve one of the five folds to tune the parameter k in our cross validation experiments.
    Page 6, “Our Bootstrapping Algorithm”
  4. Micro-averaged 5-fold cross validation results of this baseline for all 14 shapers and for just 10 minority classes (due to our focus on improving minority class prediction) are expressed as percentages in terms of precision (P), recall (R), and F-measure (F) in the first row of Table 4.
    Page 6, “Evaluation”
  5. Table 4: 5-fold cross validation results.
    Page 7, “Evaluation”
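
Items 1-3 above imply the following fold arrangement: the reports are split into five folds, and in each of the five experiments three folds are used for training, one for development (tuning CT, or CT and k jointly), and one is held out for testing. A minimal sketch of that rotation is given below; the round-robin partition and the choice of which fold serves as development data are assumptions for illustration.

```python
from typing import Iterator, List, Tuple

def five_fold_splits(num_reports: int) -> Iterator[Tuple[List[int], List[int], List[int]]]:
    """Yield (train, dev, test) report indices for 5-fold cross validation:
    three folds for training, one for development, one for testing."""
    folds = [list(range(i, num_reports, 5)) for i in range(5)]
    for test_fold in range(5):
        dev_fold = (test_fold + 1) % 5
        train = [i for f in range(5) if f not in (test_fold, dev_fold) for i in folds[f]]
        yield train, folds[dev_fold], folds[test_fold]

for train_idx, dev_idx, test_idx in five_fold_splits(100):   # 100 is an arbitrary size
    pass  # train on train_idx, tune CT (and k) on dev_idx, evaluate on test_idx
```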


binary classification

Appears in 4 sentences as: binary classification (3) binary classifier (1)
In Semi-Supervised Cause Identification from Aviation Safety Reports
  1. Second, the fact that this is a 14-class classification problem makes it more challenging than a binary classification problem.
    Page 2, “Introduction”
  2. More specifically, both baselines recast the cause identification problem as a set of 14 binary classification problems, one for predicting each shaper.
    Page 4, “Baseline Approaches”
  3. In the binary classification problem for predicting shaper s_i, we create one training instance from each document in the training set, labeling the instance as positive if the document has s_i as one of its labels, and negative otherwise.
    Page 4, “Baseline Approaches”
  4. After creating training instances, we train a binary classifier, C_i, for predicting s_i, employing as features the top 50 unigrams that are selected according to information gain computed over the training data (see Yang and Pedersen (1997)).
    Page 4, “Baseline Approaches”
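
Items 2-4 above describe the one-versus-all setup (one binary problem per shaper, one training instance per document) and the selection of the top 50 unigrams by information gain (Yang and Pedersen, 1997). The sketch below implements a standard information-gain ranking over binary term presence; smoothing and tie-breaking details are assumptions, and the helper names are illustrative.

```python
import math
from typing import List, Set

def _entropy(counts: List[int]) -> float:
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)

def information_gain(docs: List[Set[str]], labels: List[int], word: str) -> float:
    """Information gain of the binary shaper label with respect to the
    presence or absence of a single unigram."""
    n = len(docs)
    h_class = _entropy([labels.count(0), labels.count(1)])
    h_cond = 0.0
    for present in (True, False):
        subset = [y for d, y in zip(docs, labels) if (word in d) == present]
        if subset:
            h_cond += (len(subset) / n) * _entropy([subset.count(0), subset.count(1)])
    return h_class - h_cond

def top_unigrams(docs: List[Set[str]], labels: List[int], k: int = 50) -> List[str]:
    """Top-k unigrams by information gain over the training documents."""
    vocab = {w for d in docs for w in d}
    return sorted(vocab, key=lambda w: information_gain(docs, labels, w), reverse=True)[:k]

def one_vs_all_labels(report_shapers: List[Set[str]], shaper: str) -> List[int]:
    """One training instance per report: positive iff the report is labeled with the shaper."""
    return [1 if shaper in s else 0 for s in report_shapers]
```

Running top_unigrams once per shaper, on that shaper's one-versus-all labels, yields 14 separate 50-unigram feature sets, one for each binary classifier C_i.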


unlabeled data

Appears in 4 sentences as: unlabeled data (4)
In Semi-Supervised Cause Identification from Aviation Safety Reports
  1. To improve the performance of a cause identification system for the minority classes, we present a bootstrapping algorithm that automatically augments a training set by learning from a small amount of labeled data and a large amount of unlabeled data.
    Page 1, “Abstract”
  2. To alleviate the data scarcity problem and improve the accuracy of the classifiers, we propose in this section a bootstrapping algorithm that automatically augments a training set by exploiting a large amount of unlabeled data.
    Page 5, “Our Bootstrapping Algorithm”
  3. To get a sense of the accuracy of the bootstrapped documents without further manual labeling, recall that our experimental setup resembles a transductive setting where the test documents are part of the unlabeled data, and consequently, some of them may have been automatically labeled by the bootstrapping algorithm.
    Page 7, “Evaluation”
  4. Minority classes can be expanded without the availability of unlabeled data as well.
    Page 8, “Related Work”
