Hedge Classification in Biomedical Texts with a Weakly Supervised Selection of Keywords
Szarvas, György

Article Structure

Abstract

Since facts or statements in a hedge or negated context typically appear as false positives, the proper handling of these language phenomena is of great importance in biomedical text mining.

Introduction

The highly accurate identification of several regularly occurring language phenomena like the speculative use of language, negation and past tense (temporal resolution) is a prerequisite for the efficient processing of biomedical texts.

Methods

2.1 Feature space representation

Results

In this section we will present our results for hedge classification as a standalone task.

Conclusions

The overall results of our study are summarised concisely in Table 2.

Topics

maxent

Appears in 8 sentences as: maxent (8)
  1. This is performed by weighting features to maximise the likelihood of the data; for each instance, decisions are made based on the features present at that point, which makes maxent classification well suited to our purposes (see the sketch after this list).
    Page 4, “Methods”
  2. As feature weights are mutually estimated, the maxent classifier is capable of taking feature dependence into account.
    Page 4, “Methods”
  3. By downweighting such features, maxent can, to a certain extent, model the special characteristics that arise from the automatic or weakly supervised training data acquisition procedure.
    Page 4, “Methods”
  4. We used the OpenNLP maxent package, which is freely available.
    Page 4, “Methods”
  5. Here we decided not to check whether these keywords made sense in scientific texts; instead, we left this task to the maximum entropy classifier and added only those keywords that the maxent model trained on the training dataset found reliable enough to predict the spec label on their own.
    Page 5, “Results”
  6. This 54-keyword maxent classifier achieved an Fβ=1(spec) score of 79.73%.
    Page 6, “Results”
  7. We manually examined all keywords that had P(spec) > 0.5 when given as a standalone instance to our maxent model, and constructed a dictionary of hedge cues from the promising candidates.
    Page 7, “Results”
  8. Extending the dictionary with the keywords we gathered from the fruit fly dataset increased the Fβ=1(spec) score to 82.07%, with only one out-of-domain keyword accepted by the maxent classifier.
    Page 7, “Results”
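
A minimal sketch of the setup these excerpts describe, using scikit-learn's LogisticRegression as a maximum entropy stand-in for the OpenNLP maxent package; the keywords, sentences and labels are hypothetical. It trains on binary keyword features and then keeps only cues that predict spec with P(spec) > 0.5 as standalone instances, as in items 5 and 7:

# Sketch: maxent-style hedge classifier over binary keyword features.
# LogisticRegression stands in for the OpenNLP maxent package; the
# keywords, sentences and labels below are hypothetical examples.
import numpy as np
from sklearn.linear_model import LogisticRegression

keywords = ["suggest", "may", "putative", "likely"]  # candidate hedge cues
sentences = [
    "these results suggest a putative role",
    "the gene may be involved",
    "the protein binds dna",
    "expression is likely regulated",
]
labels = [1, 1, 0, 1]  # 1 = speculative (spec), 0 = non-speculative

# Binary features: does each keyword occur in the sentence?
X = np.array([[int(k in s.split()) for k in keywords] for s in sentences])
model = LogisticRegression().fit(X, np.array(labels))

# Keep only keywords reliable enough to predict spec on their own:
# score each keyword as a standalone one-hot instance, keep P(spec) > 0.5.
reliable = []
for i, k in enumerate(keywords):
    one_hot = np.zeros((1, len(keywords)))
    one_hot[0, i] = 1.0
    if model.predict_proba(one_hot)[0, 1] > 0.5:
        reliable.append(k)
print(reliable)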

Maximum Entropy

Appears in 8 sentences as: Maximum Entropy (5) maximum entropy (3)
  1. We chose not to weight features (by frequency or importance); for the Maximum Entropy Model classifier we used binary features indicating whether each single feature occurred in the given context.
    Page 3, “Methods”
  2. 2.4 Maximum Entropy Classifier
    Page 4, “Methods”
  3. Maximum Entropy Models (Berger et al., 1996) seek to maximise the conditional probability of classes, given certain observations (features); see the formula after this list.
    Page 4, “Methods”
  4. This shows that the Maximum Entropy Model in this situation could not learn any meaningful hypothesis from the co-occurrence of individually weak keywords.
    Page 5, “Results”
  5. Here we decided not to check whether these keywords made sense in scientific texts; instead, we left this task to the maximum entropy classifier and added only those keywords that the maxent model trained on the training dataset found reliable enough to predict the spec label on their own.
    Page 5, “Results”
  6. The majority of these phrases were found to be reliable enough for our maximum entropy model to predict a speculative class based on that single feature.
    Page 6, “Results”
  7. Since hedge cues cause systems to predict false positive labels, our idea here was to train Maximum Entropy Models for the false positive classifications of our ICD-9-CM coding system using the vector space representation of radiology reports.
    Page 6, “Results”
  8. Next, as the learnt maximum entropy models show, the hedge classification task reduces to a lookup for single keywords or phrases and to the evaluation of the text based on the most relevant cue alone (a minimal lookup sketch appears under the vector space topic below).
    Page 7, “Conclusions”
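
For reference, the conditional model referred to above has the standard form from Berger et al. (1996), with feature functions f_i, learnt weights λ_i and a normalising factor Z(x):

P(c \mid x) = \frac{1}{Z(x)} \exp\Big(\sum_i \lambda_i f_i(c, x)\Big),
\qquad
Z(x) = \sum_{c'} \exp\Big(\sum_i \lambda_i f_i(c', x)\Big)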

unigram

Appears in 6 sentences as: unigram (5) unigrams (1)
  1. For trigrams, bigrams and unigrams (processed separately) we calculated a new class-conditional probability for each feature x, discarding those observations of x in speculative instances where x was not among the two highest ranked candidates (see the sketch after this list).
    Page 4, “Methods”
  2. About half of these were phrases whose unigram components did not themselves appear in the feature set, so they could be regarded as meaningful standalone features.
    Page 6, “Results”
  3. Our model using just unigram features achieved a BEP(spec) score of 78.68% and an Fβ=1(spec) score of 80.23%, which means that using bigram and trigram hedge cues here significantly improved the performance (the differences in BEP(spec) and Fβ=1(spec) scores were 5.23% and 4.97%, respectively).
    Page 6, “Results”
  4. Our experiments revealed that in radiology reports, which mainly concentrate on listing the identified diseases and symptoms (facts) and the physician’s impressions (speculative parts), detecting hedge instances can be performed accurately using unigram features.
    Page 7, “Results”
  5. All bi- and trigrams retained by our feature selection process had unigram equivalents that were eliminated due to the noise present in the automatically generated training data.
    Page 7, “Results”
  6. Our finding that token unigram features are capable of solving the task accurately agrees with the results of previous works on hedge classification ((Light et al., 2004), (Med-
    Page 7, “Conclusions”
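
A rough sketch of the recalculation described in item 1, under one reading of "discarding" (the discarded observation is dropped from both counts); the data structures here are hypothetical:

# Sketch of the recalculation in item 1 (data structures are hypothetical).
# Run separately for trigram, bigram and unigram features: an occurrence of
# a feature in a speculative sentence is discarded unless the feature is
# among the two highest ranked candidates of that sentence.
from collections import Counter

def recalc_cond_prob(sentences, p_spec):
    """sentences: iterable of (features, is_spec); p_spec: feature -> P(spec|x)."""
    spec, total = Counter(), Counter()
    for features, is_spec in sentences:
        top2 = set(sorted(features, key=lambda f: p_spec.get(f, 0.0),
                          reverse=True)[:2])
        for f in features:
            if is_spec and f not in top2:
                continue  # discard this observation entirely
            total[f] += 1
            if is_spec:
                spec[f] += 1
    return {f: spec[f] / total[f] for f in total}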

bigrams

Appears in 5 sentences as: bigram (2) bigrams (3)
  1. The extension of the feature representation used by previous works with bigrams and trigrams, and an evaluation of the benefit of using longer keywords in hedge classification (see the sketch after this list).
    Page 2, “Introduction”
  2. For trigrams, bigrams and unigrams (processed separately) we calculated a new class-conditional probability for each feature x, discarding those observations of x in speculative instances where x was not among the two highest ranked candidates.
    Page 4, “Methods”
  3. These keywords were often used in a mathematical context (referring to probabilities) and thus expressed no speculative meaning, while such uses were not represented in the FlyBase articles (otherwise bigram or trigram features could have captured these non-speculative uses).
    Page 6, “Results”
  4. One third of the features used by our advanced model were either bigrams or trigrams.
    Page 6, “Results”
  5. Our model using just unigram features achieved a BEP(spec) score of 78.68% and an Fβ=1(spec) score of 80.23%, which means that using bigram and trigram hedge cues here significantly improved the performance (the differences in BEP(spec) and Fβ=1(spec) scores were 5.23% and 4.97%, respectively).
    Page 6, “Results”
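
A minimal sketch of the extended representation from item 1, assuming scikit-learn rather than the paper's own tooling: binary unigram, bigram and trigram features over sentences:

# Sketch: binary uni-/bi-/trigram vector space representation (item 1).
# scikit-learn is an assumed substitute for the paper's own tooling.
from sklearn.feature_extraction.text import CountVectorizer

sentences = [
    "these findings suggest that the protein may act upstream",
    "the protein binds dna in vitro",
]
vectorizer = CountVectorizer(ngram_range=(1, 3), binary=True)
X = vectorizer.fit_transform(sentences)  # sparse 0/1 document-term matrix
print(X.shape, vectorizer.get_feature_names_out()[:5])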

conditional probability

Appears in 3 sentences as: conditional probability (3)
  1. Being a stopword in our case, and having no relevance at all to speculative assertions, the word "it" has a class-conditional probability of P(spec|it) = 74.67% on the seed sets.
    Page 3, “Methods”
  2. We ranked the features x by frequency and by their class-conditional probability P(spec|x) (see the definition after this list).
    Page 4, “Methods”
  3. Maximum Entropy Models (Berger et al., 1996) seek to maximise the conditional probability of classes, given certain observations (features).
    Page 4, “Methods”
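
In these excerpts the class-conditional probability of a feature x is the fraction of its occurrences that fall in speculative sentences; with hypothetical counts, the 74.67% figure for "it" would arise as, e.g., 112 speculative occurrences out of 150:

P(\mathit{spec} \mid x) =
  \frac{\#\{\text{occurrences of } x \text{ in speculative sentences}\}}
       {\#\{\text{occurrences of } x\}},
\qquad
\text{e.g.}\quad P(\mathit{spec} \mid \text{it}) = \tfrac{112}{150} \approx 74.67\%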

natural language

Appears in 3 sentences as: Natural Language (1) natural language (2)
  1. In various natural language processing tasks, relevant statements appearing in a speculative context are treated as false positives.
    Page 1, “Introduction”
  2. (Hyland, 1994)), speculative language from a Natural Language Processing perspective has only been studied in the past few years.
    Page 2, “Introduction”
  3. What makes this iterative method efficient is that, as we said earlier, hedging is expressed via keywords in natural language texts, and often several keywords are present in a single sentence.
    Page 3, “Methods”

vector space

Appears in 3 sentences as: vector space (3)
  1. As regards the nature of this task, a vector space model (VSM) is a straightforward and suitable representation for statistical learning.
    Page 2, “Methods”
  2. Our experiments demonstrated that it is indeed a good idea to include longer phrases in the vector space model representation of sentences.
    Page 6, “Results”
  3. Since hedge cues cause systems to predict false positive labels, our idea here was to train Maximum Entropy Models for the false positive classifications of our ICD-9-CM coding system using the vector space representation of radiology reports.
    Page 6, “Results”
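
As the Maximum Entropy excerpts above note, the learnt models effectively reduce hedge detection to a cue lookup; a minimal sketch of such a dictionary-based classifier, with a hypothetical cue dictionary:

# Sketch: dictionary-lookup hedge classifier. A sentence is judged on its
# most relevant cue alone; the cue dictionary and scores are hypothetical.
hedge_cues = {"suggest": 0.92, "putative": 0.88, "may": 0.85, "likely": 0.80}

def is_speculative(sentence, threshold=0.5):
    tokens = sentence.lower().split()
    scores = [hedge_cues[t] for t in tokens if t in hedge_cues]
    return max(scores, default=0.0) > threshold

print(is_speculative("These results suggest a role in regulation"))  # True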
