Lexically-Triggered Hidden Markov Models for Clinical Document Coding
Kiritchenko, Svetlana and Cherry, Colin

Article Structure

Abstract

The automatic coding of clinical documents is an important task for today’s healthcare providers.

Introduction

The clinical domain presents a number of interesting challenges for natural language processing.

Document Coding and Lexical Triggers

Document coding is a special case of multi-class multi-label text classification.

Related work

Automated clinical coding has received much attention in the medical informatics literature.

Method

To address the task of document coding, our lexically-triggered HMM operates using a two-stage procedure:

Experiments

5.1 Data

Conclusion

We have presented the lexically-triggered HMM, a novel and effective approach for clinical document coding.

Topics

gold-standard

Appears in 7 sentences as: gold-standard (8)
In Lexically-Triggered Hidden Markov Models for Clinical Document Coding
  1. Labelling: Each candidate code is assigned a binary label (present or absent) based on whether it appears in the gold-standard code set.
    Page 4, “Method”
  2. process cannot introduce gold-standard codes that were not proposed by the dictionary.
    Page 5, “Method”
  3. The gold-standard code set for the document is used to infer a gold-standard label sequence for these codes (top right).
    Page 5, “Method”
  4. our dictionary in the training set. This allows the system to learn sequences of codes that are (and are not) likely to occur in the gold-standard data.
    Page 5, “Method”
  5. Let S be a set of training points (x, y), where x is the input and y is the corresponding gold-standard tag sequence.
    Page 6, “Method”
  6. The search step can be carried out with a two-best version of the Viterbi algorithm; if the one-best answer y′₁ matches the gold-standard y, that is δ(y, y′₁) = 0, then y′₂ is checked to see if its loss is higher.
    Page 7, “Method”
  7. For each low-frequency code c, we hold out all training documents that include c in their gold-standard code set.
    Page 9, “Experiments”
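Items 1, 3 and 6 above describe how candidate codes are labelled against the gold standard and how the two-best Viterbi output is checked during training. The following is a minimal Python sketch under several assumptions that the excerpts do not confirm. Labels are a tuple of 0/1 values, with one value per dictionary-proposed code. The loss between sequences is taken to be Hamming loss. "Its loss" in item 6 is read as the structured hinge loss max(0, δ(y, ŷ) − (s(y) − s(ŷ))). The two-best decoder is omitted, and its output is passed in as (sequence, score) pairs. The helper names (gold_labels, hamming_loss, hinge_loss, select_competitor) and the ICD-9 codes in the usage are hypothetical illustrations, not the authors' code.

```python
from typing import AbstractSet, List, Optional, Sequence, Tuple

# One label per dictionary-proposed candidate code: 1 = present, 0 = absent.
Labels = Tuple[int, ...]


def gold_labels(candidates: Sequence[str], gold_codes: AbstractSet[str]) -> Labels:
    """Label each candidate present (1) if it is in the gold-standard code set.

    Gold codes the dictionary never proposed have no position in the
    sequence, so labelling can never recover them.
    """
    return tuple(1 if code in gold_codes else 0 for code in candidates)


def hamming_loss(y: Labels, y_hat: Labels) -> int:
    """Assumed loss delta(y, y_hat): number of positions whose labels differ."""
    return sum(a != b for a, b in zip(y, y_hat))


def hinge_loss(y: Labels, y_hat: Labels, score_gold: float, score_hat: float) -> float:
    """Assumed structured hinge loss: max(0, delta(y, y_hat) - (s(y) - s(y_hat)))."""
    return max(0.0, hamming_loss(y, y_hat) - (score_gold - score_hat))


def select_competitor(
    y: Labels,
    two_best: List[Tuple[Labels, float]],
    score_gold: float,
) -> Optional[Labels]:
    """Pick the sequence to update against from a two-best list (best first)."""
    (y1, _), (y2, s2) = two_best
    if hamming_loss(y, y1) > 0:
        # The one-best answer is wrong, so it is the competitor.
        return y1
    # The one-best matches gold (zero loss): check whether the second-best
    # has a higher, i.e. positive, hinge loss.
    if hinge_loss(y, y2, score_gold, s2) > 0:
        return y2
    return None  # no margin violation, so no update is needed


if __name__ == "__main__":
    candidates = ["786.2", "486", "493.90"]  # hypothetical dictionary proposals
    y = gold_labels(candidates, {"786.2", "599.0"})  # 599.0 was never proposed
    print(y)  # (1, 0, 0)
    print(select_competitor(y, [(y, 5.0), ((1, 1, 0), 4.5)], score_gold=5.0))  # (1, 1, 0)
```

The usage shows the limitation from item 2: the gold code 599.0 never appears in the label sequence, because the dictionary did not propose it.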

See all papers in Proc. ACL 2011 that mention gold-standard.

n-grams

Appears in 5 sentences as: n-grams (5)
In Lexically-Triggered Hidden Markov Models for Clinical Document Coding
  1. Statistical systems trained on only text-derived features (such as n-grams) did not show good performance due to a wide variety of medical language and a relatively small training set (Goldstein et al., 2007).
    Page 3, “Related work”
  2. The transition features are modeled as simple indicators over n-grams of present codes, for values of n up to 10, the largest number of codes proposed by
    Page 5, “Method”
  3. We found it useful to pad our n-grams with “beginning of document” tokens for sequences when fewer than n codes have been labelled as present, but found it harmful to include an end-of-document tag once labelling is complete.
    Page 5, “Method”
  4. Note that n-gram features are only code-specific, as they are not connected to any specific trigger term.
    Page 6, “Method”
  5. The classifiers use a feature set designed to mimic our LT-HMM as closely as possible, including n-grams, dictionary matches, ConText output, and symptom/disease se-
    Page 7, “Experiments”
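Items 2 and 3 above describe transition features as indicators over n-grams of present codes, for n up to 10. When fewer than n codes have been labelled present, the history is padded with beginning-of-document tokens, and no end-of-document tag is used. The following is a minimal Python sketch of one way to generate these indicators, under stated assumptions. The function fires when a code is labelled present. The history consists only of codes already labelled present, in candidate order, so absent codes do not extend it. Each feature is a string key. The names transition_features and BOS, and the "|" separator, are hypothetical.

```python
from typing import List, Sequence

BOS = "<s>"  # hypothetical beginning-of-document padding token


def transition_features(present_so_far: Sequence[str], code: str, max_n: int = 10) -> List[str]:
    """Indicator features fired when `code` is labelled present.

    For each n in 1..max_n, the feature is the n-gram made of the last n-1
    codes already labelled present, followed by `code`. Short histories are
    left-padded with BOS tokens. No end-of-document tag is ever appended.
    """
    features: List[str] = []
    for n in range(1, max_n + 1):
        # Last n-1 present codes (the whole history if it is shorter).
        history = list(present_so_far)[-(n - 1):] if n > 1 else []
        padding = [BOS] * (n - 1 - len(history))
        features.append("|".join(padding + history + [code]))
    return features


if __name__ == "__main__":
    print(transition_features(["786.2"], "486", max_n=3))
    # ['486', '786.2|486', '<s>|786.2|486']
```

Padding keeps "this code opened the document" distinct from "this code followed some other code". For example, '<s>|786.2|486' only fires when 786.2 was the first present code.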

rule-based

Appears in 4 sentences as: rule-based (4)
In Lexically-Triggered Hidden Markov Models for Clinical Document Coding
  1. Several teams, including the winner, built pure symbolic (i.e., handcrafted rule-based) systems (e.g., Goldstein et al., 2007).
    Page 3, “Related work”
  2. Our second baseline is a symbolic system, designed to evaluate the quality of our rule-based components when used alone.
    Page 8, “Experiments”
  3. Table 3: Micro-averaged F1-scores for statistical and symbolic baselines, the proposed LT-HMM approach, and the best CMC handcrafted rule-based system.
    Page 8, “Experiments”
  4. Also, it achieves the best ever performance on a common testbed, beating the top-performer of the 2007 CMC Challenge, a handcrafted rule-based system.
    Page 9, “Conclusion”