Article Structure
Abstract
The automatic coding of clinical documents is an important task for today’s healthcare providers.
Introduction
The clinical domain presents a number of interesting challenges for natural language processing.
Document Coding and Lexical Triggers
Document coding is a special case of multi-class multi-label text classification.
Related work
Automated clinical coding has received much attention in the medical informatics literature.
Method
To address the task of document coding, our lexically-triggered HMM operates using a two-stage procedure:
Experiments
5.1 Data
Conclusion
We have presented the lexically-triggered HMM, a novel and effective approach for clinical document coding.
Topics
gold-standard
Appears in 7 sentences as: gold-standard (8)
In Lexically-Triggered Hidden Markov Models for Clinical Document Coding
- Labelling: Each candidate code is assigned a binary label (present or absent) based on whether it appears in the gold-standard code set.
Page 4, “Method”
- process cannot introduce gold-standard codes that were not proposed by the dictionary.
Page 5, “Method”
- The gold-standard code set for the document is used to infer a gold-standard label sequence for these codes (top right).
Page 5, “Method”
- our dictionary in the training set. This allows the system to learn sequences of codes that are (and are not) likely to occur in the gold-standard data.
Page 5, “Method”
- Let S be a set of training points (x, y), where x is the input and y is the corresponding gold-standard tag sequence.
Page 6, “Method”
- The search step can be carried out with a two-best version of the Viterbi algorithm; if the one-best answer y'_1 matches the gold-standard y, that is, its loss is 0, then y'_2 is checked to see if its loss is higher.
Page 7, “Method”
- For each low-frequency code c, we hold out all training documents that include c in their gold-standard code set.
Page 9, “Experiments”
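The labelling step described in the excerpts above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names and the example code strings are hypothetical, and the sketch assumes candidate codes arrive as an ordered list proposed by a dictionary lookup. It also shows why the process cannot recover gold-standard codes that the dictionary never proposed.

```python
def label_candidates(candidate_codes, gold_codes):
    """Pair each dictionary-proposed code with 1 (present) or 0 (absent).

    A candidate is 'present' only if it appears in the document's
    gold-standard code set. Candidate order is preserved so the result
    can serve as a label sequence.
    """
    gold = set(gold_codes)
    return [(code, int(code in gold)) for code in candidate_codes]


def unreachable_gold_codes(candidate_codes, gold_codes):
    """Gold-standard codes the dictionary never proposed.

    Labelling candidates can only keep or reject proposals, so these
    codes can never be produced for this document.
    """
    return sorted(set(gold_codes) - set(candidate_codes))


# Hypothetical example values, chosen only for illustration.
candidates = ["786.2", "780.6", "486"]
gold = ["786.2", "599.0"]

labels = label_candidates(candidates, gold)
missed = unreachable_gold_codes(candidates, gold)
```

Here `labels` marks only the first candidate as present, and `missed` contains the one gold-standard code that no candidate could match.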
See all papers in Proc. ACL 2011 that mention gold-standard.
n-grams
Appears in 5 sentences as: n-grams (5)
In Lexically-Triggered Hidden Markov Models for Clinical Document Coding
- Statistical systems trained on only text-derived features (such as n-grams) did not show good performance due to a wide variety of medical language and a relatively small training set (Goldstein et al., 2007).
Page 3, “Related work”
- The transition features are modeled as simple indicators over n-grams of present codes, for values of n up to 10, the largest number of codes proposed by
Page 5, “Method”
- We found it useful to pad our n-grams with “beginning of document” tokens for sequences when fewer than n codes have been labelled as present, but found it harmful to include an end-of-document tag once labelling is complete.
Page 5, “Method”
- Note that n-grams features are only code-specific, as they are not connected to any specific trigger term.
Page 6, “Method”
- The classifiers use a feature set designed to mimic our LT-HMM as closely as possible, including n-grams, dictionary matches, ConText output, and symptom/disease se-
Page 7, “Experiments”
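The transition features described in the excerpts above (indicators over n-grams of present codes, padded with beginning-of-document tokens and with no end-of-document tag) might look roughly like the sketch below. The function name, the feature-string format, the `<BOD>` token, and the default `max_n=3` are all assumptions made for illustration; the excerpt itself mentions values of n up to 10.

```python
BOD = "<BOD>"  # hypothetical beginning-of-document padding token


def transition_ngrams(present_history, next_code, max_n=3):
    """Build indicator feature names for n-grams ending in next_code.

    present_history holds the codes already labelled as present, in order.
    When fewer than n - 1 such codes exist, the left context is padded
    with BOD tokens. No end-of-document token is ever added.
    """
    features = []
    for n in range(1, max_n + 1):
        # Take the last n - 1 present codes (none for unigrams).
        context = present_history[-(n - 1):] if n > 1 else []
        padding = [BOD] * ((n - 1) - len(context))
        features.append("ngram=" + "|".join(padding + context + [next_code]))
    return features


# Hypothetical example: one code already present, a second being scored.
feats = transition_ngrams(["786.2"], "780.6", max_n=3)
```

With one code in the history, the unigram and bigram features need no padding, while the trigram feature receives a single `<BOD>` token on the left.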
rule-based
Appears in 4 sentences as: rule-based (4)
In Lexically-Triggered Hidden Markov Models for Clinical Document Coding
- Several teams, including the winner, built pure symbolic (i.e., handcrafted rule-based) systems (e.g., (Goldstein et al., 2007)).
Page 3, “Related work”
- Our second baseline is a symbolic system, designed to evaluate the quality of our rule-based components when used alone.
Page 8, “Experiments”
- Table 3: Micro-averaged F1-scores for statistical and symbolic baselines, the proposed LT-HMM approach, and the best CMC handcrafted rule-based system.
Page 8, “Experiments”
- Also, it achieves the best ever performance on a common testbed, beating the top-performer of the 2007 CMC Challenge, a handcrafted rule-based system.
Page 9, “Conclusion”
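The systems in the excerpts above are compared by micro-averaged F1. As a reminder of how that metric is computed for multi-label coding, here is a minimal sketch using the standard definition: true positives, false positives, and false negatives are pooled over all documents before precision and recall are taken. The function name and the toy data are hypothetical, and nothing here reproduces the paper's reported scores.

```python
def micro_f1(gold_sets, predicted_sets):
    """Micro-averaged F1 over per-document code sets.

    Counts are summed across all documents first, so frequent codes
    influence the score more than rare ones.
    """
    tp = fp = fn = 0
    for gold, pred in zip(gold_sets, predicted_sets):
        gold, pred = set(gold), set(pred)
        tp += len(gold & pred)   # codes correctly predicted
        fp += len(pred - gold)   # predicted but not in the gold standard
        fn += len(gold - pred)   # in the gold standard but not predicted
    if tp == 0:
        return 0.0  # avoids division by zero when nothing matches
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data: two documents with made-up code labels.
gold_docs = [{"a", "b"}, {"c"}]
pred_docs = [{"a"}, {"c", "d"}]
score = micro_f1(gold_docs, pred_docs)
```

In the toy data there are two true positives, one false positive, and one false negative, so precision and recall are both 2/3 and so is the F1.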