A Study of Concept-based Weighting Regularization for Medical Records Search
Wang, Yue and Liu, Xitong and Fang, Hui

Article Structure

Abstract

An important search task in the biomedical domain is to find medical records of patients who are qualified for a clinical trial.

Introduction

With the increasing use of electronic health records, it becomes urgent to leverage this rich information resource about patients’ health conditions to transform research in health and medicine.

Related Work

The Medical Records track of the Text REtrieval Conference (TREC) provides a common platform to study the medical records retrieval problem and evaluate the proposed methods (Voorhees and Tong, 2011; Voorhees and Hersh, 2012).

Concept-based Representation for Medical Records Retrieval

3.1 Problem Formulation

Weighting Strategies for Concept-based Representation

4.1 Motivation


Experiments

5.1 Experiment Setup

Conclusions and Future Work

Medical record retrieval is an important domain-specific IR problem.


confidence score

Appears in 8 sentences as: confidence score (7) confidence scores (1)
In A Study of Concept-based Weighting Regularization for Medical Records Search
  1. In particular, MetaMap (Aronson, 2001) can take a text string as the input, segment it into phrases, and then map each phrase to multiple UMLS CUIs with confidence scores.
    Page 3, “Concept-based Representation for Medical Records Retrieval”
  2. The confidence score is an indicator of the quality of the phrase-to-concept mapping by MetaMap.
    Page 3, “Concept-based Representation for Medical Records Retrieval”
  3. confidence score as well as more detailed information about this concept.
    Page 3, “Concept-based Representation for Medical Records Retrieval”
  4. Although MetaMap is able to rank all the candidate concepts with the confidence score and pick the most likely one, the accuracy is not very high.
    Page 4, “Weighting Strategies for Concept-based Representation”
  5. i(e) is the normalized confidence score of the mapping for concept e generated by MetaMap.
    Page 4, “Weighting Strategies for Concept-based Representation”
  6. Since each concept mapping is associated with a confidence score, we can incorporate them into the regularization function as follows:
    Page 5, “Weighting Strategies for Concept-based Representation”
  7. where i(e) is the normalized confidence score of concept e generated by MetaMap, and α is a parameter between 0 and 1 to control the effect of the regularization.
    Page 5, “Weighting Strategies for Concept-based Representation”
  8. As shown in Equation (3), the Balanced method regularizes the weights through two components: (1) normalized confidence score of each aspect,
    Page 8, “Experiments”
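
The sentences above mention three ingredients: candidate concept mappings with confidence scores, a normalized score i(e), and a parameter between 0 and 1 that controls how strongly the regularization acts. The sketch below only illustrates how such pieces could fit together. It does not reproduce the paper's regularization function, whose equations are not quoted on this page. The max-based normalization, the linear blend, the function names `normalize_confidence` and `regularize_weight`, and the concept IDs and scores are all assumptions made for illustration.

```python
def normalize_confidence(scores):
    """Scale one phrase's raw candidate scores into [0, 1].

    Assumption: divide by the largest score among the candidates;
    the paper's own normalization may differ.
    """
    top = max(scores.values(), default=0)
    if top <= 0:
        return {cui: 0.0 for cui in scores}
    return {cui: s / top for cui, s in scores.items()}


def regularize_weight(weight, confidence, alpha):
    """Hypothetical blend, not the paper's equation.

    alpha = 0 keeps the original weight unchanged;
    alpha = 1 scales the weight fully by the normalized confidence.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return (1.0 - alpha) * weight + alpha * confidence * weight


# Made-up concept IDs and raw scores, for illustration only.
candidates = {"C0000001": 1000, "C0000002": 500}
norm = normalize_confidence(candidates)
weights = {cui: regularize_weight(2.0, c, alpha=0.5) for cui, c in norm.items()}
```

With this blend, a low-confidence mapping has its weight reduced in proportion to alpha, while the top-scoring mapping keeps its full weight.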

See all papers in Proc. ACL 2014 that mention confidence score.

knowledge bases

Appears in 5 sentences as: knowledge bases (6)
In A Study of Concept-based Weighting Regularization for Medical Records Search
  1. In the past decades, significant efforts have been put on constructing biomedical knowledge bases (Aronson and Lang, 2010; Lipscomb, 2000; Corporation, 1999) and developing natural language processing (NLP) tools, such as MetaMap, to utilize the information from the knowledge bases (Aronson, 2001; McInnes et al., 2009).
    Page 1, “Introduction”
  2. Indeed, concept-based representation is one of the commonly used approaches that leverage knowledge bases to improve the retrieval performance (Limsopatham et al., 2013d; Limsopatham et al., 2013b).
    Page 1, “Introduction”
  3. The basic idea is to represent both queries and documents as “bags of concepts”, where the concepts are identified based on the information from the knowledge bases.
    Page 1, “Introduction”
  4. In particular, MetaMap is used to map terms from queries and documents (e.g., medical records) to the semantic concepts from biomedical knowledge bases such as UMLS.
    Page 3, “Concept-based Representation for Medical Records Retrieval”
  5. Second, we will study how to leverage other information from knowledge bases to further improve the performance.
    Page 9, “Conclusions and Future Work”
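
The "bags of concepts" idea in the sentences above can be sketched as a simple counting step. This minimal example assumes the phrase-to-concept mappings have already been produced by a tool such as MetaMap; it does not call MetaMap. The function name `bag_of_concepts`, the input shape, and the placeholder concept IDs are hypothetical.

```python
from collections import Counter


def bag_of_concepts(phrase_mappings):
    """Count concept IDs over all mapped phrases.

    Each item is (phrase, list of concept IDs); one phrase may map
    to several concepts, and every mapped concept is counted.
    """
    bag = Counter()
    for _phrase, cuis in phrase_mappings:
        bag.update(cuis)
    return bag


# Hypothetical mapper output with placeholder concept IDs.
query_mappings = [("chest pain", ["C_A", "C_B"]), ("pain", ["C_B"])]
query_bag = bag_of_concepts(query_mappings)
```

Queries and documents built this way share one concept vocabulary, so the same retrieval function can score both.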


vector space

Appears in 3 sentences as: vector space (3)
In A Study of Concept-based Weighting Regularization for Medical Records Search
  1. For example, Qi and Laquerre used MetaMap to generate the concept-based representation and then apply a vector space retrieval model for ranking, and their results are one of the top ranked runs in the TREC 2012 Medical Records track (Qi and Laquerre, 2012).
    Page 2, “Related Work”
  2. However, existing studies on concept-based representation still used weighting strategies developed for term-based representation such as vector space models (Qi and Laquerre, 2012) and divergence from randomness (DFR) (Limsopatham et al., 2013a) and did not take the inaccurate concept mapping results into consideration.
    Page 2, “Related Work”
  3. After converting both queries and documents to concept-based representations using MetaMap, previous work applied existing retrieval functions such as vector space models (Singhal et al., 1996) to rank the documents.
    Page 3, “Concept-based Representation for Medical Records Retrieval”
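
As a rough illustration of ranking concept-based representations with a vector space model, the sketch below scores documents by TF-IDF cosine similarity. This is a generic textbook formulation, not necessarily the retrieval function used in the cited work. The smoothed IDF, the function names, and the toy records are assumptions.

```python
import math
from collections import Counter


def idf(concept, docs):
    """Smoothed inverse document frequency (one common variant, assumed here)."""
    df = sum(1 for bag in docs.values() if concept in bag)
    return math.log((len(docs) + 1) / (df + 1)) + 1.0


def tfidf(bag, docs):
    """Turn a bag of concepts into a TF-IDF weighted vector."""
    return {c: tf * idf(c, docs) for c, tf in bag.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank(query_bag, docs):
    """Return (doc_id, score) pairs sorted by descending similarity."""
    q = tfidf(query_bag, docs)
    scored = [(doc_id, cosine(q, tfidf(bag, docs))) for doc_id, bag in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy records with placeholder concept IDs.
docs = {"r1": Counter({"C_A": 2, "C_B": 1}), "r2": Counter({"C_C": 3})}
ranking = rank(Counter({"C_A": 1}), docs)
```

Note that this weighting treats every mapped concept as equally reliable, which is the limitation the Related Work sentences point out for term-based weighting strategies applied to concepts.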
