Clinical concept annotation with contextual word embedding in active transfer learning environment

Asim Abbas*, Mark Lee, Niloofer Shanavas, Venelin Kovatchev

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review


Abstract

Objective: This study presents an active learning approach that automatically extracts clinical concepts from unstructured data and classifies them into explicit categories such as Problem, Treatment, and Test while preserving high precision and recall. The approach is demonstrated through experiments on the i2b2 public datasets.

Methods: Initially, labeled data are acquired with a lexical-based approach in sufficient quantity to bootstrap the active learning process. A contextual word embedding similarity approach, using BERT-base variant models such as ClinicalBERT, DistilBERT, and SCIBERT, automatically classifies unlabeled clinical concepts into the explicit categories (a minimal sketch of this step follows below). Additionally, deep learning models and large language models (LLMs) are trained on the labeled data acquired through active learning.
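The sketch below illustrates, under stated assumptions, how a contextual-embedding similarity step of this kind could assign an unlabeled concept to the nearest category. It assumes the Hugging Face transformers library, the publicly available Bio_ClinicalBERT checkpoint, and illustrative prototype phrases; the paper's actual prototypes come from its lexicon-labeled seed data, and its other models (BERTBase, DistilBERT, SCIBERT) would plug in the same way.

```python
# Hypothetical sketch: classify a clinical concept into Problem / Treatment / Test
# by cosine similarity of contextual embeddings. Model name and prototype phrases
# are illustrative assumptions, not the paper's exact configuration.
import torch
from transformers import AutoTokenizer, AutoModel

MODEL_NAME = "emilyalsentzer/Bio_ClinicalBERT"  # illustrative checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)
model.eval()

def embed(text: str) -> torch.Tensor:
    """Mean-pool the last hidden states into a single contextual vector."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state      # (1, seq_len, hidden)
    mask = inputs["attention_mask"].unsqueeze(-1)        # (1, seq_len, 1)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)  # (1, hidden)

# Category prototypes are illustrative; in practice they would be built from the
# lexicon-labeled seed set acquired before active learning.
prototypes = {
    "Problem": embed("chronic obstructive pulmonary disease"),
    "Treatment": embed("intravenous antibiotics"),
    "Test": embed("chest x-ray"),
}

def classify(concept: str) -> str:
    """Assign the concept to the category with the most similar prototype."""
    vec = embed(concept)
    scores = {cat: torch.cosine_similarity(vec, proto).item()
              for cat, proto in prototypes.items()}
    return max(scores, key=scores.get)

print(classify("elevated troponin level"))  # illustrative query
```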

Results: In this study we utilized the i2b2 public datasets, consisting of 426 clinical notes. Employing the lexical-based approach, the methodology achieves precision, recall, and F1-scores of 76%, 70%, and 73%, respectively. In the active transfer learning environment, SCIBERT outperformed the counterpart models, with precision, recall, F1-score, and accuracy of 70.84%, 77.40%, 73.97%, and 69.30%. Among the deep learning models, CNN achieved the highest training accuracies of 92.04%, 92.40%, 95.31%, and 94.36%, and testing accuracies of 90.48%, 89.15%, 92.67%, and 92.42%, when trained on BERTBase, DistilBERT, SCIBERT, and ClinicalBERT contextual embeddings, respectively. When the LLMs were evaluated individually, ClinicalBERT achieved the highest performance, with a training accuracy of 98.4% and a testing accuracy of 96%, outperforming the others.
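As a rough illustration of the CNN-over-contextual-embeddings setup evaluated above, the sketch below shows a small 1D-convolutional classifier operating on frozen BERT-style token embeddings. It is a minimal PyTorch sketch under assumed hyperparameters (filter count, kernel size, pooling); the paper's actual architecture and training procedure may differ.

```python
# Hypothetical sketch of a small CNN text classifier over frozen BERT-style
# contextual embeddings (PyTorch assumed); layer sizes are illustrative.
import torch
import torch.nn as nn

class ConceptCNN(nn.Module):
    def __init__(self, hidden_size: int = 768, n_classes: int = 3):
        super().__init__()
        self.conv = nn.Conv1d(hidden_size, 128, kernel_size=3, padding=1)
        self.classifier = nn.Linear(128, n_classes)

    def forward(self, embeddings: torch.Tensor) -> torch.Tensor:
        # embeddings: (batch, seq_len, hidden) produced by a frozen BERT variant
        x = torch.relu(self.conv(embeddings.transpose(1, 2)))  # (batch, 128, seq_len)
        x = x.max(dim=2).values                                # global max pooling
        return self.classifier(x)                              # logits: Problem/Treatment/Test

# Usage with random stand-in embeddings (batch of 4 concepts, 16 tokens each):
logits = ConceptCNN()(torch.randn(4, 16, 768))
print(logits.shape)  # torch.Size([4, 3])
```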

Conclusion: The proposed methodology offers a promising contribution to clinical practice: by introducing an active learning approach, it streamlines concept extraction and annotation without compromising accuracy. Using models such as SCIBERT and CNN, the methodology significantly improves annotation efficiency while maintaining high accuracy, demonstrating its potential to enhance clinical practice.
Original language: English
Pages (from-to): 1-31
Number of pages: 31
Journal: Digital Health
Volume: 10
DOIs
Publication status: Published - 19 Dec 2024

Keywords

  • clinical concept extraction
  • clinical concept annotation
  • contextual word embedding
  • active transfer learning
  • large language models
  • deep learning
  • machine learning
