Machine learning-enabled systematic review on coded healthcare data in heart failure research

  • Asgher Champsi
  • Karin T. Slater
  • Simrat Gill
  • Tomasz Dyszynski
  • Megan Schröder
  • Kiliana Suzart-Woischnik
  • Benoît Tyl
  • Guillaume Allee
  • Alfonso Sartorius
  • R Thomas Lumbers
  • Folkert W Asselbergs
  • Diederick E Grobbee
  • Georgios Gkoutos
  • Dipak Kotecha*

*Corresponding author for this work

Research output: Contribution to journal, Review article, peer-review

Abstract

Aims: Coded healthcare data are now commonly used in clinical research. This study aimed to assess the transparency of reporting within heart failure studies and employ machine learning to facilitate larger-scale evaluation.

Methods & Results: A systematic search of EMBASE and MEDLINE (2015-2020) identified 4,279 heart failure studies with accessible Extensible Markup Language published in the top 25 journals by impact factor. Manual extraction in a random sample of 170 studies by independent human reviewers characterised 40 studies (23.5%) that used coded healthcare data, with 34 of these (85%) reporting doing so and only 19 (47.5%) providing clear descriptions of dataset construction and linkage. Another 420 studies underwent manual annotation to further train a Natural Language Processing (NLP) model designed for this study to automate and upscale review. The NLP model processed 3,689 studies with a high level of internal accuracy (area under the receiver operating characteristic curve 0.97 and F1 score 0.96). Overall, the NLP approach identified 782 studies (21.2%) that reported coded healthcare data usage (95% CI 19.8%-20.9%). No correlation was found between the reporting of coded healthcare data use and the publication year (r=-0.05; p=0.21) or citation count (r=-0.13; p=0.12).
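
The counts and percentages quoted above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch in Python, assuming only the figures stated in the abstract; the variable names and the `pct` helper are illustrative and not part of the study.

```python
# Counts quoted in the abstract; all names here are illustrative assumptions.
total_studies = 4279      # heart failure studies identified by the search
manual_sample = 170       # random sample reviewed manually
extra_annotated = 420     # further studies annotated to train the NLP model
nlp_processed = 3689      # studies processed by the NLP model


def pct(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# The three subsets account for every study identified by the search.
assert manual_sample + extra_annotated + nlp_processed == total_studies

manual_coded = 40         # manually reviewed studies using coded healthcare data
print(pct(manual_coded, manual_sample))  # 23.5
print(pct(34, manual_coded))             # 85.0 (reported the use of coded data)
print(pct(19, manual_coded))             # 47.5 (clearly described construction and linkage)
print(pct(782, nlp_processed))           # 21.2 (identified by the NLP model)
```

The sketch only reproduces the point estimates; it does not attempt the confidence interval, because the abstract does not state how that interval was calculated.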

Conclusions: One-fifth of contemporary heart failure research articles are already reporting the use of coded healthcare data, with at-scale evaluation facilitated by a machine-learning model. The limited transparency on how coded healthcare data were used in studies highlights the need for quality standards such as the CODE-EHR framework for the use of healthcare data in research.

Original language: English
Article number: ztaf123
Journal: European Heart Journal
Early online date: 23 Oct 2025
Publication status: E-pub ahead of print - 23 Oct 2025

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 3 - Good Health and Well-being

Keywords

  • Heart failure
  • coding
  • research
  • transparency
  • methodology
