Machine learning of COVID-19 clinical data identifies population structures with therapeutic potential

David Greenwood, Tom Taverner, Nicola J. Adderley, Malcolm Price, Krishna Gokhale, Chris Sainsbury, Suzy Gallier, Carly Welch, Elizabeth Sapey, Duncan Murray, Hilary Fanning, Simon Ball, Krishnarajah Nirantharakumar, Wayne Croft, Paul Moss

Research output: Contribution to journal › Article › peer-review



Clinical outcomes for patients with COVID-19 are heterogeneous and there is interest in defining subgroups for prognostic modeling and development of treatment algorithms. We obtained 28 demographic and laboratory variables in patients admitted to hospital with COVID-19. These comprised a training cohort (n = 6099) and two validation cohorts during the first and second waves of the pandemic (n = 996; n = 1011). Uniform manifold approximation and projection (UMAP) dimension reduction and Gaussian mixture model (GMM) analysis were used to define patient clusters. Twenty-nine clusters were defined in the training cohort and were associated with markedly different mortality rates, which were predictive within the confirmation datasets. Deconvolution of clinical features within clusters identified unexpected relationships between variables. Integration of large datasets using UMAP-assisted clustering can therefore identify patient subgroups with prognostic information and uncover unexpected interactions between clinical variables. This application of machine learning represents a powerful approach for delineating disease pathogenesis and potential therapeutic interventions.
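
The abstract reports that clusters defined in the training cohort carried different mortality rates and that these rates were predictive in the validation cohorts. A minimal sketch of that comparison is given below, assuming cluster labels have already been assigned to each patient (for example by UMAP followed by a GMM). The function name and the toy data are hypothetical illustrations, not the authors' code or data.

```python
from collections import defaultdict


def mortality_by_cluster(labels, died):
    """Return {cluster: fraction of patients who died} for one cohort."""
    counts = defaultdict(lambda: [0, 0])  # cluster -> [deaths, patients]
    for cluster, outcome in zip(labels, died):
        counts[cluster][0] += int(outcome)
        counts[cluster][1] += 1
    return {c: deaths / n for c, (deaths, n) in counts.items()}


# Hypothetical toy cohorts: one cluster label and one outcome (1 = died) per patient.
train_labels = [0, 0, 0, 0, 1, 1, 1, 1]
train_died = [0, 0, 0, 1, 1, 1, 1, 0]
valid_labels = [0, 0, 1, 1]
valid_died = [0, 0, 1, 1]

train_rates = mortality_by_cluster(train_labels, train_died)
valid_rates = mortality_by_cluster(valid_labels, valid_died)

# Print training and validation mortality side by side for each training cluster.
for cluster in sorted(train_rates):
    print(cluster, train_rates[cluster], valid_rates.get(cluster))
```

If the clusters carry prognostic information, clusters with high mortality in the training cohort should also show high mortality in the validation cohorts.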

Original language: English
Article number: 104480
Issue number: 7
Early online date: 31 May 2022
Publication status: Published - 15 Jul 2022

Bibliographical note

Funding Information:
The work was funded by an NIHR grant to PM. The sponsor of the ethics had no role in the decision to publish, the collection of data or authorship. The contributions by NA, ES, KN, MP, CS and TT were funded by the Medical Research Council UK Research and Innovation (reference COV0306) during the study. The funder had no role in developing the research question or the study protocol.

Funding Information:
MJP was supported by the NIHR Birmingham Biomedical Research Centre at the University Hospitals Birmingham NHS Foundation Trust and the University of Birmingham. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health and Social Care.

Publisher Copyright:
© 2022 The Author(s)


Keywords

  • Bioinformatics
  • Viral microbiology
  • Medical informatics

ASJC Scopus subject areas

  • General

