Machine learning of COVID-19 clinical data identifies population structures with therapeutic potential

David Greenwood, Tom Taverner, Nicola J. Adderley, Malcolm Price, Krishna Gokhale, Chris Sainsbury, Suzy Gallier, Carly Welch, Elizabeth Sapey, Duncan Murray, Hilary Fanning, Simon Ball, Krishnarajah Nirantharakumar, Wayne Croft, Paul Moss

Research output: Contribution to journal › Article › peer-review


Clinical outcomes for patients with COVID-19 are heterogeneous, and there is interest in defining subgroups for prognostic modelling and the development of treatment algorithms. We obtained 28 demographic and laboratory variables in patients admitted to hospital with COVID-19. These comprised a training cohort (n=6099) and two validation cohorts during the first and second waves of the pandemic (n=996; n=1011). Uniform manifold approximation and projection (UMAP) dimension reduction and Gaussian mixture model (GMM) analysis were used to define patient clusters. Twenty-nine clusters were defined in the training cohort and were associated with markedly different mortality rates, which were predictive within confirmation datasets. Deconvolution of clinical features within clusters identified unexpected relationships between variables. Integration of large datasets using UMAP-assisted clustering can therefore identify patient subgroups with prognostic information and uncover unexpected interactions between clinical variables. This application of machine learning represents a powerful approach for delineating disease pathogenesis and potential therapeutic interventions.
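The abstract's final analytic step, comparing cluster-level mortality between the training and validation cohorts, can be illustrated with a minimal Python sketch. This is not the authors' code: the function name `cluster_mortality` and the toy labels and outcomes are hypothetical, and the cluster assignments are assumed to have been produced already (for example by a GMM fitted on a UMAP embedding).

```python
from collections import defaultdict

def cluster_mortality(labels, died):
    """Return {cluster: (n_patients, mortality_rate)} for one cohort."""
    counts = defaultdict(lambda: [0, 0])  # cluster -> [n_patients, n_deaths]
    for cluster, outcome in zip(labels, died):
        counts[cluster][0] += 1
        counts[cluster][1] += int(outcome)
    return {c: (n, deaths / n) for c, (n, deaths) in counts.items()}

# Hypothetical toy data (not from the study): a cluster label per patient
# and a flag that is 1 if the patient died, 0 otherwise.
train_labels = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2]
train_died   = [0, 0, 0, 1, 1, 1, 1, 0, 0, 0]
valid_labels = [0, 0, 1, 1, 1, 2, 2, 2]
valid_died   = [0, 0, 1, 1, 0, 0, 0, 0]

train_rates = cluster_mortality(train_labels, train_died)
valid_rates = cluster_mortality(valid_labels, valid_died)

# Compare each training cluster's mortality with the same cluster in validation.
for cluster in sorted(train_rates):
    n_train, rate_train = train_rates[cluster]
    n_valid, rate_valid = valid_rates.get(cluster, (0, float("nan")))
    print(f"cluster {cluster}: train {rate_train:.2f} (n={n_train}), "
          f"validation {rate_valid:.2f} (n={n_valid})")
```

If clusters carry prognostic information, their ordering by mortality in the training cohort should be broadly preserved in the validation cohorts. A real analysis would also need confidence intervals and a formal measure of discrimination.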

Original language: English
Article number: 104480
Publication status: Accepted/In press - 20 May 2022

Bibliographical note

Final Version of Record not yet available as of 13/06/2022.

© 2022 The Author(s).

