Machine learning of COVID-19 clinical data identifies population structures with therapeutic potential

David Greenwood; Tom Taverner; Nicola J. Adderley; Malcolm Price; Krishna Gokhale; Chris Sainsbury; Suzy Gallier; Carly Welch; Elizabeth Sapey; Duncan Murray; Hilary Fanning; Simon Ball; Krishnarajah Nirantharakumar; Wayne Croft; Paul Moss

doi:10.1016/j.isci.2022.104480

Research output: Contribution to journal › Article › peer-review

Abstract

Clinical outcomes for patients with COVID-19 are heterogeneous and there is interest in defining subgroups for prognostic modeling and development of treatment algorithms. We obtained 28 demographic and laboratory variables in patients admitted to hospital with COVID-19. These comprised a training cohort (n = 6099) and two validation cohorts during the first and second waves of the pandemic (n = 996; n = 1011). Uniform manifold approximation and projection (UMAP) dimension reduction and Gaussian mixture model (GMM) analysis were used to define patient clusters. 29 clusters were defined in the training cohort and associated with markedly different mortality rates, which were predictive within confirmation datasets. Deconvolution of clinical features within clusters identified unexpected relationships between variables. Integration of large datasets using UMAP-assisted clustering can therefore identify patient subgroups with prognostic information and uncover unexpected interactions between clinical variables. This application of machine learning represents a powerful approach for delineating disease pathogenesis and potential therapeutic interventions.

Original language	English
Article number	104480
Journal	iScience
Volume	25
Issue number	7
Early online date	31 May 2022
DOIs	https://doi.org/10.1016/j.isci.2022.104480
Publication status	Published - 15 Jul 2022

Bibliographical note

Funding Information:
The work was funded from an NIHR grant to PM. The sponsor of the ethics had no role in decision to publish, collection of data or authorship. The contributions by NA, ES, KN, MP, CS and TT were funded by the Medical Research Council UK Research and Innovation (reference COV0306) during the study. The funder had no role in developing the research question or the study protocol.

Funding Information:
MJP was supported by the NIHR Birmingham Biomedical Research Centre at the University Hospitals Birmingham NHS Foundation Trust and the University of Birmingham. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health and Social Care.

Publisher Copyright:
© 2022 The Author(s)

Keywords

Bioinformatics
Viral microbiology
medical informatics

ASJC Scopus subject areas

General

Access to Document

10.1016/j.isci.2022.104480, Licence: Creative Commons: Attribution (CC BY)

GreenwoodD2022Machine: Final published version, 5.11 MB, Licence: Creative Commons: Attribution (CC BY)

Cite this

Greenwood, D., Taverner, T., Adderley, N. J., Price, M., Gokhale, K., Sainsbury, C., Gallier, S., Welch, C., Sapey, E., Murray, D., Fanning, H., Ball, S., Nirantharakumar, K., Croft, W., & Moss, P. (2022). Machine learning of COVID-19 clinical data identifies population structures with therapeutic potential. iScience, 25(7), Article 104480. Advance online publication. https://doi.org/10.1016/j.isci.2022.104480

@article{2b1d7c441b49415fb3976ba75340b074,

title = "Machine learning of COVID-19 clinical data identifies population structures with therapeutic potential",

abstract = "Clinical outcomes for patients with COVID-19 are heterogeneous and there is interest in defining subgroups for prognostic modeling and development of treatment algorithms. We obtained 28 demographic and laboratory variables in patients admitted to hospital with COVID-19. These comprised a training cohort (n = 6099) and two validation cohorts during the first and second waves of the pandemic (n = 996; n = 1011). Uniform manifold approximation and projection (UMAP) dimension reduction and Gaussian mixture model (GMM) analysis was used to define patient clusters. 29 clusters were defined in the training cohort and associated with markedly different mortality rates, which were predictive within confirmation datasets. Deconvolution of clinical features within clusters identified unexpected relationships between variables. Integration of large datasets using UMAP-assisted clustering can therefore identify patient subgroups with prognostic information and uncovers unexpected interactions between clinical variables. This application of machine learning represents a powerful approach for delineating disease pathogenesis and potential therapeutic interventions.",

keywords = "Bioinformatics, Viral microbiology, medical informatics",

author = "David Greenwood and Tom Taverner and Adderley, {Nicola J.} and Malcolm Price and Krishna Gokhale and Chris Sainsbury and Suzy Gallier and Carly Welch and Elizabeth Sapey and Duncan Murray and Hilary Fanning and Simon Ball and Krishnarajah Nirantharakumar and Wayne Croft and Paul Moss",

note = "Funding Information: The work was funded from an NIHR grant to PM. The sponsor of the ethics had no role in decision to publish, collection of data or authorship. The contributions by NA, ES, KN, MP, CS and TT were funded by the Medical Research Council UK Research and Innovation (reference COV0306) during the study. The funder had no role in developing the research question or the study protocol. Funding Information: MJP was supported by the NIHR Birmingham Biomedical Research Centre at the University Hospitals Birmingham NHS Foundation Trust and the University of Birmingham. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health and Social Care. Publisher Copyright: {\textcopyright} 2022 The Author(s)",

year = "2022",

month = jul,

day = "15",

doi = "10.1016/j.isci.2022.104480",

language = "English",

volume = "25",

journal = "iScience",

issn = "2589-0042",

publisher = "Elsevier",

number = "7",

}

TY - JOUR

T1 - Machine learning of COVID-19 clinical data identifies population structures with therapeutic potential

AU - Greenwood, David

AU - Taverner, Tom

AU - Adderley, Nicola J.

AU - Price, Malcolm

AU - Gokhale, Krishna

AU - Sainsbury, Chris

AU - Gallier, Suzy

AU - Welch, Carly

AU - Sapey, Elizabeth

AU - Murray, Duncan

AU - Fanning, Hilary

AU - Ball, Simon

AU - Nirantharakumar, Krishnarajah

AU - Croft, Wayne

AU - Moss, Paul

N1 - Funding Information: The work was funded from an NIHR grant to PM. The sponsor of the ethics had no role in decision to publish, collection of data or authorship. The contributions by NA, ES, KN, MP, CS and TT were funded by the Medical Research Council UK Research and Innovation (reference COV0306) during the study. The funder had no role in developing the research question or the study protocol. Funding Information: MJP was supported by the NIHR Birmingham Biomedical Research Centre at the University Hospitals Birmingham NHS Foundation Trust and the University of Birmingham. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health and Social Care. Publisher Copyright: © 2022 The Author(s)

PY - 2022/7/15

Y1 - 2022/7/15

N2 - Clinical outcomes for patients with COVID-19 are heterogeneous and there is interest in defining subgroups for prognostic modeling and development of treatment algorithms. We obtained 28 demographic and laboratory variables in patients admitted to hospital with COVID-19. These comprised a training cohort (n = 6099) and two validation cohorts during the first and second waves of the pandemic (n = 996; n = 1011). Uniform manifold approximation and projection (UMAP) dimension reduction and Gaussian mixture model (GMM) analysis was used to define patient clusters. 29 clusters were defined in the training cohort and associated with markedly different mortality rates, which were predictive within confirmation datasets. Deconvolution of clinical features within clusters identified unexpected relationships between variables. Integration of large datasets using UMAP-assisted clustering can therefore identify patient subgroups with prognostic information and uncovers unexpected interactions between clinical variables. This application of machine learning represents a powerful approach for delineating disease pathogenesis and potential therapeutic interventions.

AB - Clinical outcomes for patients with COVID-19 are heterogeneous and there is interest in defining subgroups for prognostic modeling and development of treatment algorithms. We obtained 28 demographic and laboratory variables in patients admitted to hospital with COVID-19. These comprised a training cohort (n = 6099) and two validation cohorts during the first and second waves of the pandemic (n = 996; n = 1011). Uniform manifold approximation and projection (UMAP) dimension reduction and Gaussian mixture model (GMM) analysis was used to define patient clusters. 29 clusters were defined in the training cohort and associated with markedly different mortality rates, which were predictive within confirmation datasets. Deconvolution of clinical features within clusters identified unexpected relationships between variables. Integration of large datasets using UMAP-assisted clustering can therefore identify patient subgroups with prognostic information and uncovers unexpected interactions between clinical variables. This application of machine learning represents a powerful approach for delineating disease pathogenesis and potential therapeutic interventions.

KW - Bioinformatics

KW - Viral microbiology

KW - medical informatics

UR - http://www.scopus.com/inward/record.url?scp=85132215022&partnerID=8YFLogxK

U2 - 10.1016/j.isci.2022.104480

DO - 10.1016/j.isci.2022.104480

M3 - Article

C2 - 35665240

SN - 2589-0042

VL - 25

JO - iScience

JF - iScience

IS - 7

M1 - 104480

ER -
