Machine learning approaches to enhance diagnosis and staging of patients with MASLD using routinely available clinical information

LITMUS Consortium Investigators; Matthew McTeer; Douglas Applegate; Peter Mesenbrink; Vlad Ratziu; Jörn M. Schattenberg; Elisabetta Bugianesi; Andreas Geier; Manuel Romero-Gomez; Jean-Francois Dufour; Mattias Ekstedt; Sven Francque; Hannele Yki-järvinen; Michael Allison; Luca Valenti; Luca Miele; Michael Pavlides; Jeremy Cobbold; Georgios Papatheodoridis; Adriaan G. Holleboom; Dina Tiniakos; Clifford Brass; Quentin M. Anstee; Paolo Missier

doi:10.1371/journal.pone.0299487

LITMUS Consortium Investigators, Matthew McTeer^*, Douglas Applegate, Peter Mesenbrink, Vlad Ratziu, Jörn M. Schattenberg, Elisabetta Bugianesi, Andreas Geier, Manuel Romero-Gomez, Jean-Francois Dufour, Mattias Ekstedt, Sven Francque, Hannele Yki-järvinen, Michael Allison, Luca Valenti, Luca Miele, Michael Pavlides, Jeremy Cobbold, Georgios Papatheodoridis, Adriaan G. Holleboom, Dina Tiniakos, Clifford Brass, Quentin M. Anstee, Paolo Missier

^*Corresponding author for this work

Computer Science

Research output: Contribution to journal › Article › peer-review

Abstract

Aims: Metabolic dysfunction-associated steatotic liver disease (MASLD) outcomes such as metabolic dysfunction-associated steatohepatitis (MASH), fibrosis and cirrhosis are ordinarily determined by resource-intensive and invasive biopsies. We aim to show that routine clinical tests offer sufficient information to predict these endpoints.

Methods: Using the LITMUS Metacohort derived from the European NAFLD Registry, the largest MASLD dataset in Europe, we created three feature sets that differ in how easily the data can be obtained, including a 19-variable feature set that can be collected through a routine clinical appointment or blood test. These data were used to train predictive models with the supervised machine learning (ML) algorithm XGBoost, alongside the missing-data imputation technique MICE and the class-balancing algorithm SMOTE. Shapley Additive exPlanations (SHAP) were used to determine the relative importance of each clinical variable.
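As an illustration of the class-balancing step named in the Methods, the following is a minimal sketch of the core SMOTE idea: a synthetic minority-class row is placed at a random point on the line segment between an existing minority row and one of its nearest minority neighbours. It is not the authors' code; the function name `smote_sketch`, the parameter `k`, and the toy two-feature rows are assumptions made for this example only.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority rows by interpolating towards near neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the base row (excluding the row itself).
        others = [row for row in minority if row is not base]
        neighbours = sorted(others, key=lambda row: math.dist(base, row))[:k]
        neighbour = rng.choice(neighbours)
        # Random position along the segment from base to neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class with two hypothetical features (not real patient data).
minority_rows = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_rows = smote_sketch(minority_rows, n_new=6)
```

Because each synthetic row is a convex combination of two real rows, it always stays inside the range spanned by the observed minority values. In practice, oversampling of this kind is usually applied to training data only, so that evaluation data contain no synthetic rows.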

Results: Analysing nine biopsy-derived MASLD outcomes, with cohort sizes ranging between 5385 and 6673 subjects, we were able to predict outcomes with training set AUCs ranging from 0.719 to 0.994, including classifying individuals with At-Risk MASH at an AUC of 0.899. Using two further feature combinations of 26 and 35 variables, which included composite scores known to be good indicators for MASLD endpoints and advanced specialist tests, we found that predictive performance did not sufficiently improve. We are also able to present local and global explanations for each ML model, offering clinicians interpretability without sacrificing predictive performance.
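To make the distinction between local and global explanations concrete, here is a minimal sketch that computes exact Shapley values by enumerating every feature coalition. A local explanation is the vector of Shapley values for one individual; a global importance is taken here as the mean absolute Shapley value per feature across individuals. This brute-force version is for illustration only and is not the authors' implementation; SHAP tooling typically relies on faster model-specific or approximate algorithms. The linear `toy_model`, its coefficients, the baseline and the two rows are all hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one row; absent features take the baseline value."""
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = []
    for i in range(n):
        rest = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(rest, size):
                members = set(subset)
                total += weight * (value(members | {i}) - value(members))
        phi.append(total)
    return phi


def toy_model(row):
    """Hypothetical linear risk score over three made-up clinical variables."""
    age, alt, platelets = row
    return 0.02 * age + 0.01 * alt - 0.005 * platelets


baseline = [50.0, 30.0, 250.0]
rows = [[60.0, 80.0, 150.0], [45.0, 25.0, 300.0]]

# Local explanations: one Shapley vector per individual.
local = [shapley_values(toy_model, row, baseline) for row in rows]
# Global importance: mean absolute Shapley value per feature.
global_importance = [
    sum(abs(values[i]) for values in local) / len(local) for i in range(3)
]
```

For each row, the Shapley values sum to the difference between the model output for that row and the output at the baseline, which is what allows a single prediction to be decomposed into per-variable contributions.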

Conclusions: This study developed a series of ML models with accuracy ranging from 71.9% to 99.4%, using only easily extractable and readily available information, to predict MASLD outcomes that are usually determined through highly invasive means.
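The percentage range in the Conclusions matches the AUC range reported in the Results (0.719 to 0.994). AUC and classification accuracy are different quantities: AUC is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, while accuracy depends on a decision threshold. The sketch below shows both on a small made-up example; the labels, scores and the 0.5 threshold are assumptions for illustration and are unrelated to the study data.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of correct predictions after thresholding the scores."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


# Made-up labels (1 = outcome present) and model scores.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.45, 0.3, 0.2, 0.2, 0.1]

auc_value = roc_auc(labels, scores)
accuracy_value = accuracy(labels, scores)
```

On this toy input the two metrics differ (AUC is 14/15, accuracy at the 0.5 threshold is 7/8), which is why it helps to state explicitly which metric a reported percentage refers to.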

Original language	English
Article number	e0299487
Number of pages	17
Journal	PLOS One
Volume	19
Issue number	2
DOIs	https://doi.org/10.1371/journal.pone.0299487
Publication status	Published - 29 Feb 2024

Bibliographical note

Funding:
This work was supported by Newcastle University and Red Hat UK. This work has been supported by the LITMUS project, which has received funding from the Innovative Medicines Initiative 2 Joint Undertaking under grant agreement No. 777377. This Joint Undertaking receives support from the European Union’s Horizon 2020 research and innovation programme and EFPIA. QMA is an NIHR Senior Investigator and is supported by the Newcastle NIHR Biomedical Research Centre. This communication reflects the view of the authors and neither IMI nor the European Union and EFPIA are liable for any use that may be made of the information contained herein.

Access to Document

10.1371/journal.pone.0299487, Licence: Creative Commons: Attribution (CC BY)

McTeerM2024Machine, Final published version, 1.4 MB, Licence: Creative Commons: Attribution (CC BY)

H2020_COLLAB (IMI)_LITMUS
Newsome, P.
European Commission
1/11/17 → 29/02/24
Project: EU

Cite this

LITMUS Consortium Investigators, McTeer, M., Applegate, D., Mesenbrink, P., Ratziu, V., Schattenberg, J. M., Bugianesi, E., Geier, A., Romero-Gomez, M., Dufour, J.-F., Ekstedt, M., Francque, S., Yki-järvinen, H., Allison, M., Valenti, L., Miele, L., Pavlides, M., Cobbold, J., Papatheodoridis, G., ... Missier, P. (2024). Machine learning approaches to enhance diagnosis and staging of patients with MASLD using routinely available clinical information. PLOS One, 19(2), Article e0299487. https://doi.org/10.1371/journal.pone.0299487

@article{fa3bd4135e5e439e808b1cadc6a8cf88,
title = "Machine learning approaches to enhance diagnosis and staging of patients with MASLD using routinely available clinical information",
abstract = "Aims: Metabolic dysfunction Associated Steatotic Liver Disease (MASLD) outcomes such as MASH (metabolic dysfunction associated steatohepatitis), fibrosis and cirrhosis are ordinarily determined by resource-intensive and invasive biopsies. We aim to show that routine clinical tests offer sufficient information to predict these endpoints. Methods: Using the LITMUS Metacohort derived from the European NAFLD Registry, the largest MASLD dataset in Europe, we create three combinations of features which vary in degree of procurement including a 19-variable feature set that are attained through a routine clinical appointment or blood test. This data was used to train predictive models using supervised machine learning (ML) algorithm XGBoost, alongside missing imputation technique MICE and class balancing algorithm SMOTE. Shapley Additive exPlanations (SHAP) were added to determine relative importance for each clinical variable. Results: Analysing nine biopsy-derived MASLD outcomes of cohort size ranging between 5385 and 6673 subjects, we were able to predict individuals at training set AUCs ranging from 0.719-0.994, including classifying individuals who are At-Risk MASH at an AUC = 0.899. Using two further feature combinations of 26-variables and 35-variables, which included composite scores known to be good indicators for MASLD endpoints and advanced specialist tests, we found predictive performance did not sufficiently improve. We are also able to present local and global explanations for each ML model, offering clinicians interpretability without the expense of worsening predictive performance. Conclusions: This study developed a series of ML models of accuracy ranging from 71.9--99.4% using only easily extractable and readily available information in predicting MASLD outcomes which are usually determined through highly invasive means.",
author = "{LITMUS Consortium Investigators} and Matthew McTeer and Douglas Applegate and Peter Mesenbrink and Vlad Ratziu and Schattenberg, {J{\"o}rn M.} and Elisabetta Bugianesi and Andreas Geier and Manuel Romero-Gomez and Jean-Francois Dufour and Mattias Ekstedt and Sven Francque and Hannele Yki-j{\"a}rvinen and Michael Allison and Luca Valenti and Luca Miele and Michael Pavlides and Jeremy Cobbold and Georgios Papatheodoridis and Holleboom, {Adriaan G.} and Dina Tiniakos and Clifford Brass and Anstee, {Quentin M.} and Paolo Missier",
note = "Funding: This work was supported by Newcastle University and Red Hat UK. This work has been supported by the LITMUS project, which has received funding from the Innovative Medicines Initiative 2 Joint Undertaking under grant agreement No. 777377. This Joint Undertaking receives support from the European Union{\textquoteright}s Horizon 2020 research and innovation programme and EFPIA. QMA is an NIHR Senior Investigator and is supported by the Newcastle NIHR Biomedical Research Centre. This communication reflects the view of the authors and neither IMI nor the European Union and EFPIA are liable for any use that may be made of the information contained herein.",
year = "2024",
month = feb,
day = "29",
doi = "10.1371/journal.pone.0299487",
language = "English",
volume = "19",
journal = "PLOS One",
issn = "1932-6203",
publisher = "Public Library of Science (PLOS)",
number = "2",
}

