Open science practices need substantial improvement in prognostic model studies in oncology using machine learning

Gary S. Collins; Rebecca Whittle; Garrett S. Bullock; Patricia Logullo; Paula Dhiman; Jennifer A. de Beyer; Richard D. Riley; Michael M. Schlussel

doi:10.1016/j.jclinepi.2023.10.015

Corresponding author: Gary S. Collins

Applied Health Research

Research output: Contribution to journal, Article, peer-reviewed

Abstract

Objective: To describe the frequency of open science practices in a contemporary sample of studies developing prognostic models using machine-learning methods in the field of oncology.

Study design and setting: We conducted a systematic review, searching the MEDLINE database between 1 and 31 December 2022 for studies developing a multivariable prognostic model using machine-learning methods (as defined by the authors) in oncology. Two authors independently screened records and extracted data on open science practices.

Results: We identified 46 publications describing the development of a multivariable prognostic model. The adoption of open science principles was poor. Only one study reported the availability of a study protocol, and only one study was registered. Funding statements and conflict-of-interest statements were common. Thirty-five studies (76%) provided data-sharing statements, with 21 (46%) indicating that data were available on request to the authors and 7 declaring that data sharing was not applicable. Two studies (4%) shared data. Only 12 studies (26%) provided code-sharing statements, including 2 (4%) that indicated the code was available on request to the authors. Only 11 studies (24%) provided sufficient information to allow their model to be used in practice. The use of reporting guidelines was rare: 8 studies (18%) mentioned using a reporting guideline, with 4 (10%) using the TRIPOD statement, 1 (2%) using MI-CLAIM and CONSORT-AI, 1 (2%) using STROBE, 1 (2%) using STARD, and 1 (2%) using TREND.

Conclusion: The adoption of open science principles in oncology studies developing prognostic models using machine-learning methods is poor. Guidance, and greater awareness of the benefits and best practices of open science, are needed for prediction research in oncology.
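Each percentage in the data- and code-sharing results is a count divided by the 46 included publications. The Python sketch below recomputes those figures from the reported counts. It assumes, since the abstract does not state it explicitly, that every percentage uses all 46 publications as the denominator and is rounded to the nearest whole number; the dictionary labels are paraphrases chosen for illustration.

```python
# Minimal sketch: recompute the data- and code-sharing percentages from counts.
# Assumption: all percentages use the 46 included publications as denominator
# and are rounded to the nearest whole number.
N_STUDIES = 46

# Counts as reported in the Results; labels are illustrative paraphrases.
counts = {
    "data-sharing statement": 35,
    "data available on request": 21,
    "data shared": 2,
    "code-sharing statement": 12,
    "code available on request": 2,
    "enough information to use the model": 11,
}


def percent(count: int, total: int = N_STUDIES) -> int:
    """Return count/total expressed as a whole-number percentage."""
    return round(100 * count / total)


percentages = {label: percent(n) for label, n in counts.items()}

for label, pct in percentages.items():
    print(f"{label}: {counts[label]}/{N_STUDIES} = {pct}%")
```

Running it prints one line per item, for example `data-sharing statement: 35/46 = 76%`.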

Original language	English
Journal	Journal of Clinical Epidemiology
Early online date	28 Oct 2023
DOIs	https://doi.org/10.1016/j.jclinepi.2023.10.015
Publication status	E-pub ahead of print - 28 Oct 2023

Bibliographical note

Funding statement:
GSC, PL, JdB, RW, and MMS are supported by Cancer Research UK (programme grant: C49297/A27294). GSC and RDR are supported by an MRC-NIHR Better Methods Better Research grant (reference: MR/V038168/1). RDR was supported by the NIHR Birmingham Biomedical Research Centre at the University Hospitals Birmingham NHS Foundation Trust and the University of Birmingham. PD was supported by Cancer Research UK (project grant: PRCPJT-Nov21\100021). The views expressed are those of the author(s) and not necessarily those of Cancer Research UK, the NHS, the NIHR or the Department of Health and Social Care.

Keywords

Open science
Prognosis
Machine learning
Reporting
Data sharing
Code sharing

Access to Document

10.1016/j.jclinepi.2023.10.015

Licence: Creative Commons Attribution (CC BY)

Cite this

@article{b02e5085be5948a0b0ab196470c307b6,

title = "Open science practices need substantial improvement in prognostic model studies in oncology using machine learning",

abstract = "Objective: To describe the frequency of open science practices in a contemporary sample of studies developing prognostic models using machine-learning methods in the field of oncology. Study design and setting: We conducted a systematic review, searching the MEDLINE database between 1 and 31 December 2022 for studies developing a multivariable prognostic model using machine-learning methods (as defined by the authors) in oncology. Two authors independently screened records and extracted data on open science practices. Results: We identified 46 publications describing the development of a multivariable prognostic model. The adoption of open science principles was poor. Only one study reported the availability of a study protocol, and only one study was registered. Funding statements and conflict-of-interest statements were common. Thirty-five studies (76%) provided data-sharing statements, with 21 (46%) indicating that data were available on request to the authors and 7 declaring that data sharing was not applicable. Two studies (4%) shared data. Only 12 studies (26%) provided code-sharing statements, including 2 (4%) that indicated the code was available on request to the authors. Only 11 studies (24%) provided sufficient information to allow their model to be used in practice. The use of reporting guidelines was rare: 8 studies (18%) mentioned using a reporting guideline, with 4 (10%) using the TRIPOD statement, 1 (2%) using MI-CLAIM and CONSORT-AI, 1 (2%) using STROBE, 1 (2%) using STARD, and 1 (2%) using TREND. Conclusion: The adoption of open science principles in oncology studies developing prognostic models using machine-learning methods is poor. Guidance, and greater awareness of the benefits and best practices of open science, are needed for prediction research in oncology.",

keywords = "Open science, Prognosis, Machine learning, Reporting, Data sharing, Code sharing",

author = "Collins, {Gary S.} and Rebecca Whittle and Bullock, {Garrett S.} and Patricia Logullo and Paula Dhiman and {de Beyer}, {Jennifer A.} and Riley, {Richard D.} and Schlussel, {Michael M.}",

note = "Funding statement: GSC, PL, JdB, RW, and MMS are supported by Cancer Research UK (programme grant: C49297/A27294). GSC and RDR are supported by an MRC-NIHR Better Methods Better Research grant (reference: MR/V038168/1). RDR was supported by the NIHR Birmingham Biomedical Research Centre at the University Hospitals Birmingham NHS Foundation Trust and the University of Birmingham. PD was supported by Cancer Research UK (project grant: PRCPJT-Nov21{\textbackslash}100021). The views expressed are those of the author(s) and not necessarily those of Cancer Research UK, the NHS, the NIHR or the Department of Health and Social Care.",

year = "2023",

month = oct,

day = "28",

doi = "10.1016/j.jclinepi.2023.10.015",

language = "English",

journal = "Journal of Clinical Epidemiology",

issn = "0895-4356",

publisher = "Elsevier",

}
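In the BibTeX entry above, the `author` field holds all eight authors in a single string joined by the word `and`, with braces protecting parts such as `{de Beyer}` and `{Gary S.}`. The Python sketch below shows one way a tool might split that field into names. It is a simplification under stated assumptions: it only treats a lowercase `and` surrounded by whitespace at brace depth 0 as a separator, it does not parse first/von/last/Jr parts, and the function name `split_bibtex_names` is hypothetical.

```python
import re

# The author field copied from the BibTeX entry above.
AUTHOR_FIELD = (
    "Collins, {Gary S.} and Rebecca Whittle and Bullock, {Garrett S.} and "
    "Patricia Logullo and Paula Dhiman and {de Beyer}, {Jennifer A.} and "
    "Riley, {Richard D.} and Schlussel, {Michael M.}"
)

# Separator: whitespace, the word "and", whitespace (simplified assumption).
SEPARATOR = re.compile(r"\s+and\s+")


def split_bibtex_names(field: str) -> list[str]:
    """Split a BibTeX name list on 'and' tokens that are outside braces."""
    names = []
    depth = 0  # current brace nesting level
    start = 0  # index where the current name begins
    i = 0
    while i < len(field):
        ch = field[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        elif depth == 0 and ch.isspace():
            match = SEPARATOR.match(field, i)
            if match:
                names.append(field[start:i].strip())
                i = start = match.end()
                continue
        i += 1
    names.append(field[start:].strip())
    return names


names = split_bibtex_names(AUTHOR_FIELD)
for name in names:
    # Drop protective braces for display only.
    print(name.replace("{", "").replace("}", ""))
```

Applied to this entry, the function returns eight names, from `Collins, {Gary S.}` to `Schlussel, {Michael M.}`; braces are kept in the returned strings so that a later step can decide how to treat protected text.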

TY - JOUR

T1 - Open science practices need substantial improvement in prognostic model studies in oncology using machine learning

AU - Collins, Gary S.

AU - Whittle, Rebecca

AU - Bullock, Garrett S.

AU - Logullo, Patricia

AU - Dhiman, Paula

AU - de Beyer, Jennifer A.

AU - Riley, Richard D.

AU - Schlussel, Michael M.

N1 - Funding statement: GSC, PL, JdB, RW, and MMS are supported by Cancer Research UK (programme grant: C49297/A27294). GSC and RDR are supported by an MRC-NIHR Better Methods Better Research grant (reference: MR/V038168/1). RDR was supported by the NIHR Birmingham Biomedical Research Centre at the University Hospitals Birmingham NHS Foundation Trust and the University of Birmingham. PD was supported by Cancer Research UK (project grant: PRCPJT-Nov21\100021). The views expressed are those of the author(s) and not necessarily those of Cancer Research UK, the NHS, the NIHR or the Department of Health and Social Care.

PY - 2023/10/28

Y1 - 2023/10/28

N2 - Objective: To describe the frequency of open science practices in a contemporary sample of studies developing prognostic models using machine-learning methods in the field of oncology. Study design and setting: We conducted a systematic review, searching the MEDLINE database between 1 and 31 December 2022 for studies developing a multivariable prognostic model using machine-learning methods (as defined by the authors) in oncology. Two authors independently screened records and extracted data on open science practices. Results: We identified 46 publications describing the development of a multivariable prognostic model. The adoption of open science principles was poor. Only one study reported the availability of a study protocol, and only one study was registered. Funding statements and conflict-of-interest statements were common. Thirty-five studies (76%) provided data-sharing statements, with 21 (46%) indicating that data were available on request to the authors and 7 declaring that data sharing was not applicable. Two studies (4%) shared data. Only 12 studies (26%) provided code-sharing statements, including 2 (4%) that indicated the code was available on request to the authors. Only 11 studies (24%) provided sufficient information to allow their model to be used in practice. The use of reporting guidelines was rare: 8 studies (18%) mentioned using a reporting guideline, with 4 (10%) using the TRIPOD statement, 1 (2%) using MI-CLAIM and CONSORT-AI, 1 (2%) using STROBE, 1 (2%) using STARD, and 1 (2%) using TREND. Conclusion: The adoption of open science principles in oncology studies developing prognostic models using machine-learning methods is poor. Guidance, and greater awareness of the benefits and best practices of open science, are needed for prediction research in oncology.

AB - Objective: To describe the frequency of open science practices in a contemporary sample of studies developing prognostic models using machine-learning methods in the field of oncology. Study design and setting: We conducted a systematic review, searching the MEDLINE database between 1 and 31 December 2022 for studies developing a multivariable prognostic model using machine-learning methods (as defined by the authors) in oncology. Two authors independently screened records and extracted data on open science practices. Results: We identified 46 publications describing the development of a multivariable prognostic model. The adoption of open science principles was poor. Only one study reported the availability of a study protocol, and only one study was registered. Funding statements and conflict-of-interest statements were common. Thirty-five studies (76%) provided data-sharing statements, with 21 (46%) indicating that data were available on request to the authors and 7 declaring that data sharing was not applicable. Two studies (4%) shared data. Only 12 studies (26%) provided code-sharing statements, including 2 (4%) that indicated the code was available on request to the authors. Only 11 studies (24%) provided sufficient information to allow their model to be used in practice. The use of reporting guidelines was rare: 8 studies (18%) mentioned using a reporting guideline, with 4 (10%) using the TRIPOD statement, 1 (2%) using MI-CLAIM and CONSORT-AI, 1 (2%) using STROBE, 1 (2%) using STARD, and 1 (2%) using TREND. Conclusion: The adoption of open science principles in oncology studies developing prognostic models using machine-learning methods is poor. Guidance, and greater awareness of the benefits and best practices of open science, are needed for prediction research in oncology.

KW - Open science

KW - Prognosis

KW - Machine learning

KW - Reporting

KW - Data sharing

KW - Code sharing

U2 - 10.1016/j.jclinepi.2023.10.015

DO - 10.1016/j.jclinepi.2023.10.015

M3 - Article

SN - 0895-4356

JO - Journal of Clinical Epidemiology

JF - Journal of Clinical Epidemiology

ER -
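The RIS export above stores one field per line as a two-character tag, a hyphen and the value; it repeats tags such as `AU` and `KW` for multi-valued fields and closes the record with `ER`. The Python sketch below reads such text into one dictionary per record. The function name `parse_ris` and the shortened sample record are illustrative assumptions, and the sketch does not cover every variant found in real RIS files (for example, values continued over several lines).

```python
import re
from collections import defaultdict

# A RIS tag line: two-character tag, whitespace, hyphen, optional space, value.
# The usual convention is "XX  - value" (two spaces); this pattern also accepts
# the single space seen in the export above and an empty value, as in "ER  -".
RIS_LINE = re.compile(r"^([A-Z][A-Z0-9])\s+-\s?(.*)$")


def parse_ris(text: str) -> list[dict[str, list[str]]]:
    """Parse RIS text into one dict per record, mapping each tag to its values.

    Blank lines and non-tag lines are skipped; a record is only emitted when
    its closing ER line is reached.
    """
    records = []
    current = defaultdict(list)
    for line in text.splitlines():
        match = RIS_LINE.match(line.strip())
        if not match:
            continue
        tag, value = match.group(1), match.group(2).strip()
        if tag == "ER":
            records.append(dict(current))
            current = defaultdict(list)
        else:
            current[tag].append(value)
    return records


# Shortened, illustrative record using tags from the export above.
SAMPLE = """TY - JOUR
AU - Collins, Gary S.
AU - Whittle, Rebecca
PY - 2023/10/28
DO - 10.1016/j.jclinepi.2023.10.015
ER -"""

record = parse_ris(SAMPLE)[0]
print(record["AU"])  # ['Collins, Gary S.', 'Whittle, Rebecca']
print(record["DO"])  # ['10.1016/j.jclinepi.2023.10.015']
```

Keeping every value in a list means repeated tags such as `AU` keep their original author order without special-casing.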
