Abstract
Objective: To illustrate how to evaluate the need of complex strategies for developing generalizable prediction models in large clustered datasets.
Study Design and Setting: We developed eight Cox regression models to estimate the risk of heart failure using a large population-level dataset. These models differed in the number of predictors, the functional form of the predictor effects (non-linear effects and interaction) and the estimation method (maximum likelihood and penalization). Internal-external cross-validation was used to evaluate the models’ generalizability across the included general practices.
Results: Among 871,687 individuals from 225 general practices, 43,987 (5.5%) developed heart failure during a median follow-up time of 5.8 years. For discrimination, the simplest prediction model yielded a good concordance statistic, which was not much improved by adopting complex strategies. Between-practice heterogeneity in discrimination was similar in all models. For calibration, the simplest model performed satisfactorily. Although accounting for non-linear effects and interaction slightly improved the calibration slope, it also led to more heterogeneity in the observed/expected ratio. Similar results were found in a second case study involving patients with stroke.
Conclusion: In large clustered datasets, prediction model studies may adopt internal-external cross-validation to evaluate the generalizability of competing models, and to identify promising modelling strategies.
Study Design and Setting: We developed eight Cox regression models to estimate the risk of heart failure using a large population-level dataset. These models differed in the number of predictors, the functional form of the predictor effects (non-linear effects and interaction) and the estimation method (maximum likelihood and penalization). Internal-external cross-validation was used to evaluate the models’ generalizability across the included general practices.
Results: Among 871,687 individuals from 225 general practices, 43,987 (5.5%) developed heart failure during a median follow-up time of 5.8 years. For discrimination, the simplest prediction model yielded a good concordance statistic, which was not much improved by adopting complex strategies. Between-practice heterogeneity in discrimination was similar in all models. For calibration, the simplest model performed satisfactorily. Although accounting for non-linear effects and interaction slightly improved the calibration slope, it also led to more heterogeneity in the observed/expected ratio. Similar results were found in a second case study involving patients with stroke.
Conclusion: In large clustered datasets, prediction model studies may adopt internal-external cross-validation to evaluate the generalizability of competing models, and to identify promising modelling strategies.
| Original language | English |
|---|---|
| Pages (from-to) | 83-91 |
| Number of pages | 9 |
| Journal | Journal of Clinical Epidemiology |
| Volume | 137 |
| Early online date | 6 Apr 2021 |
| DOIs | |
| Publication status | Published - Sept 2021 |
Keywords
- Prediction model
- Calibration
- Discrimination
- Validation
- Heterogeneity
- Model comparison
Fingerprint
Dive into the research topics of 'Internal-external cross-validation helped to evaluate the generalizability of prediction models in large clustered datasets'. Together they form a unique fingerprint.Cite this
- APA
- Author
- BIBTEX
- Harvard
- Standard
- RIS
- Vancouver