-Omics biomarker identification pipeline for translational medicine
- Institute of Translational Medicine, University Hospitals Birmingham NHS Foundation Trust, B15 2TT, Birmingham, United Kingdom.
- Mammalian Genetics Unit, Medical Research Council Harwell Institute, Harwell Campus, Didcot, OX11 0RD, UK.
- MRC Health Data Research UK (HDR UK), London, UK.
- NIHR Experimental Cancer Medicine Centre, B15 2TT, Birmingham, UK.
- NIHR Surgical Reconstruction and Microbiology Research Centre, B15 2TT, Birmingham, UK.
- Centre for Liver Research, NIHR Birmingham Liver Biomedical Research Unit, University of Birmingham, Birmingham B15 2TT, UK.
- Institute of Translational Medicine, University Hospitals Birmingham NHS Foundation Trust, Birmingham, B15 2TT, UK.
- NIHR Surgical Reconstruction and Microbiology Research Centre, Birmingham, B15 2TT, UK.
- College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, Centre for Computational Biology, University of Birmingham, Birmingham, B15 2TT, United Kingdom.
BACKGROUND: Translational medicine (TM) is an emerging domain that aims to carry medical or biological advances efficiently from the scientist to the clinician. Central to the TM vision is narrowing the gap between basic and applied science in terms of time, cost and early diagnosis of disease. Biomarker identification is one of the main challenges within TM. Identifying disease biomarkers from -omics data will not only help stratify diverse patient cohorts but also provide early diagnostic information that could improve patient management and potentially prevent adverse outcomes. However, biomarker identification needs to be robust and reproducible. Hence, a robust, unbiased computational framework that can help clinicians identify such biomarkers is needed.
METHODS: We developed a pipeline (workflow) that includes two supervised classification techniques based on regularization to identify biomarkers from -omics or other high-dimensional clinical datasets. The pipeline includes several important steps, such as quality control and assessment of the stability of the selected biomarkers. It takes input files (the outcome and the independent variables or -omics data) and pre-processes them (normalization, handling of missing values). After a random division of samples into training and test sets, the Least Absolute Shrinkage and Selection Operator (LASSO) and Elastic Net feature selection methods are applied to identify the most important features, which represent potential biomarker candidates. The penalization parameters are optimised using 10-fold cross-validation; the process is repeated for 100 iterations, and a combinatorial analysis selects the best-performing multivariate model. The quality of the candidates as biomarkers for clinical use is then assessed empirically and without bias through Receiver Operating Characteristic (ROC) curve and Area Under the Curve (AUC) analysis on both permuted and real data for 1000 different randomized training and test sets. We validated this pipeline against previously published biomarkers.
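The regularized feature-selection step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it fits the elastic-net objective by coordinate descent with a squared-error (Gaussian) loss on a 0/1-coded outcome, a simplification of the penalised logistic regression a classifier would normally use (for example, glmnet with `family = "binomial"` in R). Setting `alpha = 1` gives the LASSO and `0 < alpha < 1` an Elastic Net; the penalty `lam` is chosen by k-fold cross-validation over a decreasing grid. `selection_frequency` repeats the fit on random training subsets and reports how often each feature is selected, which is one plausible way to quantify stability. All function names and defaults (grid size, 70% training fraction) are hypothetical.

```python
import numpy as np


def soft_threshold(z, gamma):
    """S(z, gamma) = sign(z) * max(|z| - gamma, 0), the LASSO shrinkage operator."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)


def standardise(X_fit, X_apply):
    """Centre and scale both matrices with the mean and SD of X_fit only."""
    mu, sd = X_fit.mean(axis=0), X_fit.std(axis=0) + 1e-12
    return (X_fit - mu) / sd, (X_apply - mu) / sd


def elastic_net(X, y, lam, alpha, b=None, n_sweeps=500, tol=1e-6):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam*(alpha*|b|_1 + (1 - alpha)/2*||b||^2).

    Assumes standardised columns and a centred y (so no intercept); alpha=1 is the LASSO.
    Gaussian loss is a simplifying assumption: a classifier would normally use a logistic loss.
    """
    n, p = X.shape
    b = np.zeros(p) if b is None else b.copy()
    r = y - X @ b  # current residual
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_sweeps):
        max_step = 0.0
        for j in range(p):
            if col_sq[j] == 0:
                continue  # constant feature: coefficient stays at zero
            rho = X[:, j] @ r / n + col_sq[j] * b[j]
            new = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
            if new != b[j]:
                r -= X[:, j] * (new - b[j])  # keep the residual in sync
                max_step = max(max_step, abs(new - b[j]))
                b[j] = new
        if max_step < tol:
            break
    return b


def lambda_grid(X, y, alpha, n_lams=10, ratio=1e-2):
    """Decreasing grid from the smallest lam that zeroes every coefficient (needs alpha > 0)."""
    lam_max = np.abs(X.T @ y).max() / (len(y) * alpha)
    return lam_max * np.logspace(0, np.log10(ratio), n_lams)


def cv_lambda(X, y, lams, alpha, k, rng):
    """Return the lam with the lowest k-fold cross-validated squared error."""
    folds = rng.permutation(len(y)) % k
    err = np.zeros(len(lams))
    for f in range(k):
        tr, te = folds != f, folds == f
        X_tr, X_te = standardise(X[tr], X[te])
        y_mean = y[tr].mean()
        b = None
        for i, lam in enumerate(lams):  # warm-start along the decreasing path
            b = elastic_net(X_tr, y[tr] - y_mean, lam, alpha, b)
            err[i] += ((y[te] - y_mean - X_te @ b) ** 2).sum()
    return lams[np.argmin(err)]


def selection_frequency(X, y, alpha=1.0, n_iter=100, train_frac=0.7, k=10, seed=0):
    """Hypothetical stability measure: share of random training subsets in which each
    feature receives a non-zero coefficient at the cross-validated lam."""
    rng = np.random.default_rng(seed)
    n = len(y)
    counts = np.zeros(X.shape[1])
    for _ in range(n_iter):
        train = rng.permutation(n)[: int(round(train_frac * n))]
        X_tr, y_tr = X[train], y[train].astype(float)
        X_std, _ = standardise(X_tr, X_tr)
        y_c = y_tr - y_tr.mean()
        lam = cv_lambda(X_tr, y_tr, lambda_grid(X_std, y_c, alpha), alpha, k, rng)
        counts += elastic_net(X_std, y_c, lam, alpha) != 0
    return counts / n_iter
```

Features selected in nearly every iteration are the most stable candidates. In a full pipeline the held-out portion of each split would then be used to evaluate the selected model, which this sketch omits.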
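The permutation-based ROC/AUC assessment can be sketched in a similar spirit. Assumptions, none taken from the pipeline itself: the classifier is a hypothetical stand-in (test samples are scored by projecting them onto the difference of standardised training class means, not by a LASSO or Elastic Net model), splits are stratified 70/30, and AUC is computed as the Mann-Whitney probability that a randomly chosen positive scores above a randomly chosen negative. For each random split the test-set AUC is recorded once with the real labels and once with shuffled labels, so the permuted AUCs form a null distribution for the real ones.

```python
import numpy as np


def auc_score(scores, labels):
    """ROC AUC as P(random positive scores above random negative); ties count one half."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    diff = pos[:, None] - neg[None, :]
    return (diff > 0).mean() + 0.5 * (diff == 0).mean()


def stratified_test_mask(labels, test_frac, rng):
    """Randomly send test_frac of each class (at least one sample) to the test set."""
    test = np.zeros(labels.size, dtype=bool)
    for cls in (0, 1):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        test[idx[: max(1, int(round(test_frac * idx.size)))]] = True
    return test


def mean_difference_scores(X_tr, y_tr, X_te):
    """Hypothetical stand-in classifier: project standardised test samples onto the
    difference between the training class means."""
    mu, sd = X_tr.mean(axis=0), X_tr.std(axis=0) + 1e-12
    z_tr, z_te = (X_tr - mu) / sd, (X_te - mu) / sd
    w = z_tr[y_tr == 1].mean(axis=0) - z_tr[y_tr == 0].mean(axis=0)
    return z_te @ w


def real_vs_permuted_auc(X, y, n_splits=1000, test_frac=0.3, seed=0):
    """Test-set AUCs over n_splits random splits, for the real and for shuffled labels."""
    rng = np.random.default_rng(seed)
    real, permuted = np.empty(n_splits), np.empty(n_splits)
    for i in range(n_splits):
        for labels, out in ((y, real), (rng.permutation(y), permuted)):
            test = stratified_test_mask(labels, test_frac, rng)
            scores = mean_difference_scores(X[~test], labels[~test], X[test])
            out[i] = auc_score(scores, labels[test])
    return real, permuted
```

If the real AUCs sit well above, for instance, the 95th percentile of the permuted AUCs, the candidate features carry signal beyond what chance splits produce; permuted AUCs centred near 0.5 indicate the evaluation itself is not leaking information.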
RESULTS: We applied this pipeline to three datasets with previously published biomarkers: lipidomics data from Acharjee et al. (Metabolomics 13:25, 2017), and transcriptomics data from Rajamani and Bhasin (Genome Med 8:38, 2016) and from Mills et al. (Blood 114:1063-1072, 2009). Our results demonstrate that the method identified both the previously published biomarkers and new variables that add value to the published results.
CONCLUSIONS: We developed a robust pipeline, applicable to different -omics datasets, for identifying clinically relevant biomarkers. Such identification can reveal potentially novel drug targets and can be used as part of a machine-learning-based patient stratification framework in translational medicine settings.
|Number of pages|10|
|Journal|Journal of Translational Medicine|
|Publication status|Published - 14 May 2019|
Keywords: Biomarker, -Omics, Regularization, Feature selection, Translational medicine