Biomarker Prioritisation and Power Estimation Using Ensemble Gene Regulatory Network Inference

Furqan Aziz; Animesh Acharjee; John Williams; Dominic Russ; Laura Bravo-Merodio; Georgios Gkoutos

doi:10.3390/ijms21217886

Cancer and Genomic Sciences

Research output: Contribution to journal › Article › peer-review

Abstract

Inferring the topology of a gene regulatory network (GRN) from gene expression data is a challenging but important step towards a better understanding of gene regulation. Key challenges include working with noisy data and dealing with more genes than samples. Although a number of methods have been proposed to infer the structure of a GRN, there are large discrepancies among the inference algorithms they adopt, which makes meaningful comparison difficult. In this study, we used two methods, MIDER (Mutual Information Distance and Entropy Reduction) and PLSNET (partial least squares based feature selection), to infer the structure of a GRN directly from data, and computationally validated our results. Both methods were applied to gene expression datasets from inflammatory bowel disease (IBD), pancreatic ductal adenocarcinoma (PDAC), and acute myeloid leukaemia (AML) studies. In each case, gene regulators were successfully identified. For example, in the IBD dataset the UGT1A family genes were identified as key regulators, while in the PDAC dataset the SULF1 and THBS2 genes were highlighted. We further demonstrate that an ensemble-based approach that combines the output of the MIDER and PLSNET algorithms can infer the structure of a GRN from data with higher accuracy. We also estimated the number of samples required for potential future validation studies. The proposed analysis framework thus provides both a prediction of candidate regulator genes for potential validation experiments and an estimate of the number of samples those experiments require.
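
The abstract does not say how the outputs of the two methods are combined. As a rough illustration only, the sketch below assumes that each method yields a confidence score per directed edge and merges the two lists by averaging edge ranks. The function names, the rank-averaging rule, the penalty for missing edges, and the toy gene labels and scores are all hypothetical and are not taken from the article.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (regulator, target)


def rank_scores(scores: Dict[Edge, float]) -> Dict[Edge, float]:
    """Map each edge to its rank (1 = strongest score); ties share the mean rank."""
    ordered = sorted(scores, key=lambda e: -scores[e])
    ranks: Dict[Edge, float] = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over the run of edges that tie with ordered[i].
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        mean_rank = ((i + 1) + (j + 1)) / 2.0
        for k in range(i, j + 1):
            ranks[ordered[k]] = mean_rank
        i = j + 1
    return ranks


def ensemble_edges(a: Dict[Edge, float], b: Dict[Edge, float]) -> List[Tuple[Edge, float]]:
    """Average the per-method ranks of every edge seen by either method.

    Assumption: an edge missing from one method receives that method's
    worst possible rank plus one, so it is penalised but not discarded.
    """
    ranks_a, ranks_b = rank_scores(a), rank_scores(b)
    worst_a, worst_b = len(a) + 1, len(b) + 1
    combined = {
        e: (ranks_a.get(e, worst_a) + ranks_b.get(e, worst_b)) / 2.0
        for e in set(a) | set(b)
    }
    # Lower mean rank = stronger consensus; break ties by edge name.
    return sorted(combined.items(), key=lambda kv: (kv[1], kv[0]))


# Toy scores with placeholder gene labels (not real results).
method_one = {("g1", "g2"): 0.9, ("g1", "g3"): 0.4, ("g2", "g3"): 0.1}
method_two = {("g1", "g2"): 0.7, ("g2", "g3"): 0.6, ("g3", "g1"): 0.2}

ranking = ensemble_edges(method_one, method_two)
for edge, mean_rank in ranking:
    print(edge, mean_rank)
```

Rank averaging is used here because scores from a mutual-information method and a regression-based method are not on a common scale; other combination rules would fit the same interface.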

Original language	English
Article number	7886
Pages (from-to)	1-22
Number of pages	22
Journal	International Journal of Molecular Sciences
Volume	21
Issue number	21
DOIs	https://doi.org/10.3390/ijms21217886
Publication status	Published - 23 Oct 2020

Keywords

Causal modelling
Experimental design
Gene regulatory network
Omics integration

ASJC Scopus subject areas

Catalysis
Molecular Biology
Spectroscopy
Computer Science Applications
Physical and Theoretical Chemistry
Organic Chemistry
Inorganic Chemistry

Access to Document

10.3390/ijms21217886
Licence: Creative Commons: Attribution (CC BY)

Aziz_et_al_2020_Biomarker_prioritisation_and_power_estimation_International_Journal_of_Molecular_Sciences
Final published version, 16.3 MB
Licence: Creative Commons: Attribution (CC BY)

Cite this

@article{d8208be7bd7940028849f7ceeb1307de,

title = "Biomarker Prioritisation and Power Estimation Using Ensemble Gene Regulatory Network Inference",

abstract = "Inferring the topology of a gene regulatory network (GRN) from gene expression data is a challenging but important undertaking for gaining a better understanding of gene regulation. Key challenges include working with noisy data and dealing with a higher number of genes than samples. Although a number of different methods have been proposed to infer the structure of a GRN, there are large discrepancies among the different inference algorithms they adopt, rendering their meaningful comparison challenging. In this study, we used two methods, namely the MIDER (Mutual Information Distance and Entropy Reduction) and the PLSNET (Partial least square based feature selection) methods, to infer the structure of a GRN directly from data and computationally validated our results. Both methods were applied to different gene expression datasets resulting from inflammatory bowel disease (IBD), pancreatic ductal adenocarcinoma (PDAC), and acute myeloid leukaemia (AML) studies. For each case, gene regulators were successfully identified. For example, for the case of the IBD dataset, the UGT1A family genes were identified as key regulators while upon analysing the PDAC dataset, the SULF1 and THBS2 genes were depicted. We further demonstrate that an ensemble-based approach, that combines the output of the MIDER and PLSNET algorithms, can infer the structure of a GRN from data with higher accuracy. We have also estimated the number of the samples required for potential future validation studies. Here, we presented our proposed analysis framework that caters not only to candidate regulator genes prediction for potential validation experiments but also an estimation of the number of samples required for these experiments.",

keywords = "Causal modelling, Experimental design, Gene regulatory network, Omics integration",

author = "Furqan Aziz and Animesh Acharjee and John Williams and Dominic Russ and Laura Bravo-Merodio and Georgios Gkoutos",

year = "2020",

month = oct,

day = "23",

doi = "10.3390/ijms21217886",

language = "English",

volume = "21",

pages = "1--22",

journal = "International Journal of Molecular Sciences",

issn = "1661-6596",

publisher = "MDPI",

number = "21",

}

TY - JOUR

T1 - Biomarker Prioritisation and Power Estimation Using Ensemble Gene Regulatory Network Inference

AU - Aziz, Furqan

AU - Acharjee, Animesh

AU - Williams, John

AU - Russ, Dominic

AU - Bravo-Merodio, Laura

AU - Gkoutos, Georgios

PY - 2020/10/23

Y1 - 2020/10/23

N2 - Inferring the topology of a gene regulatory network (GRN) from gene expression data is a challenging but important undertaking for gaining a better understanding of gene regulation. Key challenges include working with noisy data and dealing with a higher number of genes than samples. Although a number of different methods have been proposed to infer the structure of a GRN, there are large discrepancies among the different inference algorithms they adopt, rendering their meaningful comparison challenging. In this study, we used two methods, namely the MIDER (Mutual Information Distance and Entropy Reduction) and the PLSNET (Partial least square based feature selection) methods, to infer the structure of a GRN directly from data and computationally validated our results. Both methods were applied to different gene expression datasets resulting from inflammatory bowel disease (IBD), pancreatic ductal adenocarcinoma (PDAC), and acute myeloid leukaemia (AML) studies. For each case, gene regulators were successfully identified. For example, for the case of the IBD dataset, the UGT1A family genes were identified as key regulators while upon analysing the PDAC dataset, the SULF1 and THBS2 genes were depicted. We further demonstrate that an ensemble-based approach, that combines the output of the MIDER and PLSNET algorithms, can infer the structure of a GRN from data with higher accuracy. We have also estimated the number of the samples required for potential future validation studies. Here, we presented our proposed analysis framework that caters not only to candidate regulator genes prediction for potential validation experiments but also an estimation of the number of samples required for these experiments.

AB - Inferring the topology of a gene regulatory network (GRN) from gene expression data is a challenging but important undertaking for gaining a better understanding of gene regulation. Key challenges include working with noisy data and dealing with a higher number of genes than samples. Although a number of different methods have been proposed to infer the structure of a GRN, there are large discrepancies among the different inference algorithms they adopt, rendering their meaningful comparison challenging. In this study, we used two methods, namely the MIDER (Mutual Information Distance and Entropy Reduction) and the PLSNET (Partial least square based feature selection) methods, to infer the structure of a GRN directly from data and computationally validated our results. Both methods were applied to different gene expression datasets resulting from inflammatory bowel disease (IBD), pancreatic ductal adenocarcinoma (PDAC), and acute myeloid leukaemia (AML) studies. For each case, gene regulators were successfully identified. For example, for the case of the IBD dataset, the UGT1A family genes were identified as key regulators while upon analysing the PDAC dataset, the SULF1 and THBS2 genes were depicted. We further demonstrate that an ensemble-based approach, that combines the output of the MIDER and PLSNET algorithms, can infer the structure of a GRN from data with higher accuracy. We have also estimated the number of the samples required for potential future validation studies. Here, we presented our proposed analysis framework that caters not only to candidate regulator genes prediction for potential validation experiments but also an estimation of the number of samples required for these experiments.

KW - Causal modelling

KW - Experimental design

KW - Gene regulatory network

KW - Omics integration

UR - http://www.scopus.com/inward/record.url?scp=85093949288&partnerID=8YFLogxK

U2 - 10.3390/ijms21217886

DO - 10.3390/ijms21217886

M3 - Article

C2 - 33114263

SN - 1661-6596

VL - 21

SP - 1

EP - 22

JO - International Journal of Molecular Sciences

JF - International Journal of Molecular Sciences

IS - 21

M1 - 7886

ER -
