Prediction of antibiotic resistance in Escherichia coli from large-scale pan-genome data

Danesh Moradigaravand; Martin Palm; Anne Farewell; Ville Mustonen; Jonas Warringer; Leopold Parts

doi:10.1371/journal.pcbi.1006258

Cancer and Genomic Sciences

Research output: Contribution to journal › Article › peer-review

Abstract

The emergence of microbial antibiotic resistance is a global health threat. In clinical settings, the key to controlling the spread of resistant strains is accurate and rapid detection. As traditional culture-based methods are time-consuming, genetic approaches have recently been developed for this task. Antibiotic resistance is typically detected by measuring a few known determinants previously identified from genome sequencing, which requires prior knowledge of the underlying biological mechanisms. To overcome this limitation, we employed machine learning models to predict resistance to 11 compounds across four classes of antibiotics from existing and novel whole-genome sequences of 1936 E. coli strains. We considered a range of methods, and examined population structure, isolation year, gene content, and polymorphism information as predictors. Gradient boosted decision trees consistently outperformed alternative models, with an average accuracy of 0.91 on held-out data (range 0.81–0.97). While the best models most frequently employed gene content, an average accuracy score of 0.79 could be obtained using population structure information alone. Single nucleotide variation data were less useful, and significantly improved prediction for only two antibiotics, including ciprofloxacin. These results demonstrate that antibiotic resistance in E. coli can be accurately predicted from whole-genome sequences without a priori knowledge of mechanisms, and that both genomic and epidemiological data can be informative. This paves the way to integrating machine learning approaches into diagnostic tools in the clinic.
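As a rough illustration of the approach named in the abstract, the sketch below boosts one-split decision trees (stumps) on a binary gene presence/absence matrix and scores them on held-out strains. It is not the authors' pipeline: the data are synthetic, resistance is planted on a single hypothetical gene (`CAUSAL_GENE`), and every function name, the learning rate, and the number of rounds are assumptions chosen for the example.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals):
    """Return (gene index, value if absent, value if present) minimising squared error."""
    best = None
    for j in range(len(X[0])):
        present = [r for row, r in zip(X, residuals) if row[j] == 1]
        absent = [r for row, r in zip(X, residuals) if row[j] == 0]
        mean_p = sum(present) / len(present) if present else 0.0
        mean_a = sum(absent) / len(absent) if absent else 0.0
        sse = sum((r - mean_p) ** 2 for r in present)
        sse += sum((r - mean_a) ** 2 for r in absent)
        if best is None or sse < best[0]:
            best = (sse, j, mean_a, mean_p)
    return best[1:]


def fit_boosted_stumps(X, y, n_rounds=20, learning_rate=0.3):
    """Gradient boosting with logistic loss: each stump fits the current residuals."""
    scores = [0.0] * len(X)
    stumps = []
    for _ in range(n_rounds):
        # Negative gradient of the logistic loss: label minus predicted probability.
        residuals = [label - sigmoid(s) for label, s in zip(y, scores)]
        j, value_absent, value_present = fit_stump(X, residuals)
        stumps.append((j, value_absent, value_present))
        scores = [s + learning_rate * (value_present if row[j] else value_absent)
                  for s, row in zip(scores, X)]
    return stumps, learning_rate


def predict(model, row):
    """Predict 1 (resistant) when the summed stump outputs are positive."""
    stumps, learning_rate = model
    score = sum(learning_rate * (vp if row[j] else va) for j, va, vp in stumps)
    return 1 if score > 0 else 0


def accuracy(model, X, y):
    hits = sum(predict(model, row) == label for row, label in zip(X, y))
    return hits / len(y)


# Synthetic stand-in for a pan-genome matrix: rows are strains, columns are genes.
random.seed(0)
N_STRAINS, N_GENES, CAUSAL_GENE = 200, 20, 3
X = [[random.randint(0, 1) for _ in range(N_GENES)] for _ in range(N_STRAINS)]
y = [row[CAUSAL_GENE] for row in X]  # resistant iff the planted gene is present

# Hold out the last 50 strains for evaluation.
X_train, y_train = X[:150], y[:150]
X_test, y_test = X[150:], y[150:]
model = fit_boosted_stumps(X_train, y_train)
print("held-out accuracy:", accuracy(model, X_test, y_test))
```

Because the toy labels are noise-free and depend on one gene, the stumps should recover the planted determinant; real resistance phenotypes are noisier, and the accuracies quoted in the abstract come from the published analysis, not from this sketch.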

Original language	English
Article number	e1006258
Number of pages	17
Journal	PLoS Computational Biology
Volume	14
Issue number	12
DOIs	https://doi.org/10.1371/journal.pcbi.1006258
Publication status	Published - 14 Dec 2018

Access to Document

10.1371/journal.pcbi.1006258
Licence: Creative Commons: Attribution (CC BY)

Moradigaravand D, Palm M, Farewell A, Mustonen V, Warringer J, Parts L (2018) Prediction of antibiotic resistance in Escherichia coli from large-scale pan-genome data. PLoS Comput Biol 14(12): e1006258. https://doi.org/10.1371/journal.pcbi.1006258
Final published version, 1.57 MB
Licence: Creative Commons: Attribution (CC BY)

Cite this

@article{2f0fce7325c4402997a80fb9746f453c,

title = "Prediction of antibiotic resistance in Escherichia coli from large-scale pan-genome data",

abstract = "The emergence of microbial antibiotic resistance is a global health threat. In clinical settings, the key to controlling spread of resistant strains is accurate and rapid detection. As traditional culture-based methods are time consuming, genetic approaches have recently been developed for this task. The detection of antibiotic resistance is typically made by measuring a few known determinants previously identified from genome sequencing, and thus requires the prior knowledge of its biological mechanisms. To overcome this limitation, we employed machine learning models to predict resistance to 11 compounds across four classes of antibiotics from existing and novel whole genome sequences of 1936 E. coli strains. We considered a range of methods, and examined population structure, isolation year, gene content, and polymorphism information as predictors. Gradient boosted decision trees consistently outperformed alternative models with an average accuracy of 0.91 on held-out data (range 0.81–0.97). While the best models most frequently employed gene content, an average accuracy score of 0.79 could be obtained using population structure information alone. Single nucleotide variation data were less useful, and significantly improved prediction only for two antibiotics, including ciprofloxacin. These results demonstrate that antibiotic resistance in E. coli can be accurately predicted from whole genome sequences without a priori knowledge of mechanisms, and that both genomic and epidemiological data can be informative. This paves way to integrating machine learning approaches into diagnostic tools in the clinic.",

author = "Danesh Moradigaravand and Martin Palm and Anne Farewell and Ville Mustonen and Jonas Warringer and Leopold Parts",

year = "2018",

month = dec,

day = "14",

doi = "10.1371/journal.pcbi.1006258",

language = "English",

volume = "14",

journal = "PLoS Computational Biology",

issn = "1553-734X",

publisher = "Public Library of Science (PLOS)",

number = "12",

}

TY - JOUR

T1 - Prediction of antibiotic resistance in Escherichia coli from large-scale pan-genome data

AU - Moradigaravand, Danesh

AU - Palm, Martin

AU - Farewell, Anne

AU - Mustonen, Ville

AU - Warringer, Jonas

AU - Parts, Leopold

PY - 2018/12/14

Y1 - 2018/12/14

N2 - The emergence of microbial antibiotic resistance is a global health threat. In clinical settings, the key to controlling spread of resistant strains is accurate and rapid detection. As traditional culture-based methods are time consuming, genetic approaches have recently been developed for this task. The detection of antibiotic resistance is typically made by measuring a few known determinants previously identified from genome sequencing, and thus requires the prior knowledge of its biological mechanisms. To overcome this limitation, we employed machine learning models to predict resistance to 11 compounds across four classes of antibiotics from existing and novel whole genome sequences of 1936 E. coli strains. We considered a range of methods, and examined population structure, isolation year, gene content, and polymorphism information as predictors. Gradient boosted decision trees consistently outperformed alternative models with an average accuracy of 0.91 on held-out data (range 0.81–0.97). While the best models most frequently employed gene content, an average accuracy score of 0.79 could be obtained using population structure information alone. Single nucleotide variation data were less useful, and significantly improved prediction only for two antibiotics, including ciprofloxacin. These results demonstrate that antibiotic resistance in E. coli can be accurately predicted from whole genome sequences without a priori knowledge of mechanisms, and that both genomic and epidemiological data can be informative. This paves way to integrating machine learning approaches into diagnostic tools in the clinic.

U2 - 10.1371/journal.pcbi.1006258

DO - 10.1371/journal.pcbi.1006258

M3 - Article

SN - 1553-734X

VL - 14

JO - PLoS Computational Biology

JF - PLoS Computational Biology

IS - 12

M1 - e1006258

ER -
