PLM-ARG: antibiotic resistance gene identification using a pretrained protein language model

Jun Wu, Jian Ouyang, Haipeng Qin, Jiajia Zhou, Ruth Roberts, Rania Siam, Lan Wang, Weida Tong*, Zhichao Liu*, Tieliu Shi*

*Corresponding author for this work

Research output: Contribution to journal / Article / Peer-review


Abstract

Motivation: Antibiotic resistance poses a formidable global challenge to public health and the environment. Considerable effort has been devoted to identifying antibiotic resistance genes (ARGs) to assess this threat. However, recent large-scale metagenomic and metatranscriptomic studies have revealed a notable concern: a significant fraction of proteins cannot be annotated by conventional sequence similarity-based methods. This problem extends to ARGs, which may go unrecognized when they are dissimilar at the sequence level to known resistance genes.

Results: Herein, we propose an artificial intelligence-powered ARG identification framework built on a pretrained large protein language model, which identifies ARGs and classifies their resistance categories simultaneously. PLM-ARG was developed using the most comprehensive ARG and resistance category information available (>28K ARGs and 29 associated resistance categories) and achieved a Matthews correlation coefficient (MCC) of 0.983 ± 0.001 under 5-fold cross-validation. On an independent validation set, PLM-ARG achieved an MCC of 0.838, outperforming other publicly available ARG prediction tools with improvements ranging from 51.8% to 107.9%. We further demonstrated the utility of PLM-ARG by annotating resistance in the UniProt database and by evaluating the impact of ARGs on the Earth's environmental microbiota.

Availability and implementation: PLM-ARG is available for academic purposes at https://github.com/Junwu302/PLM-ARG, and a user-friendly webserver (http://www.unimd.org/PLM-ARG) is also provided.
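The abstract reports performance as a Matthews correlation coefficient (MCC), with the cross-validation figure given as a mean ± spread across five folds. The minimal Python sketch below shows how such a figure can be computed from confusion-matrix counts. It assumes a binary ARG versus non-ARG setting and assumes that "±" denotes the sample standard deviation of per-fold MCCs; the abstract states neither detail. The function names `mcc` and `summarize_folds` are hypothetical and are not taken from the PLM-ARG repository.

```python
import math
from statistics import mean, stdev


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Binary Matthews correlation coefficient from confusion-matrix counts.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)),
    ranging from -1 (total disagreement) through 0 (chance) to +1 (perfect).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention (an assumption here): return 0.0 when any marginal sum is zero,
    # because MCC is undefined in that case.
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def summarize_folds(folds: list[tuple[int, int, int, int]]) -> tuple[float, float]:
    """Return (mean, sample standard deviation) of per-fold MCCs.

    Each fold is a (tp, tn, fp, fn) tuple; at least two folds are required
    for the sample standard deviation.
    """
    scores = [mcc(*fold) for fold in folds]
    return mean(scores), stdev(scores)


if __name__ == "__main__":
    # Hypothetical toy counts for five folds (NOT data from the paper).
    toy_folds = [(95, 97, 3, 5), (96, 96, 4, 4), (94, 98, 2, 6),
                 (97, 95, 5, 3), (95, 96, 4, 5)]
    m, s = summarize_folds(toy_folds)
    print(f"MCC = {m:.3f} ± {s:.3f}")
```

MCC is a common choice for imbalanced classification because it uses all four confusion-matrix cells, unlike accuracy. For real evaluations, scikit-learn's `sklearn.metrics.matthews_corrcoef` computes the same quantity directly from label vectors and also handles the multiclass case, which would apply to resistance-category classification.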
Original language: English
Article number: btad690
Number of pages: 9
Journal: Bioinformatics
Volume: 39
Issue number: 11
Early online date: 23 Nov 2023
Publication status: Published, 25 Nov 2023

Bibliographical note

Funding
This work was supported by the Shanghai Municipal Science and Technology Major Project [2017SHZDZX01, 20692191500]; the Open Research Fund of the Key Laboratory of Advanced Theory and Application in Statistics and Data Science (MOE), ECNU; and the Key Laboratory of MEA, Ministry of Education, East China Normal University.

