Mapping gene-by-gene single-nucleotide variation in 8,535 Mycobacterium tuberculosis genomes: a resource to support potential vaccine and drug development

Research output: Contribution to journal › Article › peer-review


  • Danai Papakonstantinou
  • Simon J. Draper
  • Matthew K. O’Shea
  • Jacqueline M. Achkar (Editor)


Abstract

Tuberculosis (TB) is responsible for millions of deaths annually. More effective vaccines and new antituberculous drugs are essential to control the disease. Numerous genomic studies have advanced our knowledge of M. tuberculosis drug resistance, population structure, and transmission patterns. At the same time, reverse vaccinology and drug discovery pipelines have identified potential immunogenic vaccine candidates and drug targets. However, a better understanding of the sequence variation of all M. tuberculosis genes on a large scale could aid in the identification of new vaccine and drug targets. Achieving this was the focus of the current study. Genome sequence data covering seven M. tuberculosis lineages were obtained from online public sources. A total of 8,535 genome sequences were mapped against the M. tuberculosis H37Rv reference genome to identify single nucleotide polymorphisms (SNPs). The initial mapping results were processed further, and the frequency distribution of nucleotide variants within genes was determined and analyzed. The majority of genomic positions in the M. tuberculosis H37Rv genome were conserved. Genes with the highest level of conservation were often associated with stress responses and maintenance of redox balance. Conversely, genes with high levels of nucleotide variation were often associated with drug resistance. We have provided a high-resolution analysis of the single-nucleotide variation of all M. tuberculosis genes across seven lineages as a resource to support future drug and vaccine development. We have identified a number of highly conserved genes, important in M. tuberculosis biology, that could potentially be used as targets for novel vaccine candidates and antituberculous medications.

Importance

Tuberculosis is an infectious disease caused by the bacterium Mycobacterium tuberculosis. In the first half of the 20th century, the discovery of the Mycobacterium bovis BCG vaccine and antituberculous drugs heralded a new era in the control of TB. However, combating TB has proven challenging, especially with the emergence of HIV and drug resistance. A major hindrance in TB control is the lack of an effective vaccine: the efficacy of BCG is geographically variable, and the vaccine provides little protection against pulmonary disease in high-risk groups. Our research is significant because it provides a resource to support future drug and vaccine development. We have achieved this by developing a better understanding of the nucleotide variation of all M. tuberculosis genes on a large scale and by identifying highly conserved genes that could potentially be used as targets for novel vaccine candidates and antituberculous medications.
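
As an illustration of the kind of per-gene summary the abstract describes, the following Python sketch computes, for each gene, the fraction of reference positions at which at least one genome carries a SNP, and then ranks genes from most to least conserved. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function names, the input layout (gene coordinates as 1-based inclusive intervals on the reference, and one set of SNP positions per genome), and the toy gene names are all hypothetical.

```python
def per_gene_variation(genes, snp_calls):
    """Return, per gene, the fraction of positions that vary in >= 1 genome.

    genes     : dict mapping gene name -> (start, end), 1-based inclusive
                coordinates on the reference (hypothetical input layout).
    snp_calls : iterable of sets, one per genome, each holding the reference
                positions at which that genome differs from the reference.
    """
    # Pool every position that is variable in at least one genome.
    variable = set()
    for positions in snp_calls:
        variable.update(positions)

    fractions = {}
    for name, (start, end) in genes.items():
        length = end - start + 1
        n_variable = sum(1 for p in variable if start <= p <= end)
        fractions[name] = n_variable / length
    return fractions


def rank_most_conserved(fractions):
    """Sort gene names from lowest to highest variable fraction."""
    return sorted(fractions, key=lambda name: (fractions[name], name))


if __name__ == "__main__":
    # Toy data: two made-up genes and three made-up genomes.
    toy_genes = {"geneA": (1, 10), "geneB": (11, 20)}
    toy_calls = [{2, 5}, {5, 12}, set()]
    toy_fractions = per_gene_variation(toy_genes, toy_calls)
    for gene in rank_most_conserved(toy_fractions):
        print(gene, toy_fractions[gene])
```

In practice the SNP positions would come from variant calls against the H37Rv reference and the gene intervals from its annotation; this sketch deliberately leaves that parsing out and ignores details such as overlapping genes and low-coverage sites.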

Bibliographic note

Funding Information: This project was funded by a Global Challenges Ph.D. studentship awarded to D.P. by the University of Birmingham. S.J.D. (Dunn) is funded by BBSRC grant no. BB/R006261/1, awarded to A.M. S.J.D. (Draper) is a Wellcome Trust Senior Fellow (106917/Z/15/Z).

Publisher Copyright: Copyright © 2021 Papakonstantinou et al. This content is distributed under the terms of the Creative Commons Attribution 4.0 International license.


Original language: English
Article number: e01224-20
Number of pages: 15
Issue number: 2
Publication status: Published - 10 Mar 2021


  • Mycobacterium tuberculosis, SNPs, TB, drug targets, single nucleotide polymorphisms, single-nucleotide variation, tuberculosis, vaccine candidates
