MI-Pack: Increased confidence of metabolite identification in mass spectra by integrating accurate masses and metabolic pathways

Research output: Contribution to journal › Article

72 Citations (Scopus)


Metabolite identification is of central importance to metabolomics as it provides the route to new knowledge. Automated identification of the thousands of peaks detected by high resolution mass spectrometry is currently not possible, largely because of the finite mass accuracy of the spectrometer and because one peak can be assigned to one or more empirical formulae, each of which maps to one or more metabolites. Biological samples are not, however, composed of random metabolite mixtures, but instead comprise thousands of compounds related through specific chemical transformations. Here we evaluate whether prior biological knowledge of these transformations can improve metabolite identification accuracy. Our identification algorithm, which uses metabolite interconnectivity from the KEGG database to putatively identify metabolites by name, is based on mapping an experimentally derived empirical formula difference for a pair of peaks to a known empirical formula difference between substrate-product pairs derived from KEGG, termed transformation mapping (TM). To maximize identification accuracy, we also developed a novel semi-automated method to calculate a mass error surface associated with experimental peak-pair differences. The TM algorithm with mass error surface has been extensively validated using simulated and experimental datasets by calculating false positive and false negative rates of metabolite identification. Compared to the traditional identification method of searching a database with accurate masses one peak at a time, the TM algorithm reduces the false positive rate of identification by more than 4-fold, while maintaining a minimal false negative rate. The mass error surface, putative identification of metabolite names, and calculation of false positive and false negative rates collectively advance and improve upon related previous research on this topic [1, 2].
We conclude that inclusion of prior biological knowledge in the form of metabolic pathways provides one route to more accurate metabolite identification. © 2010 Elsevier B.V. All rights reserved.
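
The core idea of transformation mapping can be illustrated with a minimal Python sketch. This is not the MI-Pack implementation: the function names, the tiny substrate-product table, and the candidate formulae per peak are all hypothetical, chosen only to show how a formula difference between two peaks might be matched against known substrate-product differences. The candidate formulae were not checked for being isobaric, and the mass error surface and the step that assigns formulae to peaks are omitted entirely.

```python
import re
from collections import Counter
from itertools import combinations, product


def parse_formula(formula):
    """Turn 'C6H12O6' into Counter({'C': 6, 'H': 12, 'O': 6})."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def formula_difference(a, b):
    """Signed element-count difference (b minus a) as a hashable key."""
    fa, fb = parse_formula(a), parse_formula(b)
    diff = {el: fb[el] - fa[el] for el in set(fa) | set(fb)}
    return frozenset((el, n) for el, n in diff.items() if n != 0)


# Hypothetical stand-in for substrate-product pairs taken from a pathway database.
KNOWN_PAIRS = [
    ("C6H12O6", "C6H13O9P"),  # a phosphorylation (+HPO3)
    ("C3H4O3", "C3H6O3"),     # a reduction (+H2)
]

# Store each known difference in both directions (substrate->product and back).
KNOWN_DIFFS = set()
for substrate, prod in KNOWN_PAIRS:
    KNOWN_DIFFS.add(formula_difference(substrate, prod))
    KNOWN_DIFFS.add(formula_difference(prod, substrate))

# Hypothetical candidate formulae per peak, e.g. from a single-peak mass search.
CANDIDATES = {
    "peak_A": ["C6H12O6"],
    "peak_B": ["C6H13O9P", "C9H12N2O5S"],
    "peak_C": ["C5H9NO4"],
}


def transformation_mapping(candidates, known_diffs):
    """Keep only candidate formulae whose difference to a candidate of
    another peak matches a known substrate-product formula difference."""
    supported = {}
    for peak_1, peak_2 in combinations(candidates, 2):
        for f1, f2 in product(candidates[peak_1], candidates[peak_2]):
            if formula_difference(f1, f2) in known_diffs:
                supported.setdefault(peak_1, set()).add(f1)
                supported.setdefault(peak_2, set()).add(f2)
    return supported


supported = transformation_mapping(CANDIDATES, KNOWN_DIFFS)
```

In this toy example the second candidate for `peak_B` and the only candidate for `peak_C` are dropped because no other peak connects to them through a known transformation, which is the mechanism by which pathway knowledge could reduce false positive assignments.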
Original language: English
Pages (from-to): 75-82
Number of pages: 8
Journal: Chemometrics and Intelligent Laboratory Systems
Issue number: 1
Publication status: Published - 1 Nov 2010


  • Data mining
  • Metabolite identification
  • Mass spectrometry
  • Metabolomics
  • Mass error


