Unsupervised methods in LC-MS data treatment: application for potential chemotaxonomic markers search

Polina Turova, Iain Styles, Vladimir Timashev, Konstantin Kravets, Alexander Grechnikov, Dmitry Lyskov, Tahir Samigullin, Ilya Podolskiy, Oleg Shpigun, Andrey Stavrianidi

Research output: Contribution to journal › Article › peer-review


The combination of Liquid Chromatography and Mass Spectrometry (LC-MS) is commonly used to determine and characterize biologically active compounds because of its high resolution and sensitivity. In this work we explore the interpretation of LC-MS data using multivariate statistical analysis algorithms to extract useful chemical information and identify clusters of similar samples. Samples of leaves from 19 plants belonging to the Apiaceae family were analyzed under unified LC conditions by high- and low-resolution mass spectrometry in a wide-range scan mode. LC-MS data preprocessing was followed by statistical analysis using one of three approaches: tensor decomposition in the form of Parallel Factor Analysis (PARAFAC); matrix factorization following tensor unfolding, with principal component analysis (PCA), independent component analysis (ICA) or non-negative matrix factorization (NMF); or unsupervised feature selection (UFS). The optimal number of components for each of these methods was found, and the results were compared using four different metrics: silhouette score, Davies-Bouldin index, computational time and number of noisy components. It was found that PCA, ICA and UFS give the best results across the majority of the criteria for both low- and high-resolution data. An algorithm for biomarker signal selection is suggested, and 23 potential chemotaxonomic markers were tentatively identified using MS2 data. Dendrograms constructed by the methods were compared to the molecular phylogenetic tree by calculating the pixel-wise mean square error (MSE). The suggested approach can therefore support chemotaxonomic studies and yield valuable chemical information for biomarker discovery.
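The unfolding, factorization and scoring steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the data are synthetic, the group labels are assumed to be known in advance, preprocessing is omitted, and the function names (`unfold`, `pca_scores`, `silhouette`, `davies_bouldin`) are hypothetical NumPy-only stand-ins for what a library would normally provide. Only PCA is shown; ICA, NMF, PARAFAC and UFS are not covered.

```python
import numpy as np


def unfold(tensor):
    """Flatten a (samples, retention_time, mz) tensor to (samples, features)."""
    return tensor.reshape(tensor.shape[0], -1)


def pca_scores(X, n_components):
    """Project the mean-centred rows of X onto the leading principal components."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]


def silhouette(scores, labels):
    """Mean silhouette coefficient using Euclidean distances (higher is better)."""
    labels = np.asarray(labels)
    D = np.linalg.norm(scores[:, None, :] - scores[None, :, :], axis=-1)
    values = []
    for i in range(len(labels)):
        same = labels == labels[i]
        same[i] = False  # exclude the point itself from its own cluster
        if not same.any():
            values.append(0.0)  # usual convention for singleton clusters
            continue
        a = D[i, same].mean()  # mean distance to own cluster
        b = min(D[i, labels == k].mean()  # mean distance to nearest other cluster
                for k in set(labels.tolist()) if k != labels[i])
        values.append((b - a) / max(a, b))
    return float(np.mean(values))


def davies_bouldin(scores, labels):
    """Davies-Bouldin index (lower is better)."""
    labels = np.asarray(labels)
    ks = sorted(set(labels.tolist()))
    centroids = np.array([scores[labels == k].mean(axis=0) for k in ks])
    scatter = np.array([
        np.linalg.norm(scores[labels == k] - c, axis=1).mean()
        for k, c in zip(ks, centroids)
    ])
    worst = []
    for i in range(len(ks)):
        ratios = [
            (scatter[i] + scatter[j]) / np.linalg.norm(centroids[i] - centroids[j])
            for j in range(len(ks)) if j != i
        ]
        worst.append(max(ratios))
    return float(np.mean(worst))


# Synthetic stand-in for LC-MS data: 12 samples x 20 retention times x 15 m/z bins.
rng = np.random.default_rng(0)
n_per_group, n_rt, n_mz = 6, 20, 15
tensor = rng.random((2 * n_per_group, n_rt, n_mz)) * 0.1  # baseline noise
tensor[:n_per_group, 5, 3] += 5.0    # marker peak present only in group 0
tensor[n_per_group:, 12, 9] += 5.0   # marker peak present only in group 1
labels = np.array([0] * n_per_group + [1] * n_per_group)  # assumed known groups

X = unfold(tensor)
scores = pca_scores(X, n_components=2)
sil = silhouette(scores, labels)
dbi = davies_bouldin(scores, labels)
print(f"silhouette = {sil:.3f}, Davies-Bouldin = {dbi:.3f}")
```

In this toy setting the two groups differ by a single strong peak each, so the first principal component separates them cleanly. Real LC-MS data would need the preprocessing mentioned in the abstract before unfolding, and the number of components would have to be chosen per method rather than fixed at two.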
Original language: English
Article number: 114382
Number of pages: 10
Journal: Journal of Pharmaceutical and Biomedical Analysis
Early online date: 21 Sept 2021
Publication status: Published - 30 Nov 2021


  • Apiaceae
  • Liquid chromatography
  • Machine learning
  • Mass spectrometry
  • Multi-way data

