A Method for Reducing Training Time of ML-Based Cascade Scheme for Large-Volume Data Analysis

  • Ivan Izonin*
  • Roman Muzyka
  • Roman Tkachenko
  • Ivanna Dronyuk
  • Kyrylo Yemets
  • Stergios-Aristoteles Mitoulis

* Corresponding author for this work

Research output: Contribution to journal, Article, peer-review

Abstract

We live in the era of large data analysis, where processing vast datasets has become essential for uncovering valuable insights across various domains of our lives. Machine learning (ML) algorithms offer powerful tools for processing and analyzing this abundance of information. However, the considerable time and computational resources needed for training ML models pose significant challenges, especially within cascade schemes, due to the iterative nature of training algorithms, the complexity of feature extraction and transformation processes, and the large sizes of the datasets involved. This paper proposes a modification to the existing ML-based cascade scheme for analyzing large biomedical datasets by incorporating principal component analysis (PCA) at each level of the cascade. The number of principal components that replace the initial inputs was selected so that 95% of the variance is retained. Furthermore, we enhanced the training and application algorithms and demonstrated the effectiveness of the modified cascade scheme through comparative analysis, which showed a significant reduction in training time together with improved generalization properties of the method and improved accuracy of the large data analysis. The improved generalization properties of the scheme stem from the reduction in nonsignificant independent attributes in the dataset, which further enhances its performance in intelligent large data analysis.
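The 95% variance-retention rule mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `fit_pca_retain` and `apply_pca`, the synthetic data, and the use of a plain SVD-based PCA are all assumptions made for illustration. In a cascade, a step of this kind would presumably be fitted separately on the inputs of each level.

```python
import numpy as np


def fit_pca_retain(X, retain=0.95):
    """Fit PCA on X and keep the fewest components whose cumulative
    explained-variance ratio reaches `retain` (hypothetical helper)."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # SVD of the centered data: rows of Vt are the principal directions,
    # and squared singular values are proportional to explained variance.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    ratio = s**2 / np.sum(s**2)
    cumulative = np.cumsum(ratio)
    # First index at which the cumulative ratio reaches the threshold.
    k = int(np.searchsorted(cumulative, retain)) + 1
    k = min(k, len(s))
    return mean, Vt[:k]


def apply_pca(X, mean, components):
    """Project X onto the retained principal components."""
    return (X - mean) @ components.T


# Synthetic stand-in for one cascade level's inputs (assumption):
# 10 observed attributes driven by 3 latent factors plus small noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 10))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 10))

mean, components = fit_pca_retain(X, retain=0.95)
Z = apply_pca(X, mean, components)  # reduced inputs for the level's model
print("inputs:", X.shape[1], "-> principal components:", Z.shape[1])
```

Fitting the model of a level on `Z` rather than `X` is what would shorten training in this sketch, since the model sees fewer input attributes while most of the variance is preserved.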

Original language: English
Article number: 4762
Number of pages: 14
Journal: Sensors
Volume: 24
Issue number: 15
Early online date: 23 Jul 2024
Publication status: Published - Aug 2024
Externally published: Yes

Bibliographical note

Publisher Copyright:
© 2024 by the authors.

Keywords

  • cascade scheme
  • Kolmogorov–Gabor polynomial
  • large data analysis
  • machine learning
  • PCA
  • training time

ASJC Scopus subject areas

  • Analytical Chemistry
  • Information Systems
  • Atomic and Molecular Physics, and Optics
  • Biochemistry
  • Instrumentation
  • Electrical and Electronic Engineering
