Investigating health-related features and their impact on the prediction of diabetes using machine learning

Hafiz Farooq Ahmad, Hamid Mukhtar, Hesham Alaqail, Mohamed Seliaman, Abdulaziz Alhumam

Research output: Contribution to journalArticlepeer-review

41 Downloads (Pure)


Diabetes Mellitus (DM) is one of the most common chronic diseases leading to severe health complications that may cause death. The disease influences individuals, community, and the government due to the continuous monitoring, lifelong commitment, and the cost of treatment. The World Health Organization (WHO) considers Saudi Arabia as one of the top 10 countries in diabetes prevalence across the world. Since most of its medical services are provided by the government, the cost of the treatment in terms of hospitals and clinical visits and lab tests represents a real burden due to the large scale of the disease. The ability to predict the diabetic status of a patient with only a handful of features can allow cost-effective, rapid, and widely-available screening of diabetes, thereby lessening the health and economic burden caused by diabetes alone. The goal of this paper is to investigate the prediction of diabetic patients and compare the role of HbA1c and FPG as input features. By using five different machine learning classifiers, and using feature elimination through feature permutation and hierarchical clustering, we established good performance for accuracy, precision, recall, and F1-score of the models on the dataset implying that our data or features are not bound to specific models. In addition, the consistent performance across all the evaluation metrics indicate that there was no trade-off or penalty among the evaluation metrics. Further analysis was performed on the data to identify the risk factors and their indirect impact on diabetes classification. Our analysis presented great agreement with the risk factors of diabetes and prediabetes stated by the American Diabetes Association (ADA) and other health institutions worldwide. We conclude that by performing analysis of the disease using selected features, important factors specific to the Saudi population can be identified, whose management can result in controlling the disease. We also provide some recommendations learned from this research.
Original languageEnglish
Article number1173
Number of pages18
JournalApplied Sciences
Issue number3
Publication statusPublished - 27 Jan 2021


  • machine learning
  • prediction
  • feature importance
  • feature elimination
  • hierarchical clustering


Dive into the research topics of 'Investigating health-related features and their impact on the prediction of diabetes using machine learning'. Together they form a unique fingerprint.

Cite this