Comparison of machine learning approaches with a general linear model to predict personal exposure to benzene

Research output: Contribution to journal › Article › peer-review

Authors

External organisations

  • Department of Environmental Sciences/Center of Excellence in Environmental Studies, King Abdulaziz University
  • Department of Geosciences, Faculty of Science, University of Malta, Msida, MSD 2080, Malta
  • Department of Physics, Faculty of Science, University of Malta, Msida, MSD 2080, Malta
  • ISGlobal, Barcelona Institute for Global Health - Campus MAR, Barcelona Biomedical Research Park (PRBB), Doctor Aiguader, 88, 08003 Barcelona, Spain

Abstract

Machine Learning Techniques (MLTs) offer great power in analysing complex datasets and have not previously been applied to non-occupational pollutant exposure. MLT models that predict personal exposure to benzene were developed and compared with a standard general linear model (GLM) based on linear regression. The models were tested against independent datasets obtained from three personal exposure measurement campaigns. A Correlation-based Feature Subset (CFS) selection algorithm identified a reduced attribute set suitable for predicting personal exposure to benzene, with the common attributes grouped under the use of paints in homes, upholstery materials, space heating, and environmental tobacco smoke. Personal exposure was categorised as low, medium, or high. For large datasets, both the GLM and the MLTs show high variability in their ability to correctly classify concentrations above the 90th percentile, but the MLT models score higher when the divergence of incorrectly classified cases is taken into account. Overall, the MLTs perform at least as well as the GLM and avoid the need to input microenvironment concentrations.
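The abstract names CFS selection without detail, so the following is only a minimal sketch of the general idea, not the authors' implementation. It scores a feature subset with the standard CFS merit, k·r_cf / sqrt(k + k(k−1)·r_ff), where r_cf is the mean feature-to-target correlation and r_ff the mean feature-to-feature correlation, and grows the subset greedily. Two points are assumptions made here for illustration: absolute Pearson correlation stands in for whatever correlation measure the study used, and the function names `cfs_merit` and `greedy_cfs` are hypothetical.

```python
import numpy as np


def cfs_merit(X, y, subset):
    """CFS merit of a feature subset: k*r_cf / sqrt(k + k*(k-1)*r_ff).

    Assumption: absolute Pearson correlation is used for both terms.
    """
    k = len(subset)
    if k == 0:
        return 0.0
    # Mean absolute correlation between each selected feature and the target.
    r_cf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset])
    # Mean absolute correlation between pairs of selected features.
    if k == 1:
        r_ff = 0.0
    else:
        pairs = [abs(np.corrcoef(X[:, a], X[:, b])[0, 1])
                 for i, a in enumerate(subset) for b in subset[i + 1:]]
        r_ff = np.mean(pairs)
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)


def greedy_cfs(X, y):
    """Forward selection: add the feature that most improves the merit,
    and stop as soon as no candidate improves it."""
    remaining = list(range(X.shape[1]))
    selected, best = [], 0.0
    while remaining:
        score, j = max((cfs_merit(X, y, selected + [j]), j) for j in remaining)
        if score <= best:
            break
        selected.append(j)
        remaining.remove(j)
        best = score
    return selected
```

Because the merit rewards features that correlate with the target and penalises features that correlate with each other, a redundant or uninformative attribute tends to lower the score and is left out, which is how a reduced attribute set emerges.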

Details

Original language: English
Journal: Environmental Science and Technology
Early online date: 31 Aug 2018
Publication status: E-pub ahead of print - 31 Aug 2018

Keywords

  • Benzene, personal exposure, machine learning techniques, general linear model, dimension reduction