Comparison of machine learning approaches with a general linear model to predict personal exposure to benzene

Noel J. Aquilina, Juana Maria Delgado Saborit, Stefano Bugelli, Jason Padovani Ginies, Roy Harrison

Research output: Contribution to journal › Article › peer-review



Machine Learning Techniques (MLTs) offer great power in analysing complex datasets and have not previously been applied to non-occupational pollutant exposure. MLT models that predict personal exposure to benzene have been developed and compared with a standard model based on linear regression (a general linear model, GLM). The models were tested against independent datasets obtained from three personal exposure measurement campaigns. A Correlation-based Feature Subset (CFS) selection algorithm identified a reduced attribute set suitable for predicting personal exposure to benzene, with the common attributes grouped under the use of paints in homes, upholstery materials, space heating, and environmental tobacco smoke. Personal exposure was categorised as low, medium, or high. For large datasets, both the GLM and the MLTs show high variability in their ability to correctly classify concentrations above the 90th percentile, but the MLT models score higher when the divergence of incorrectly classified cases is taken into account. Overall, the MLTs perform at least as well as the GLM and avoid the need to input microenvironment concentrations.
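To illustrate the idea behind correlation-based feature subset selection, the following is a minimal sketch, not the authors' implementation. It scores a subset with the standard CFS merit heuristic, merit = k·r_cf / sqrt(k + k(k−1)·r_ff), where r_cf is the mean attribute-to-target correlation and r_ff the mean attribute-to-attribute correlation, and grows the subset greedily. Several things here are assumptions: Pearson correlation is used as a simple stand-in for whatever correlation measure the study used, the attribute names are hypothetical, and the toy numbers are invented for illustration and are not study data.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def cfs_merit(features, target, subset):
    """CFS merit: rewards correlation with the target, penalises redundancy."""
    k = len(subset)
    if k == 0:
        return 0.0
    r_cf = sum(abs(pearson(features[name], target)) for name in subset) / k
    if k == 1:
        return r_cf
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = sum(abs(pearson(features[a], features[b])) for a, b in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def cfs_forward_select(features, target, tol=1e-12):
    """Greedy forward search: add the best attribute until the merit stops improving."""
    selected, best = [], 0.0
    remaining = list(features)
    while remaining:
        merit, name = max(
            (cfs_merit(features, target, selected + [n]), n) for n in remaining
        )
        if merit <= best + tol:
            break
        selected.append(name)
        remaining.remove(name)
        best = merit
    return selected, best


# Hypothetical toy data (invented values, not from the study).
toy_features = {
    "tobacco_smoke": [0, 0, 1, 1, 2, 3],
    "tobacco_smoke_copy": [0, 0, 1, 1, 2, 3],  # fully redundant attribute
    "constant_attr": [1, 1, 1, 1, 1, 1],       # carries no information
}
toy_exposure = [1.0, 1.2, 2.1, 2.4, 3.0, 4.1]

selected, merit = cfs_forward_select(toy_features, toy_exposure)
print(selected, round(merit, 3))
```

In this toy case the redundant copy adds nothing to the merit and the constant attribute lowers it, so the search stops after a single attribute. This is the behaviour that lets CFS return a reduced attribute set.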
Original language: English
Journal: Environmental Science and Technology
Early online date: 31 Aug 2018
Publication status: E-pub ahead of print - 31 Aug 2018


  • Benzene
  • personal exposure
  • machine learning techniques
  • general linear model
  • dimension reduction

