Comparison of machine learning approaches with a general linear model to predict personal exposure to benzene

Noel J. Aquilina; Juana Maria Delgado Saborit; Stefano Bugelli; Jason Padovani Ginies; Roy Harrison

doi:10.1021/acs.est.8b03328

Research output: Contribution to journal › Article › peer-review

Abstract

Machine Learning Techniques (MLTs) offer great power in analysing complex datasets and have not previously been applied to non-occupational pollutant exposure. MLT models that can predict personal exposure to benzene have been developed and compared with a standard model using a linear regression approach (GLM). The models were tested against independent datasets obtained from three personal exposure measurement campaigns. A Correlation-based Feature Subset (CFS) selection algorithm identified a reduced attribute set, with common attributes grouped under the use of paints in homes; upholstery materials; space heating and environmental tobacco smoke as the attributes suitable to predict the personal exposure to benzene. Personal exposure was categorised as low, medium and high, and for big datasets, both the GLM and MLTs show high variability in performance to correctly classify >90%ile concentrations, but the MLT models have a higher score when accounting for divergence of incorrectly classified cases. Overall, the MLTs perform at least as well as the GLM and avoid the need to input microenvironment concentrations.

Original language	English
Journal	Environmental Science and Technology
Early online date	31 Aug 2018
DOIs	https://doi.org/10.1021/acs.est.8b03328
Publication status	E-pub ahead of print - 31 Aug 2018

Keywords

Benzene
personal exposure
machine learning techniques
general linear model
dimension reduction

Access to Document

10.1021/acs.est.8b03328
Licence: None: All rights reserved

Noel_J_Aquilina_et_al_Comparison_of_machine_learning_approaches_Environmental_Science_and_Technology_2018
Checked for eligibility: 26/09/2018 This document is the Accepted Manuscript version of a Published Work that appeared in final form in Environmental Science and Technology, copyright © American Chemical Society after peer review and technical editing by the publisher.
Accepted author manuscript, 774 KB
Licence: Other (please specify with Rights Statement)

Cite this

@article{17939c2aa3e5464b8f71d8db71926109,

title = "Comparison of machine learning approaches with a general linear model to predict personal exposure to benzene",

abstract = "Machine Learning Techniques (MLTs) offer great power in analysing complex datasets and have not previously been applied to non-occupational pollutant exposure. MLT models that can predict personal exposure to benzene have been developed and compared with a standard model using a linear regression approach (GLM). The models were tested against independent datasets obtained from three personal exposure measurement campaigns. A Correlation-based Feature Subset (CFS) selection algorithm identified a reduced attribute set, with common attributes grouped under the use of paints in homes; upholstery materials; space heating and environmental tobacco smoke as the attributes suitable to predict the personal exposure to benzene. Personal exposure was categorised as low, medium and high, and for big datasets, both the GLM and MLTs show high variability in performance to correctly classify >90%ile concentrations, but the MLT models have a higher score when accounting for divergence of incorrectly classified cases. Overall, the MLTs perform at least as well as the GLM and avoid the need to input microenvironment concentrations. ",

keywords = "Benzene, personal exposure, machine learning techniques, general linear model, dimension reduction",

author = "Aquilina, {Noel J.} and {Delgado Saborit}, {Juana Maria} and Bugelli, Stefano and {Padovani Ginies}, Jason and Harrison, Roy",

year = "2018",

month = aug,

day = "31",

doi = "10.1021/acs.est.8b03328",

language = "English",

journal = "Environmental Science and Technology",

issn = "0013-936X",

publisher = "American Chemical Society",

}

TY - JOUR

T1 - Comparison of machine learning approaches with a general linear model to predict personal exposure to benzene

AU - Aquilina, Noel J.

AU - Delgado Saborit, Juana Maria

AU - Bugelli, Stefano

AU - Padovani Ginies, Jason

AU - Harrison, Roy

PY - 2018/8/31

Y1 - 2018/8/31

N2 - Machine Learning Techniques (MLTs) offer great power in analysing complex datasets and have not previously been applied to non-occupational pollutant exposure. MLT models that can predict personal exposure to benzene have been developed and compared with a standard model using a linear regression approach (GLM). The models were tested against independent datasets obtained from three personal exposure measurement campaigns. A Correlation-based Feature Subset (CFS) selection algorithm identified a reduced attribute set, with common attributes grouped under the use of paints in homes; upholstery materials; space heating and environmental tobacco smoke as the attributes suitable to predict the personal exposure to benzene. Personal exposure was categorised as low, medium and high, and for big datasets, both the GLM and MLTs show high variability in performance to correctly classify >90%ile concentrations, but the MLT models have a higher score when accounting for divergence of incorrectly classified cases. Overall, the MLTs perform at least as well as the GLM and avoid the need to input microenvironment concentrations.

KW - Benzene

KW - personal exposure

KW - machine learning techniques

KW - general linear model

KW - dimension reduction

U2 - 10.1021/acs.est.8b03328

DO - 10.1021/acs.est.8b03328

M3 - Article

SN - 0013-936X

JO - Environmental Science and Technology

JF - Environmental Science and Technology

ER -
