Comparison of machine learning approaches with a general linear model to predict personal exposure to benzene

Research output: Contribution to journal › Article

Standard

Comparison of machine learning approaches with a general linear model to predict personal exposure to benzene. / Aquilina, Noel J.; Delgado Saborit, Juana Maria; Bugelli, Stefano; Padovani Ginies, Jason; Harrison, Roy.

In: Environmental Science and Technology, 31.08.2018.

Bibtex

@article{17939c2aa3e5464b8f71d8db71926109,
title = "Comparison of machine learning approaches with a general linear model to predict personal exposure to benzene",
abstract = "Machine Learning Techniques (MLTs) offer great power in analysing complex datasets and have not previously been applied to non-occupational pollutant exposure. MLT models that can predict personal exposure to benzene have been developed and compared with a standard model using a linear regression approach (GLM). The models were tested against independent datasets obtained from three personal exposure measurement campaigns. A Correlation-based Feature Subset (CFS) selection algorithm identified a reduced attribute set, with common attributes grouped under the use of paints in homes; upholstery materials; space heating and environmental tobacco smoke as the attributes suitable to predict the personal exposure to benzene. Personal exposure was categorised as low, medium and high, and for big datasets, both the GLM and MLTs show high variability in performance to correctly classify >90%ile concentrations, but the MLT models have a higher score when accounting for divergence of incorrectly classified cases. Overall, the MLTs perform at least as well as the GLM and avoid the need to input microenvironment concentrations.",
keywords = "Benzene, personal exposure, machine learning techniques, general linear model, dimension reduction",
author = "Aquilina, {Noel J.} and {Delgado Saborit}, {Juana Maria} and Bugelli, Stefano and {Padovani Ginies}, Jason and Harrison, Roy",
year = "2018",
month = aug,
day = "31",
doi = "10.1021/acs.est.8b03328",
language = "English",
journal = "Environmental Science and Technology",
issn = "0013-936X",
publisher = "American Chemical Society",

}

RIS

TY  - JOUR
T1  - Comparison of machine learning approaches with a general linear model to predict personal exposure to benzene
AU  - Aquilina, Noel J.
AU  - Delgado Saborit, Juana Maria
AU  - Bugelli, Stefano
AU  - Padovani Ginies, Jason
AU  - Harrison, Roy
PY  - 2018/8/31
Y1  - 2018/8/31
N2  - Machine Learning Techniques (MLTs) offer great power in analysing complex datasets and have not previously been applied to non-occupational pollutant exposure. MLT models that can predict personal exposure to benzene have been developed and compared with a standard model using a linear regression approach (GLM). The models were tested against independent datasets obtained from three personal exposure measurement campaigns. A Correlation-based Feature Subset (CFS) selection algorithm identified a reduced attribute set, with common attributes grouped under the use of paints in homes; upholstery materials; space heating and environmental tobacco smoke as the attributes suitable to predict the personal exposure to benzene. Personal exposure was categorised as low, medium and high, and for big datasets, both the GLM and MLTs show high variability in performance to correctly classify >90%ile concentrations, but the MLT models have a higher score when accounting for divergence of incorrectly classified cases. Overall, the MLTs perform at least as well as the GLM and avoid the need to input microenvironment concentrations.
AB  - Machine Learning Techniques (MLTs) offer great power in analysing complex datasets and have not previously been applied to non-occupational pollutant exposure. MLT models that can predict personal exposure to benzene have been developed and compared with a standard model using a linear regression approach (GLM). The models were tested against independent datasets obtained from three personal exposure measurement campaigns. A Correlation-based Feature Subset (CFS) selection algorithm identified a reduced attribute set, with common attributes grouped under the use of paints in homes; upholstery materials; space heating and environmental tobacco smoke as the attributes suitable to predict the personal exposure to benzene. Personal exposure was categorised as low, medium and high, and for big datasets, both the GLM and MLTs show high variability in performance to correctly classify >90%ile concentrations, but the MLT models have a higher score when accounting for divergence of incorrectly classified cases. Overall, the MLTs perform at least as well as the GLM and avoid the need to input microenvironment concentrations.
KW  - Benzene
KW  - personal exposure
KW  - machine learning techniques
KW  - general linear model
KW  - dimension reduction
U2  - 10.1021/acs.est.8b03328
DO  - 10.1021/acs.est.8b03328
M3  - Article
JO  - Environmental Science and Technology
JF  - Environmental Science and Technology
SN  - 0013-936X
ER  -