Reusing genetic programming for ensemble selection in classification of unbalanced data

Urvesh Bhowan; Mark Johnston; Mengjie Zhang; Xin Yao

doi:10.1109/TEVC.2013.2293393

Research output: Contribution to journal › Article › peer-review

50 Citations (Scopus)

Abstract

Classification algorithms can suffer from performance degradation when the class distribution is unbalanced. This paper develops a two-step approach to evolving ensembles using genetic programming (GP) for unbalanced data. The first step uses multiobjective (MO) GP to evolve a Pareto-approximated front of GP classifiers to form the ensemble by trading off the minority and the majority class against each other during learning. The MO component alleviates the reliance on sampling to artificially rebalance the data. The second step, which is the focus of this paper, proposes a novel ensemble selection approach using GP to automatically find/choose the best individuals for the ensemble. This new GP approach combines multiple Pareto-approximated front members into a single composite genetic program solution to represent the (optimized) ensemble. This ensemble representation has two main advantages/novelties over traditional genetic algorithm (GA) approaches. First, by limiting the depth of the composite solution trees, we use selection pressure during evolution to find small, highly cooperative groups of individuals for the ensemble. This means that ensemble sizes are not fixed a priori (as in GA), but vary depending on the strength of the base learners. Second, we compare different function set operators in the composite solution trees to explore new ways to aggregate the member outputs and thus control how the ensemble computes its output. We show that the proposed GP approach evolves smaller, more diverse ensembles compared to an established ensemble selection algorithm, while still performing as well as, or better than, the established approach. The evolved GP ensembles also perform well compared to other bagging and boosting approaches, particularly on tasks with high levels of class imbalance.
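The composite-tree idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the tree encoding, the operator names (`add`, `min`, `max`), the helper functions, and the sign-based decision rule are all assumptions chosen for illustration. Leaves refer to members of the Pareto-approximated front by index, internal nodes aggregate member outputs, and a binary tree of depth d can reference at most 2^d distinct members, which is how a depth limit bounds the ensemble size.

```python
from typing import Sequence, Set, Union

# Hypothetical function set. The paper compares several operators; these three
# are picked here only to make the sketch concrete.
OPERATORS = {
    "add": lambda a, b: a + b,
    "min": min,
    "max": max,
}

# A composite tree is either an int (index of a base classifier on the front)
# or a tuple (operator_name, left_subtree, right_subtree).
Tree = Union[int, tuple]


def evaluate(tree: Tree, outputs: Sequence[float]) -> float:
    """Aggregate the base classifiers' numeric outputs for one instance."""
    if isinstance(tree, int):
        return outputs[tree]
    op, left, right = tree
    return OPERATORS[op](evaluate(left, outputs), evaluate(right, outputs))


def depth(tree: Tree) -> int:
    """Depth of the composite tree; a single leaf has depth 0."""
    if isinstance(tree, int):
        return 0
    _, left, right = tree
    return 1 + max(depth(left), depth(right))


def members(tree: Tree) -> Set[int]:
    """Distinct base classifiers the tree uses, i.e. the selected ensemble."""
    if isinstance(tree, int):
        return {tree}
    _, left, right = tree
    return members(left) | members(right)


def predict(tree: Tree, outputs: Sequence[float]) -> int:
    # Assumed convention: non-negative aggregate output means minority class (1).
    return 1 if evaluate(tree, outputs) >= 0 else 0


# Made-up outputs of six front members on one instance, and one composite tree.
outputs = [0.4, -1.2, 0.9, -0.3, 2.0, -0.8]
composite = ("add", ("max", 0, 3), ("min", 3, 5))

print(sorted(members(composite)), depth(composite), predict(composite, outputs))
```

In this toy tree only three of the six front members are used, and the same member can appear in more than one leaf, so the ensemble size is an outcome of the tree's structure rather than a preset parameter.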

Original language	English
Article number	6677603
Pages (from-to)	893-908
Number of pages	16
Journal	IEEE Transactions on Evolutionary Computation
Volume	18
Issue number	6
DOIs	https://doi.org/10.1109/TEVC.2013.2293393
Publication status	Published - 1 Dec 2014

Keywords

Classification
ensemble machine learning
genetic programming
unbalanced data

ASJC Scopus subject areas

Software
Computational Theory and Mathematics
Theoretical Computer Science

Access to Document

10.1109/TEVC.2013.2293393

Cite this

@article{e07e751142ed4bc5a2d774f2db9a50c8,
  title = "Reusing genetic programming for ensemble selection in classification of unbalanced data",
  abstract = "Classification algorithms can suffer from performance degradation when the class distribution is unbalanced. This paper develops a two-step approach to evolving ensembles using genetic programming (GP) for unbalanced data. The first step uses multiobjective (MO) GP to evolve a Pareto-approximated front of GP classifiers to form the ensemble by trading off the minority and the majority class against each other during learning. The MO component alleviates the reliance on sampling to artificially rebalance the data. The second step, which is the focus of this paper, proposes a novel ensemble selection approach using GP to automatically find/choose the best individuals for the ensemble. This new GP approach combines multiple Pareto-approximated front members into a single composite genetic program solution to represent the (optimized) ensemble. This ensemble representation has two main advantages/novelties over traditional genetic algorithm (GA) approaches. First, by limiting the depth of the composite solution trees, we use selection pressure during evolution to find small, highly cooperative groups of individuals for the ensemble. This means that ensemble sizes are not fixed a priori (as in GA), but vary depending on the strength of the base learners. Second, we compare different function set operators in the composite solution trees to explore new ways to aggregate the member outputs and thus control how the ensemble computes its output. We show that the proposed GP approach evolves smaller, more diverse ensembles compared to an established ensemble selection algorithm, while still performing as well as, or better than, the established approach. The evolved GP ensembles also perform well compared to other bagging and boosting approaches, particularly on tasks with high levels of class imbalance.",
  keywords = "Classification, ensemble machine learning, genetic programming, unbalanced data",
  author = "Urvesh Bhowan and Mark Johnston and Mengjie Zhang and Xin Yao",
  year = "2014",
  month = dec,
  day = "1",
  doi = "10.1109/TEVC.2013.2293393",
  language = "English",
  volume = "18",
  pages = "893--908",
  journal = "IEEE Transactions on Evolutionary Computation",
  issn = "1089-778X",
  publisher = "Institute of Electrical and Electronics Engineers (IEEE)",
  number = "6",
}

TY - JOUR
T1 - Reusing genetic programming for ensemble selection in classification of unbalanced data
AU - Bhowan, Urvesh
AU - Johnston, Mark
AU - Zhang, Mengjie
AU - Yao, Xin
PY - 2014/12/1
Y1 - 2014/12/1
AB - Classification algorithms can suffer from performance degradation when the class distribution is unbalanced. This paper develops a two-step approach to evolving ensembles using genetic programming (GP) for unbalanced data. The first step uses multiobjective (MO) GP to evolve a Pareto-approximated front of GP classifiers to form the ensemble by trading off the minority and the majority class against each other during learning. The MO component alleviates the reliance on sampling to artificially rebalance the data. The second step, which is the focus of this paper, proposes a novel ensemble selection approach using GP to automatically find/choose the best individuals for the ensemble. This new GP approach combines multiple Pareto-approximated front members into a single composite genetic program solution to represent the (optimized) ensemble. This ensemble representation has two main advantages/novelties over traditional genetic algorithm (GA) approaches. First, by limiting the depth of the composite solution trees, we use selection pressure during evolution to find small, highly cooperative groups of individuals for the ensemble. This means that ensemble sizes are not fixed a priori (as in GA), but vary depending on the strength of the base learners. Second, we compare different function set operators in the composite solution trees to explore new ways to aggregate the member outputs and thus control how the ensemble computes its output. We show that the proposed GP approach evolves smaller, more diverse ensembles compared to an established ensemble selection algorithm, while still performing as well as, or better than, the established approach. The evolved GP ensembles also perform well compared to other bagging and boosting approaches, particularly on tasks with high levels of class imbalance.
KW - Classification
KW - ensemble machine learning
KW - genetic programming
KW - unbalanced data
UR - http://www.scopus.com/inward/record.url?scp=84914103027&partnerID=8YFLogxK
U2 - 10.1109/TEVC.2013.2293393
DO - 10.1109/TEVC.2013.2293393
M3 - Article
AN - SCOPUS:84914103027
SN - 1089-778X
VL - 18
SP - 893
EP - 908
JO - IEEE Transactions on Evolutionary Computation
JF - IEEE Transactions on Evolutionary Computation
IS - 6
M1 - 6677603
ER -
