Ensembles and locality: insight on improving software effort estimation

Leandro Minku; Xin Yao

doi:10.1016/j.infsof.2012.09.012

Computer Science

Research output: Contribution to journal › Article › peer-review

82 Citations (Scopus)

212 Downloads (Pure)

Abstract

Context
Ensembles of learning machines and locality are considered two important topics for the next research frontier on Software Effort Estimation (SEE).

Objectives
We aim at (1) evaluating whether existing automated ensembles of learning machines generally improve SEEs given by single learning machines and which of them would be more useful; (2) analysing the adequacy of different locality approaches; and getting insight on (3) how to improve SEE and (4) how to evaluate/choose machine learning (ML) models for SEE.

Method
A principled experimental framework is used for the analysis and to provide insights that are not based simply on intuition or speculation. A comprehensive experimental study of several automated ensembles, single learning machines and locality approaches, which present features potentially beneficial for SEE, is performed. Additionally, an analysis of feature selection and regression trees (RTs), and an investigation of two tailored forms of combining ensembles and locality are performed to provide further insight on improving SEE.

Results
Bagging ensembles of RTs are shown to perform well, being highly ranked in terms of performance across different data sets, being frequently among the best approaches for each data set and rarely performing considerably worse than the best approach for any data set. They are recommended over other learning machines should an organisation have no resources to perform experiments to choose a model. Even though RTs have been shown to be more reliable locality approaches, other approaches such as k-Means and k-Nearest Neighbours can also perform well, in particular for more heterogeneous data sets.

Conclusion
Combining the power of automated ensembles and locality can lead to competitive results in SEE. By analysing such approaches, we provide several insights that can be used by future research in the area.

Original language	English
Pages (from-to)	1512-1528
Journal	Information and Software Technology
Volume	55
Issue number	8
Early online date	12 Oct 2012
DOIs	https://doi.org/10.1016/j.infsof.2012.09.012
Publication status	Published - 1 Aug 2013

Keywords

Software effort estimation
Ensembles of learning machines
Locality
Empirical validation

Access to Document

10.1016/j.infsof.2012.09.012

Minku_Ensembles_locality_Information_Software_Technology_2013
Eligibility for repository : checked 30/06/2014
Final published version, 382 KB
Licence: Creative Commons: Attribution-NonCommercial-NoDerivs (CC BY-NC-ND)

SEBASE: Software Engineered By Automated SEarch
Yao, X.
Engineering & Physical Science Research Council
29/06/06 → 28/12/11
Project: Research Councils

Cite this

@article{8ea9108dd11c464582ac1899fbfb4c13,

title = "Ensembles and locality: insight on improving software effort estimation",

abstract = "Context: Ensembles of learning machines and locality are considered two important topics for the next research frontier on Software Effort Estimation (SEE). Objectives: We aim at (1) evaluating whether existing automated ensembles of learning machines generally improve SEEs given by single learning machines and which of them would be more useful; (2) analysing the adequacy of different locality approaches; and getting insight on (3) how to improve SEE and (4) how to evaluate/choose machine learning (ML) models for SEE. Method: A principled experimental framework is used for the analysis and to provide insights that are not based simply on intuition or speculation. A comprehensive experimental study of several automated ensembles, single learning machines and locality approaches, which present features potentially beneficial for SEE, is performed. Additionally, an analysis of feature selection and regression trees (RTs), and an investigation of two tailored forms of combining ensembles and locality are performed to provide further insight on improving SEE. Results: Bagging ensembles of RTs are shown to perform well, being highly ranked in terms of performance across different data sets, being frequently among the best approaches for each data set and rarely performing considerably worse than the best approach for any data set. They are recommended over other learning machines should an organisation have no resources to perform experiments to choose a model. Even though RTs have been shown to be more reliable locality approaches, other approaches such as k-Means and k-Nearest Neighbours can also perform well, in particular for more heterogeneous data sets. Conclusion: Combining the power of automated ensembles and locality can lead to competitive results in SEE. By analysing such approaches, we provide several insights that can be used by future research in the area.",

keywords = "Software effort estimation, Ensembles of learning machines, Locality, Empirical validation",

author = "Leandro Minku and Xin Yao",

year = "2013",

month = aug,

day = "1",

doi = "10.1016/j.infsof.2012.09.012",

language = "English",

volume = "55",

pages = "1512--1528",

journal = "Information and Software Technology",

issn = "0950-5849",

publisher = "Elsevier",

number = "8",

}

TY - JOUR

T1 - Ensembles and locality: insight on improving software effort estimation

AU - Minku, Leandro

AU - Yao, Xin

PY - 2013/8/1

Y1 - 2013/8/1

N2 - Context: Ensembles of learning machines and locality are considered two important topics for the next research frontier on Software Effort Estimation (SEE). Objectives: We aim at (1) evaluating whether existing automated ensembles of learning machines generally improve SEEs given by single learning machines and which of them would be more useful; (2) analysing the adequacy of different locality approaches; and getting insight on (3) how to improve SEE and (4) how to evaluate/choose machine learning (ML) models for SEE. Method: A principled experimental framework is used for the analysis and to provide insights that are not based simply on intuition or speculation. A comprehensive experimental study of several automated ensembles, single learning machines and locality approaches, which present features potentially beneficial for SEE, is performed. Additionally, an analysis of feature selection and regression trees (RTs), and an investigation of two tailored forms of combining ensembles and locality are performed to provide further insight on improving SEE. Results: Bagging ensembles of RTs are shown to perform well, being highly ranked in terms of performance across different data sets, being frequently among the best approaches for each data set and rarely performing considerably worse than the best approach for any data set. They are recommended over other learning machines should an organisation have no resources to perform experiments to choose a model. Even though RTs have been shown to be more reliable locality approaches, other approaches such as k-Means and k-Nearest Neighbours can also perform well, in particular for more heterogeneous data sets. Conclusion: Combining the power of automated ensembles and locality can lead to competitive results in SEE. By analysing such approaches, we provide several insights that can be used by future research in the area.

AB - Context: Ensembles of learning machines and locality are considered two important topics for the next research frontier on Software Effort Estimation (SEE). Objectives: We aim at (1) evaluating whether existing automated ensembles of learning machines generally improve SEEs given by single learning machines and which of them would be more useful; (2) analysing the adequacy of different locality approaches; and getting insight on (3) how to improve SEE and (4) how to evaluate/choose machine learning (ML) models for SEE. Method: A principled experimental framework is used for the analysis and to provide insights that are not based simply on intuition or speculation. A comprehensive experimental study of several automated ensembles, single learning machines and locality approaches, which present features potentially beneficial for SEE, is performed. Additionally, an analysis of feature selection and regression trees (RTs), and an investigation of two tailored forms of combining ensembles and locality are performed to provide further insight on improving SEE. Results: Bagging ensembles of RTs are shown to perform well, being highly ranked in terms of performance across different data sets, being frequently among the best approaches for each data set and rarely performing considerably worse than the best approach for any data set. They are recommended over other learning machines should an organisation have no resources to perform experiments to choose a model. Even though RTs have been shown to be more reliable locality approaches, other approaches such as k-Means and k-Nearest Neighbours can also perform well, in particular for more heterogeneous data sets. Conclusion: Combining the power of automated ensembles and locality can lead to competitive results in SEE. By analysing such approaches, we provide several insights that can be used by future research in the area.

KW - Software effort estimation

KW - Ensembles of learning machines

KW - Locality

KW - Empirical validation

U2 - 10.1016/j.infsof.2012.09.012

DO - 10.1016/j.infsof.2012.09.012

M3 - Article

SN - 0950-5849

VL - 55

SP - 1512

EP - 1528

JO - Information and Software Technology

JF - Information and Software Technology

IS - 8

ER -
