Ensembles and locality: insight on improving software effort estimation

Research output: Contribution to journal, Article, peer-review

82 Citations (Scopus)
196 Downloads (Pure)


Ensembles of learning machines and locality are considered two important topics for the next research frontier on Software Effort Estimation (SEE).

We aim at (1) evaluating whether existing automated ensembles of learning machines generally improve SEEs given by single learning machines and which of them would be more useful; (2) analysing the adequacy of different locality approaches; and getting insight on (3) how to improve SEE and (4) how to evaluate/choose machine learning (ML) models for SEE.

A principled experimental framework is used for the analysis and to provide insights that are not based simply on intuition or speculation. A comprehensive experimental study of several automated ensembles, single learning machines and locality approaches, which present features potentially beneficial for SEE, is performed. Additionally, an analysis of feature selection and regression trees (RTs), and an investigation of two tailored forms of combining ensembles and locality are performed to provide further insight on improving SEE.

Bagging ensembles of RTs are shown to perform well: they are highly ranked in terms of performance across different data sets, are frequently among the best approaches for each data set, and rarely perform considerably worse than the best approach for any data set. They are recommended over other learning machines should an organisation have no resources to perform experiments to choose a model. Even though RTs have been shown to be more reliable locality approaches, other approaches such as k-Means and k-Nearest Neighbours can also perform well, in particular for more heterogeneous data sets.
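To make the idea of a bagging ensemble of RTs concrete, the following is a minimal sketch in plain Python, not the implementation used in the study. It assumes a simple variance-reducing regression tree whose leaves predict the mean effort, and it averages trees trained on bootstrap samples. All function names, parameter values (`min_leaf`, `max_depth`, `n_trees`) and the toy project data are hypothetical and chosen only for illustration.

```python
import random
from statistics import mean


def sse(values):
    """Sum of squared errors of the values around their mean."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(rows, targets, min_leaf=2, depth=0, max_depth=4):
    """Grow a small regression tree; a leaf is the mean target of its rows."""
    if depth >= max_depth or len(rows) < 2 * min_leaf:
        return mean(targets)
    best = None
    for f in range(len(rows[0])):
        for threshold in sorted({r[f] for r in rows}):
            left = [i for i, r in enumerate(rows) if r[f] <= threshold]
            right = [i for i, r in enumerate(rows) if r[f] > threshold]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = (sse([targets[i] for i in left])
                     + sse([targets[i] for i in right]))
            if best is None or score < best[0]:
                best = (score, f, threshold, left, right)
    if best is None:  # no admissible split: fall back to a leaf
        return mean(targets)
    _, f, threshold, left, right = best
    return (
        f,
        threshold,
        build_tree([rows[i] for i in left], [targets[i] for i in left],
                   min_leaf, depth + 1, max_depth),
        build_tree([rows[i] for i in right], [targets[i] for i in right],
                   min_leaf, depth + 1, max_depth),
    )


def predict_tree(tree, row):
    """Follow the splits until a leaf (a plain number) is reached."""
    while isinstance(tree, tuple):
        f, threshold, left, right = tree
        tree = left if row[f] <= threshold else right
    return tree


def bagging_fit(rows, targets, n_trees=25, seed=0):
    """Train each tree on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(rows)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(build_tree([rows[i] for i in idx],
                                [targets[i] for i in idx]))
    return trees


def bagging_predict(trees, row):
    """The ensemble estimate is the average of the individual tree estimates."""
    return mean(predict_tree(t, row) for t in trees)


# Made-up projects: [size in KLOC, team experience level]; effort in person-months.
projects = [[10, 1], [12, 3], [20, 2], [25, 4], [40, 1], [45, 3], [60, 2], [80, 4]]
effort = [30.0, 28.0, 55.0, 50.0, 120.0, 100.0, 160.0, 200.0]

trees = bagging_fit(projects, effort)
estimate = bagging_predict(trees, [50, 2])
```

Because each tree partitions the projects into groups with similar attribute values and predicts from the group a new project falls into, the same sketch also hints at why RTs can be read as a locality approach. Averaging over bootstrap samples is what makes the ensemble less sensitive to any single training set.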

Combining the power of automated ensembles and locality can lead to competitive results in SEE. By analysing such approaches, we provide several insights that can be used by future research in the area.
Original language: English
Pages (from-to): 1512-1528
Journal: Information and Software Technology
Issue number: 8
Early online date: 12 Oct 2012
Publication status: Published - 1 Aug 2013


  • Software effort estimation
  • Ensembles of learning machines
  • Locality
  • Empirical validation

