Multi-stream online transfer learning for software effort estimation: is it necessary?

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Software Effort Estimation (SEE) may suffer from changes over time in the relationship between the features describing software projects and their required effort, which hinders the predictive performance of machine learning models. To cope with such changes, most machine learning-based SEE approaches rely on receiving a large number of Within-Company (WC) projects for training over time, which is prohibitively expensive. The Dycom approach reduces the number of required WC training projects by transferring knowledge from Cross-Company (CC) projects. However, it assumes that CC projects have no chronology and are entirely available before WC projects start being estimated. Given the importance of taking chronology into account to cope with changes, it may be beneficial to also take the chronology of CC projects into account. This paper thus investigates whether and under what circumstances treating CC projects as multiple data streams to be learned over time may be useful for improving SEE. To this end, an extension of Dycom called OATES is proposed to enable multi-stream online learning, so that both incoming WC and CC data streams can be learned over time. OATES is then compared against Dycom and five other approaches in a case study using four different scenarios derived from the ISBSG Repository. The results show that OATES improved predictive performance over the state-of-the-art when the number of CC projects available beforehand was small. Learning CC projects over time as multiple data streams is thus recommended for improving SEE in such a scenario. When the number of CC projects available beforehand was large, OATES obtained similar predictive performance to the state-of-the-art. Therefore, CC data streams are unnecessary in this scenario, but are not detrimental either.
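
The multi-stream setting described in the abstract can be illustrated with a minimal sketch. It does not reproduce Dycom or OATES, whose internals are not given in this record. It only shows the general idea of merging WC and CC project streams chronologically, keeping one online model per stream, and estimating each WC project before learning from it. The names `Project`, `RunningProductivity` and `run_streams`, the toy productivity model, and the unweighted averaging of the per-stream estimates are all assumptions made for illustration. The sketch also assumes that a project's true effort is known as soon as the project arrives, which is a simplification.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Project:
    timestamp: int   # arrival order of the project
    stream: str      # hypothetical stream label, e.g. "WC" or "CC1"
    size: float      # a single size feature (assumption: one feature only)
    effort: float    # true effort (assumed known on arrival)


class RunningProductivity:
    """Toy online model: effort = size * mean(effort / size) seen so far."""

    def __init__(self):
        self.n = 0
        self.mean_rate = 0.0

    def update(self, size, effort):
        # Incremental mean of the observed effort-per-size rate.
        self.n += 1
        self.mean_rate += (effort / size - self.mean_rate) / self.n

    def predict(self, size):
        # No estimate until at least one project has been seen.
        return size * self.mean_rate if self.n else None


def run_streams(streams, target="WC"):
    """Process all streams in timestamp order.

    `streams` maps a stream name to a list of Project objects that is
    already sorted by timestamp. Returns (timestamp, estimate, actual)
    for every project of the target stream.
    """
    models = {name: RunningProductivity() for name in streams}
    merged = heapq.merge(*streams.values(), key=lambda p: p.timestamp)
    results = []
    for p in merged:
        if p.stream == target:
            # Estimate first, then learn from the project.
            estimates = [m.predict(p.size) for m in models.values()]
            estimates = [e for e in estimates if e is not None]
            # Illustrative choice: plain average of available estimates.
            estimate = sum(estimates) / len(estimates) if estimates else None
            results.append((p.timestamp, estimate, p.effort))
        models[p.stream].update(p.size, p.effort)
    return results
```

In this sketch a CC project that arrives between two WC projects changes the estimate for the second WC project, which is the effect that treating CC data as streams is meant to capture.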

Bibliographic note

Funding Information: This work was supported by EPSRC Grant No. EP/R006660/2. Publisher Copyright: © 2021 ACM.

Details

Original language: English
Title of host publication: PROMISE 2021
Subtitle of host publication: Proceedings of the 17th ACM International Conference on Predictive Models and Data Analytics in Software Engineering
Editors: Shane McIntosh, Xin Xia, Sousuke Amasaki
Publication status: Published - 19 Aug 2021
Event: The 17th International Conference on Predictive Models and Data Analytics in Software Engineering - Athens, Greece
Duration: 19 Aug 2021 → 20 Aug 2021

Publication series

Name: PROMISE: Predictor Models in Software Engineering

Conference

Conference: The 17th International Conference on Predictive Models and Data Analytics in Software Engineering
Abbreviated title: PROMISE'21
Country/Territory: Greece
City: Athens
Period: 19/08/21 → 20/08/21

Keywords

  • Software effort estimation, concept drift, cross-company learning, data streams, ensembles, transfer learning