A novel online supervised hyperparameter tuning procedure applied to cross-company software effort estimation

Research output: Contribution to journal › Article

Abstract

Software effort estimation is an online supervised learning problem, where new training projects may become available over time. In this scenario, the Cross-Company (CC) approach Dycom can drastically reduce the number of Within-Company (WC) projects needed for training, saving their collection cost. However, Dycom requires CC projects to be split into subsets. Both the number and composition of such subsets can affect Dycom's predictive performance. Even though clustering methods could be used to automatically create CC subsets, there are no procedures for automatically tuning the number of clusters over time in online supervised scenarios. This paper proposes the first procedure for that. An investigation of Dycom using six clustering methods and three automated tuning procedures is performed, to check whether clustering with automated tuning can create well performing CC splits. A case study with the ISBSG Repository shows that the proposed tuning procedure in combination with a simple threshold-based clustering method is the most successful in enabling Dycom to drastically reduce (by a factor of 10) the number of required WC training projects, while maintaining (or even improving) predictive performance in comparison with a corresponding WC model. A detailed analysis is provided to understand the conditions under which this approach does or does not work well. Overall, the proposed online supervised tuning procedure was generally successful in enabling a very simple threshold-based clustering approach to obtain the most competitive Dycom results. This demonstrates the value of automatically tuning hyperparameters over time in a supervised way.

Details

Original language: English
Pages (from-to): 3153-3204
Number of pages: 55
Journal: Empirical Software Engineering
Volume: 24
Issue number: 5
Early online date: 26 Feb 2019
Publication status: Published - Oct 2019

Keywords

  • Software effort estimation, cross-company learning, transfer learning, concept drift, online learning, hyperparameter tuning