Abstract
Just-in-Time Software Defect Prediction (JIT-SDP) is an SDP approach that makes defect predictions at the software change level. Most existing JIT-SDP work assumes that the characteristics of the problem remain the same over time. However, JIT-SDP may suffer from class imbalance evolution. Specifically, the imbalance status of the problem (i.e., how underrepresented the defect-inducing changes are) may intensify or diminish over time. If this occurs, it could render existing JIT-SDP approaches unsuitable, including those that rebuild classifiers over time using only recent data. This work therefore provides the first investigation of whether class imbalance evolution poses a threat to JIT-SDP. The investigation is performed in a realistic scenario that takes into account verification latency: the often overlooked fact that labeled training examples arrive with a delay. Based on 10 GitHub projects, we show that JIT-SDP suffers from class imbalance evolution, which significantly hinders the predictive performance of existing JIT-SDP approaches. Compared to state-of-the-art class imbalance evolution learning approaches, the predictive performance of JIT-SDP approaches was up to 97.2% lower in terms of g-mean. Hence, it is essential to tackle class imbalance evolution in JIT-SDP. We then propose a novel class imbalance evolution approach for the specific context of JIT-SDP. While maintaining top-ranked g-means, this approach produced up to 63.59% more balanced recalls on the defect-inducing and clean classes than state-of-the-art class imbalance evolution approaches. We therefore recommend it for avoiding overemphasis of one class over the other in JIT-SDP.
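To make the abstract's terms concrete, the following is a minimal Python sketch, not the authors' method. It illustrates three ideas: verification latency (a change's label becomes available only after a delay), a time-decayed estimate of the defect-inducing class's share (one common way to track an evolving imbalance status), and g-mean together with per-class recalls. The waiting rule in `available_label`, its `waiting_days` default, and the decay factor `theta` are hypothetical assumptions chosen for illustration. All function and class names are likewise hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


def available_label(commit_day: int, fix_day: Optional[int], today: int,
                    waiting_days: int = 90) -> Optional[bool]:
    """Label usable for training on `today`, or None if still unknown.

    Hypothetical rule (an assumption): a change is treated as defect-inducing
    once a bug fix implicating it has been observed, and as clean once
    `waiting_days` have passed with no such fix observed.
    """
    if fix_day is not None and fix_day <= today:
        return True
    if today - commit_day >= waiting_days:
        return False
    return None  # verification latency: the label has not arrived yet


@dataclass
class DecayedClassTracker:
    """Time-decayed share of each class among recently labeled changes."""
    theta: float = 0.99  # hypothetical decay factor; closer to 1 = longer memory
    size_defective: float = 0.0
    size_clean: float = 0.0

    def update(self, is_defect_inducing: bool) -> None:
        # Decay both estimates, then credit (1 - theta) to the observed class,
        # so older examples gradually lose influence on the imbalance status.
        self.size_defective = (self.theta * self.size_defective
                               + (1 - self.theta) * float(is_defect_inducing))
        self.size_clean = (self.theta * self.size_clean
                           + (1 - self.theta) * float(not is_defect_inducing))

    def defect_share(self) -> float:
        # Fraction of recent labeled changes estimated to be defect-inducing;
        # returns 0.5 before any label has been observed.
        total = self.size_defective + self.size_clean
        return self.size_defective / total if total > 0 else 0.5


def recalls(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Recall on the defect-inducing class (1) and on the clean class (0)."""
    pos = [p for t, p in zip(y_true, y_pred) if t == 1]
    neg = [p for t, p in zip(y_true, y_pred) if t == 0]
    r1 = sum(1 for p in pos if p == 1) / len(pos) if pos else 0.0
    r0 = sum(1 for p in neg if p == 0) / len(neg) if neg else 0.0
    return r1, r0


def g_mean(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Geometric mean of the two per-class recalls."""
    r1, r0 = recalls(y_true, y_pred)
    return math.sqrt(r1 * r0)
```

For example, with `y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]` and `y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]`, the recalls are 0.75 and about 0.667, and the g-mean is about 0.707. Because g-mean multiplies the recalls, a model that ignores either class scores near zero. The gap between the two recalls is a simple way to see whether one class is being overemphasized.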
Original language | English |
---|---|
Title of host publication | Proceedings of the 41st ACM/IEEE International Conference on Software Engineering (ICSE 2019) |
Publisher | IEEE Computer Society Press |
Pages | 666-676 |
Number of pages | 11 |
DOIs | |
Publication status | Published - 25 May 2019 |
Event | 41st ACM/IEEE International Conference on Software Engineering (ICSE 2019), Montreal, Canada. Duration: 25 May 2019 → 31 May 2019 |
Conference
Conference | 41st ACM/IEEE International Conference on Software Engineering (ICSE 2019) |
---|---|
Country/Territory | Canada |
City | Montreal |
Period | 25/05/19 → 31/05/19 |
Keywords
- Software defect prediction
- class imbalance
- concept drift
- ensembles
- online learning
- verification latency
ASJC Scopus subject areas
- Software