Which models of the past are relevant to the present? A software effort estimation approach to exploiting useful past models

L.L. MINKU; X. YAO

doi:10.1007/s10515-016-0209-7

Computer Science

Research output: Contribution to journal › Article › peer-review

16 Citations (Scopus)

Abstract

Software Effort Estimation (SEE) models can be used for decision-support by software managers to determine the effort required to develop a software project. They are created based on data describing projects completed in the past. Such data could include past projects from within the company that we are interested in (WC projects) and/or from other companies (cross-company, i.e., CC projects). In particular, the use of CC data has been investigated in an attempt to overcome limitations caused by the typically small size of WC datasets. However, software companies operate in non-stationary environments, where changes may affect the typical effort required to develop software projects. Our previous work showed that both WC and CC models of the past can become more or less useful over time, i.e., they can sometimes be helpful and sometimes misleading. So, how can we know if and when a model created based on past data represents well the current projects being estimated? We propose an approach called Dynamic Cross-company Learning (DCL) to dynamically identify which WC or CC past models are most useful for making predictions to a given company at the present. DCL automatically emphasizes the predictions given by these models in order to improve predictive performance. Our experiments comparing DCL against existing WC and CC approaches show that DCL is successful in improving SEE by emphasizing the most useful past models. A thorough analysis of DCL’s behaviour is provided, strengthening its external validity.
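The idea of emphasizing the past models that are currently most useful can be illustrated with a small Python sketch. It is not the authors' DCL implementation. The class name `WeightedPastModels`, the decay factor `beta`, the rule of shrinking the weight of every model that was not the most accurate on the latest completed project, and the two toy models are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence

# A "model" here is any function mapping project features to an effort estimate.
Model = Callable[[Sequence[float]], float]


class WeightedPastModels:
    """Illustrative weighted ensemble of past WC/CC models (hypothetical, not DCL itself)."""

    def __init__(self, models: List[Model], beta: float = 0.5) -> None:
        if not models:
            raise ValueError("at least one model is required")
        if not 0.0 < beta < 1.0:
            raise ValueError("beta must lie strictly between 0 and 1")
        self.models = models
        self.beta = beta  # assumed decay factor for less accurate models
        self.weights = [1.0 / len(models)] * len(models)

    def predict(self, features: Sequence[float]) -> float:
        # Weighted average of the individual model predictions.
        return sum(w * m(features) for w, m in zip(self.weights, self.models))

    def update(self, features: Sequence[float], actual_effort: float) -> None:
        # Once a project's true effort is known, find the lowest absolute
        # error and shrink the weight of every model that did worse than that.
        errors = [abs(m(features) - actual_effort) for m in self.models]
        best = min(errors)
        self.weights = [
            w if e == best else w * self.beta
            for w, e in zip(self.weights, errors)
        ]
        total = sum(self.weights)
        self.weights = [w / total for w in self.weights]


# Toy models (assumptions): effort as a fixed productivity rate times project size.
wc_model: Model = lambda x: 100.0 * x[0]
cc_model: Model = lambda x: 150.0 * x[0]

ensemble = WeightedPastModels([wc_model, cc_model], beta=0.5)
print("estimate before update:", ensemble.predict([1.0]))
ensemble.update([1.0], actual_effort=100.0)  # the WC model was exact here
print("weights after update:", ensemble.weights)
print("estimate after update:", ensemble.predict([1.0]))
```

In this sketch a model that keeps being the most accurate gradually dominates the combined estimate, while a model that becomes misleading loses influence without being discarded, so it can regain weight if it becomes useful again.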

Original language	English
Pages (from-to)	499-542
Number of pages	44
Journal	Automated Software Engineering
Volume	24
Issue number	3
DOIs	https://doi.org/10.1007/s10515-016-0209-7
Publication status	Published - 28 Dec 2016

Keywords

Model-based software effort estimation
Machine learning
Online learning
Non-stationary environments

Access to Document

10.1007/s10515-016-0209-7, Licence: Creative Commons: Attribution (CC BY)

https://link.springer.com/article/10.1007%2Fs10515-016-0209-7, Licence: Creative Commons: Attribution (CC BY)

Cite this

@article{aa9bfd23b33d450ea4d95cd858d1ea6e,
  title = "Which models of the past are relevant to the present? A software effort estimation approach to exploiting useful past models",
  abstract = "Software Effort Estimation (SEE) models can be used for decision-support by software managers to determine the effort required to develop a software project. They are created based on data describing projects completed in the past. Such data could include past projects from within the company that we are interested in (WC projects) and/or from other companies (cross-company, i.e., CC projects). In particular, the use of CC data has been investigated in an attempt to overcome limitations caused by the typically small size of WC datasets. However, software companies operate in non-stationary environments, where changes may affect the typical effort required to develop software projects. Our previous work showed that both WC and CC models of the past can become more or less useful over time, i.e., they can sometimes be helpful and sometimes misleading. So, how can we know if and when a model created based on past data represents well the current projects being estimated? We propose an approach called Dynamic Cross-company Learning (DCL) to dynamically identify which WC or CC past models are most useful for making predictions to a given company at the present. DCL automatically emphasizes the predictions given by these models in order to improve predictive performance. Our experiments comparing DCL against existing WC and CC approaches show that DCL is successful in improving SEE by emphasizing the most useful past models. A thorough analysis of DCL{\textquoteright}s behaviour is provided, strengthening its external validity.",
  keywords = "Model-based software effort estimation, Machine learning, Online learning, Non-stationary environments",
  author = "L.L. MINKU and X. YAO",
  year = "2016",
  month = dec,
  day = "28",
  doi = "10.1007/s10515-016-0209-7",
  language = "English",
  volume = "24",
  pages = "499--542",
  journal = "Automated Software Engineering",
  issn = "0928-8910",
  publisher = "Springer",
  number = "3",
}

TY - JOUR
T1 - Which models of the past are relevant to the present? A software effort estimation approach to exploiting useful past models
AU - MINKU, L.L.
AU - YAO, X.
PY - 2016/12/28
Y1 - 2016/12/28
N2 - Software Effort Estimation (SEE) models can be used for decision-support by software managers to determine the effort required to develop a software project. They are created based on data describing projects completed in the past. Such data could include past projects from within the company that we are interested in (WC projects) and/or from other companies (cross-company, i.e., CC projects). In particular, the use of CC data has been investigated in an attempt to overcome limitations caused by the typically small size of WC datasets. However, software companies operate in non-stationary environments, where changes may affect the typical effort required to develop software projects. Our previous work showed that both WC and CC models of the past can become more or less useful over time, i.e., they can sometimes be helpful and sometimes misleading. So, how can we know if and when a model created based on past data represents well the current projects being estimated? We propose an approach called Dynamic Cross-company Learning (DCL) to dynamically identify which WC or CC past models are most useful for making predictions to a given company at the present. DCL automatically emphasizes the predictions given by these models in order to improve predictive performance. Our experiments comparing DCL against existing WC and CC approaches show that DCL is successful in improving SEE by emphasizing the most useful past models. A thorough analysis of DCL’s behaviour is provided, strengthening its external validity.
AB - Software Effort Estimation (SEE) models can be used for decision-support by software managers to determine the effort required to develop a software project. They are created based on data describing projects completed in the past. Such data could include past projects from within the company that we are interested in (WC projects) and/or from other companies (cross-company, i.e., CC projects). In particular, the use of CC data has been investigated in an attempt to overcome limitations caused by the typically small size of WC datasets. However, software companies operate in non-stationary environments, where changes may affect the typical effort required to develop software projects. Our previous work showed that both WC and CC models of the past can become more or less useful over time, i.e., they can sometimes be helpful and sometimes misleading. So, how can we know if and when a model created based on past data represents well the current projects being estimated? We propose an approach called Dynamic Cross-company Learning (DCL) to dynamically identify which WC or CC past models are most useful for making predictions to a given company at the present. DCL automatically emphasizes the predictions given by these models in order to improve predictive performance. Our experiments comparing DCL against existing WC and CC approaches show that DCL is successful in improving SEE by emphasizing the most useful past models. A thorough analysis of DCL’s behaviour is provided, strengthening its external validity.
KW - Model-based software effort estimation
KW - Machine learning
KW - Online learning
KW - Non-stationary environments
U2 - 10.1007/s10515-016-0209-7
DO - 10.1007/s10515-016-0209-7
M3 - Article
SN - 0928-8910
VL - 24
SP - 499
EP - 542
JO - Automated Software Engineering
JF - Automated Software Engineering
IS - 3
ER -
