The potential benefit of relevance vector machine to software effort estimation

Liyan Song; Leandro L. Minku; Xin Yao

doi:10.1145/2639490.2639510

Computer Science

Research output: Contribution to conference (unpublished) › Other › peer-review

Abstract

Three key challenges faced by the task of software effort estimation (SEE) when using predictive models are: (1) in order to support decision-making, software managers should have access not only to the effort estimate given by the predictive model, but also to how confident the model is in estimating a given project and how likely it is that other effort values are the real effort required to develop this project; (2) SEE data is likely to contain noise, due to the participation of humans in the data collection, and this noise can hinder predictions if not catered for; and (3) data collection is an expensive task, and guidelines on when new data need to be collected would help reduce the cost associated with data collection. However, even though SEE has been studied for decades and many predictors have been proposed, few methods focus on these issues. In this work, we show that the relevance vector machine (RVM) is a promising predictive method for addressing these three challenges. More specifically, it explicitly handles noise, it provides probabilistic predictions of effort, and it can be used to identify when the required efforts of new projects should be collected so that they can be used as training examples. With that in mind, this work provides the first step in exploiting RVM's potential for SEE by validating both its point predictions and its prediction intervals. It then explains in detail future directions for how RVMs can be further exploited to address the above-mentioned challenges. Our systematic experiments show that RVM is very competitive with state-of-the-art SEE approaches, ranking first or second on 7 of 11 data sets in terms of mean absolute error. We also demonstrate how RVM can be used to judge the amount of noise present in the data. In summary, we show that RVM is a very promising predictor for SEE and should be further exploited.
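The two quantities the abstract evaluates, point predictions (scored by mean absolute error) and prediction intervals, can be illustrated with a minimal sketch. It assumes the model returns a Gaussian predictive distribution for each project, given as a mean and a standard deviation. The function names, the coverage check, and all numbers below are hypothetical illustrations, not the authors' code or data.

```python
from statistics import NormalDist


def prediction_interval(mean, std, confidence=0.95):
    """Central interval of a Gaussian predictive distribution N(mean, std^2)."""
    # Two-sided quantile of the standard normal, e.g. about 1.96 for 95%.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return mean - z * std, mean + z * std


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted efforts."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def coverage(actual, means, stds, confidence=0.95):
    """Fraction of actual efforts that fall inside their prediction intervals."""
    hits = 0
    for a, m, s in zip(actual, means, stds):
        low, high = prediction_interval(m, s, confidence)
        hits += low <= a <= high
    return hits / len(actual)


# Made-up efforts (e.g. person-hours) for three projects; not from the paper.
actual = [100.0, 250.0, 400.0]
means = [110.0, 240.0, 520.0]   # hypothetical predictive means
stds = [20.0, 30.0, 40.0]       # hypothetical predictive standard deviations

mae = mean_absolute_error(actual, means)
cov = coverage(actual, means, stds)
```

A wide interval signals low confidence in the estimate for that project. If the observed coverage falls well below the nominal confidence level, the predictive variances are likely too small for the data at hand.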

Original language	English
Pages	52-61
DOIs	https://doi.org/10.1145/2639490.2639510
Publication status	Published - 17 Sept 2014
Event	the 10th International Conference - Turin, Italy. Duration: 17 Sept 2014 → 17 Sept 2014

Conference

Conference	the 10th International Conference
Country/Territory	Italy
Period	17/09/14 → 17/09/14

Keywords

Software Effort Estimation
Machine Learning
Prediction Interval
Relevance Vector Machine
Data Collection Guidance
Effort Noise

Access to Document

10.1145/2639490.2639510

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author. Copyright is held by the owner/author(s). Eligibility for repository: checked 24/02/2015
Final published version, 684 KB. Licence: Other (please specify with Rights Statement)

http://dl.acm.org/citation.cfm?doid=2639490.2639510

Cite this

@conference{6554924f037e475bbd58c337973623c2,

title = "The potential benefit of relevance vector machine to software effort estimation",

keywords = "Software Effort Estimation, Machine Learning, Prediction Interval, Relevance Vector Machine, Data Collection Guidance, Effort Noise",

author = "Liyan Song and Minku, {Leandro L.} and Xin Yao",

year = "2014",

month = sep,

day = "17",

doi = "10.1145/2639490.2639510",

language = "English",

pages = "52--61",

note = "the 10th International Conference ; Conference date: 17-09-2014 Through 17-09-2014",

}

TY - CONF

T1 - The potential benefit of relevance vector machine to software effort estimation

AU - Song, Liyan

AU - Minku, Leandro L.

AU - Yao, Xin

PY - 2014/9/17

Y1 - 2014/9/17

KW - Software Effort Estimation

KW - Machine Learning

KW - Prediction Interval

KW - Relevance Vector Machine

KW - Data Collection Guidance

KW - Effort Noise

U2 - 10.1145/2639490.2639510

DO - 10.1145/2639490.2639510

M3 - Other

SP - 52

EP - 61

T2 - the 10th International Conference

Y2 - 17 September 2014 through 17 September 2014

ER -
