Disadvantages of using the area under the receiver operating characteristic curve to assess imaging tests: a discussion and proposal for an alternative approach

Steve Halligan; Douglas G Altman; Susan Mallett

doi:10.1007/s00330-014-3487-0

Applied Health Research

Research output: Contribution to journal › Article › peer-review

71 Citations (Scopus)

Abstract

OBJECTIVES: The objectives are to describe the disadvantages of the area under the receiver operating characteristic curve (ROC AUC) to measure diagnostic test performance and to propose an alternative based on net benefit.

METHODS: We use a narrative review supplemented by data from a study of computer-assisted detection for CT colonography.

RESULTS: We identified problems with ROC AUC. Confidence scoring by readers was highly non-normal, and score distribution was bimodal. Consequently, ROC curves were highly extrapolated with AUC mostly dependent on areas without patient data. AUC depended on the method used for curve fitting. ROC AUC does not account for prevalence or different misclassification costs arising from false-negative and false-positive diagnoses. Change in ROC AUC has little direct clinical meaning for clinicians. An alternative analysis based on net benefit is proposed, based on the change in sensitivity and specificity at clinically relevant thresholds. Net benefit incorporates estimates of prevalence and misclassification costs, and it is clinically interpretable since it reflects changes in correct and incorrect diagnoses when a new diagnostic test is introduced.

CONCLUSIONS: ROC AUC is most useful in the early stages of test assessment whereas methods based on net benefit are more useful to assess radiological tests where the clinical context is known. Net benefit is more useful for assessing clinical impact.

KEY POINTS: • The area under the receiver operating characteristic curve (ROC AUC) measures diagnostic accuracy. • Confidence scores used to build ROC curves may be difficult to assign. • False-positive and false-negative diagnoses have different misclassification costs. • Excessive ROC curve extrapolation is undesirable. • Net benefit methods may provide more meaningful and clinically interpretable results than ROC AUC.
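The abstract says net benefit combines the change in sensitivity and specificity with prevalence and misclassification costs, but it does not give a formula. The sketch below uses one common decision-analytic formulation, which is an assumption here and not necessarily the authors' exact definition. The function name `net_benefit` and the parameter `fp_per_tp` (the number of extra false positives judged an acceptable trade for one extra true positive) are hypothetical names chosen for illustration.

```python
def net_benefit(delta_sensitivity, delta_specificity, prevalence, fp_per_tp):
    """Weighted net gain in correct diagnoses, per patient with disease.

    Assumed formulation (not taken from the abstract):
        delta_sensitivity
        + delta_specificity * ((1 - prevalence) / prevalence) / fp_per_tp

    delta_sensitivity, delta_specificity: change (new test minus old test)
        at a clinically relevant threshold, as proportions.
    prevalence: proportion of patients with the disease, strictly in (0, 1).
    fp_per_tp: hypothetical cost weight; how many additional false positives
        are considered an acceptable price for one additional true positive.
    """
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must be strictly between 0 and 1")
    if fp_per_tp <= 0:
        raise ValueError("fp_per_tp must be positive")

    # Number of patients without disease for each patient with disease.
    non_diseased_per_diseased = (1.0 - prevalence) / prevalence

    # A specificity change affects non-diseased patients, so it is scaled by
    # their relative number and down-weighted by the misclassification cost.
    weighted_specificity_change = (
        delta_specificity * non_diseased_per_diseased / fp_per_tp
    )
    return delta_sensitivity + weighted_specificity_change


if __name__ == "__main__":
    # Illustrative numbers only; they are not results from the study.
    result = net_benefit(
        delta_sensitivity=0.10,
        delta_specificity=-0.02,
        prevalence=0.10,
        fp_per_tp=3.0,
    )
    print(f"net benefit: {result:.3f}")
```

With these made-up inputs, a 10-point gain in sensitivity is partly offset by a 2-point loss in specificity at 10% prevalence, leaving a net benefit of roughly 0.04. Under this formulation, that is about 4 additional weighted correct diagnoses per 100 patients with disease. Unlike a change in ROC AUC, the result shifts when prevalence or the cost weight changes.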

Original language	English
Pages (from-to)	932-939
Number of pages	8
Journal	European Radiology
Volume	25
Issue number	4
DOIs	https://doi.org/10.1007/s00330-014-3487-0
Publication status	Published - Apr 2015

Cite this

@article{d0411304bb5344b688ca7310748b9f9a,

title = "Disadvantages of using the area under the receiver operating characteristic curve to assess imaging tests: a discussion and proposal for an alternative approach",

abstract = "OBJECTIVES: The objectives are to describe the disadvantages of the area under the receiver operating characteristic curve (ROC AUC) to measure diagnostic test performance and to propose an alternative based on net benefit. METHODS: We use a narrative review supplemented by data from a study of computer-assisted detection for CT colonography. RESULTS: We identified problems with ROC AUC. Confidence scoring by readers was highly non-normal, and score distribution was bimodal. Consequently, ROC curves were highly extrapolated with AUC mostly dependent on areas without patient data. AUC depended on the method used for curve fitting. ROC AUC does not account for prevalence or different misclassification costs arising from false-negative and false-positive diagnoses. Change in ROC AUC has little direct clinical meaning for clinicians. An alternative analysis based on net benefit is proposed, based on the change in sensitivity and specificity at clinically relevant thresholds. Net benefit incorporates estimates of prevalence and misclassification costs, and it is clinically interpretable since it reflects changes in correct and incorrect diagnoses when a new diagnostic test is introduced. CONCLUSIONS: ROC AUC is most useful in the early stages of test assessment whereas methods based on net benefit are more useful to assess radiological tests where the clinical context is known. Net benefit is more useful for assessing clinical impact. KEY POINTS: • The area under the receiver operating characteristic curve (ROC AUC) measures diagnostic accuracy. • Confidence scores used to build ROC curves may be difficult to assign. • False-positive and false-negative diagnoses have different misclassification costs. • Excessive ROC curve extrapolation is undesirable. • Net benefit methods may provide more meaningful and clinically interpretable results than ROC AUC.",

author = "Halligan, Steve and Altman, {Douglas G} and Mallett, Susan",

year = "2015",

month = apr,

doi = "10.1007/s00330-014-3487-0",

language = "English",

volume = "25",

pages = "932--939",

journal = "European Radiology",

issn = "0938-7994",

publisher = "Springer",

number = "4",

}

TY - JOUR

T1 - Disadvantages of using the area under the receiver operating characteristic curve to assess imaging tests

T2 - a discussion and proposal for an alternative approach

AU - Halligan, Steve

AU - Altman, Douglas G

AU - Mallett, Susan

PY - 2015/4

Y1 - 2015/4

AB - OBJECTIVES: The objectives are to describe the disadvantages of the area under the receiver operating characteristic curve (ROC AUC) to measure diagnostic test performance and to propose an alternative based on net benefit. METHODS: We use a narrative review supplemented by data from a study of computer-assisted detection for CT colonography. RESULTS: We identified problems with ROC AUC. Confidence scoring by readers was highly non-normal, and score distribution was bimodal. Consequently, ROC curves were highly extrapolated with AUC mostly dependent on areas without patient data. AUC depended on the method used for curve fitting. ROC AUC does not account for prevalence or different misclassification costs arising from false-negative and false-positive diagnoses. Change in ROC AUC has little direct clinical meaning for clinicians. An alternative analysis based on net benefit is proposed, based on the change in sensitivity and specificity at clinically relevant thresholds. Net benefit incorporates estimates of prevalence and misclassification costs, and it is clinically interpretable since it reflects changes in correct and incorrect diagnoses when a new diagnostic test is introduced. CONCLUSIONS: ROC AUC is most useful in the early stages of test assessment whereas methods based on net benefit are more useful to assess radiological tests where the clinical context is known. Net benefit is more useful for assessing clinical impact. KEY POINTS: • The area under the receiver operating characteristic curve (ROC AUC) measures diagnostic accuracy. • Confidence scores used to build ROC curves may be difficult to assign. • False-positive and false-negative diagnoses have different misclassification costs. • Excessive ROC curve extrapolation is undesirable. • Net benefit methods may provide more meaningful and clinically interpretable results than ROC AUC.

U2 - 10.1007/s00330-014-3487-0

DO - 10.1007/s00330-014-3487-0

M3 - Article

C2 - 25599932

SN - 0938-7994

VL - 25

SP - 932

EP - 939

JO - European Radiology

JF - European Radiology

IS - 4

ER -
