Interobserver variability studies in diagnostic imaging: a methodological systematic review

Laura Quinn; Konstantinos Tryposkiadis; Jon Deeks; Henrica C.W. De Vet; Sue Mallett; Lidwine B. Mokkink; Yemisi Takwoingi; Sian Taylor-Phillips; Alice Sitch

doi:10.1259/bjr.20220972

Corresponding author: Laura Quinn

Applied Health Research

Research output: Contribution to journal › Review article › peer-review


Abstract

Objectives: To review the methodology of interobserver variability studies, including current practice and the quality of study conduct and reporting.

Methods: Interobserver variability studies between January 2019 and January 2020 were included. Extracted data comprised study characteristics, populations, variability measures, key results, and conclusions. Risk of bias was assessed using the COSMIN tool for assessing reliability and measurement error.

Results: Seventy-nine full-text studies were included, covering various imaging tests and clinical areas. The median number of patients was 47 (IQR: 23–88) and the median number of observers was 4 (IQR: 2–7); sample size was justified in 12 (15%) studies. Most studies used static images (n = 75, 95%), and in most studies all observers interpreted the images for all patients (n = 67, 85%). The most commonly used variability measures were intraclass correlation coefficients (ICC; n = 41, 52%), kappa (κ) statistics (n = 31, 39%), and percentage agreement (n = 15, 19%). Interpretation of variability estimates often did not correspond with study conclusions. The COSMIN risk of bias tool gave a very good or adequate rating to 52 studies (66%), including any studies that used variability measures listed in the tool. For studies using static images, some study design standards were not applicable and did not contribute to the overall rating.

Conclusions: Interobserver variability studies have diverse study designs and methods, the impact of which requires further evaluation. Sample sizes for patients and observers were often small and unjustified. Most studies reported ICC and κ values, which did not always agree with the study conclusions. The COSMIN risk of bias tool assigned high ratings to many studies, with certain standards scored ‘not applicable’ when static images were used.

Advances in knowledge: The sample size for both patients and observers was often small without justification.

In most studies, observers interpreted static images rather than evaluating the process of acquiring the imaging test, so many COSMIN risk of bias standards could not be assessed for studies with this design.

Most studies reported intraclass correlation coefficient and κ statistics; study conclusions often did not correspond with results.
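The Results name the three most common measures (ICC, κ and percentage agreement) and note that in most studies every observer read every patient's images. As a minimal sketch, and not the analysis code of the review or of any included study, the Python below computes percentage agreement and Cohen's κ for two observers' categorical reads, and ICC(2,1) (two-way random effects, absolute agreement, single observer) for continuous measurements from such a fully crossed design. The abstract does not say which κ or ICC variants the included studies used, so these choices, the function names and the toy data are assumptions; the ICC example reuses the 6-target, 4-judge data from Shrout and Fleiss (1979).

```python
"""Hypothetical sketch of three agreement measures named in the review."""
from collections import Counter


def percentage_agreement(a, b):
    """Proportion of patients on which two observers assign the same category."""
    if len(a) != len(b) or not a:
        raise ValueError("ratings must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Cohen's kappa for two observers: (p_o - p_e) / (1 - p_e)."""
    p_o = percentage_agreement(a, b)
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement p_e from each observer's marginal category frequencies.
    p_e = sum(count_a[c] * count_b[c] for c in set(a) | set(b)) / n ** 2
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single observer.

    `scores` holds one row per patient and one column per observer, i.e. a
    fully crossed design in which every observer rates every patient.
    """
    n = len(scores)
    k = len(scores[0]) if scores else 0
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a complete patients x observers table, both >= 2")
    grand = sum(map(sum, scores)) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    # Two-way ANOVA sums of squares: patients (rows), observers (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


if __name__ == "__main__":
    # Hypothetical binary reads (1 = abnormal) by two observers on 10 patients.
    obs1 = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
    obs2 = [1, 0, 0, 1, 0, 1, 1, 1, 0, 1]
    print(f"agreement = {percentage_agreement(obs1, obs2):.2f}")  # agreement = 0.80
    print(f"kappa     = {cohen_kappa(obs1, obs2):.3f}")           # kappa     = 0.583
    # Shrout and Fleiss's 6 targets x 4 judges, treated here as patients x observers.
    ratings = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
               [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
    print(f"ICC(2,1)  = {icc_2_1(ratings):.3f}")                  # ICC(2,1)  = 0.290
```

Kappa discounts the agreement expected by chance from each observer's marginal frequencies, which is why it is lower than the raw 80% agreement in this toy example. Studies with more than two observers or ordinal categories would need multi-rater or weighted variants (for example Fleiss' κ or weighted κ), and a different ICC form would be needed if not every observer reads every patient.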

Original language	English
Article number	20220972
Journal	The British Journal of Radiology
Volume	96
Issue number	1148
Early online date	29 Jun 2023
DOIs	https://doi.org/10.1259/bjr.20220972
Publication status	Published - Aug 2023

Access to Document

DOI: 10.1259/bjr.20220972 (licence: Creative Commons Attribution, CC BY)

Final published version, 547 KB (licence: Creative Commons Attribution, CC BY)

Projects

Evaluation of diagnostic imaging test performance: Including interobserver variability and time to diagnosis
Quinn, L. (NIHR)
1 Apr 2020 → 31 Mar 2026
Project: Other Government Departments

Cite this

@article{cf80c437b5df467a85100591cd5e11ac,
title = "Interobserver variability studies in diagnostic imaging: a methodological systematic review",
abstract = "Objectives: To review the methodology of interobserver variability studies; including current practice and quality of conducting and reporting studies. Methods: Interobserver variability studies between January 2019 and January 2020 were included; extracted data comprised of study characteristics, populations, variability measures, key results, and conclusions. Risk of bias was assessed using the COSMIN tool for assessing reliability and measurement error. Results: Seventy-nine full-text studies were included covering various imaging tests and clinical areas. The median number of patients was 47 (IQR:23–88), and observers were 4 (IQR:2–7), with sample size justified in 12 (15%) studies. Most studies used static images (n = 75, 95%), where all observers interpreted images for all patients (n = 67, 85%). Intraclass correlation coefficients (ICC) (n = 41, 52%), Kappa (κ) statistics (n = 31, 39%) and percentage agreement (n = 15, 19%) were most commonly used. Interpretation of variability estimates often did not correspond with study conclusions. The COSMIN risk of bias tool gave a very good/adequate rating for 52 studies (66%) including any studies that used variability measures listed in the tool. For studies using static images, some study design standards were not applicable and did not contribute to the overall rating. Conclusions: Interobserver variability studies have diverse study designs and methods, the impact of which requires further evaluation. Sample size for patients and observers was often small without justification. Most studies report ICC and κ values, which did not always coincide with the study conclusion. High ratings were assigned to many studies using the COSMIN risk of bias tool, with certain standards scored {\textquoteleft}not applicable{\textquoteright} when static images were used. Advances in knowledge: The sample size for both patients and observers was often small without justification. For most studies, observers interpreted static images and did not evaluate the process of acquiring the imaging test, meaning it was not possible to assess many COSMIN risk of bias standards for studies with this design. Most studies reported intraclass correlation coefficient and κ statistics; study conclusions often did not correspond with results.",
author = "Laura Quinn and Konstantinos Tryposkiadis and Jon Deeks and {De Vet}, {Henrica C.W.} and Sue Mallett and Mokkink, {Lidwine B.} and Yemisi Takwoingi and Sian Taylor-Phillips and Alice Sitch",
year = "2023",
month = aug,
doi = "10.1259/bjr.20220972",
language = "English",
volume = "96",
journal = "The British Journal of Radiology",
number = "1148",
}

TY  - JOUR
T1  - Interobserver variability studies in diagnostic imaging: a methodological systematic review
AU  - Quinn, Laura
AU  - Tryposkiadis, Konstantinos
AU  - Deeks, Jon
AU  - De Vet, Henrica C.W.
AU  - Mallett, Sue
AU  - Mokkink, Lidwine B.
AU  - Takwoingi, Yemisi
AU  - Taylor-Phillips, Sian
AU  - Sitch, Alice
PY  - 2023/8
Y1  - 2023/8
N2  - Objectives: To review the methodology of interobserver variability studies; including current practice and quality of conducting and reporting studies. Methods: Interobserver variability studies between January 2019 and January 2020 were included; extracted data comprised of study characteristics, populations, variability measures, key results, and conclusions. Risk of bias was assessed using the COSMIN tool for assessing reliability and measurement error. Results: Seventy-nine full-text studies were included covering various imaging tests and clinical areas. The median number of patients was 47 (IQR:23–88), and observers were 4 (IQR:2–7), with sample size justified in 12 (15%) studies. Most studies used static images (n = 75, 95%), where all observers interpreted images for all patients (n = 67, 85%). Intraclass correlation coefficients (ICC) (n = 41, 52%), Kappa (κ) statistics (n = 31, 39%) and percentage agreement (n = 15, 19%) were most commonly used. Interpretation of variability estimates often did not correspond with study conclusions. The COSMIN risk of bias tool gave a very good/adequate rating for 52 studies (66%) including any studies that used variability measures listed in the tool. For studies using static images, some study design standards were not applicable and did not contribute to the overall rating. Conclusions: Interobserver variability studies have diverse study designs and methods, the impact of which requires further evaluation. Sample size for patients and observers was often small without justification. Most studies report ICC and κ values, which did not always coincide with the study conclusion. High ratings were assigned to many studies using the COSMIN risk of bias tool, with certain standards scored ‘not applicable’ when static images were used. Advances in knowledge: The sample size for both patients and observers was often small without justification. For most studies, observers interpreted static images and did not evaluate the process of acquiring the imaging test, meaning it was not possible to assess many COSMIN risk of bias standards for studies with this design. Most studies reported intraclass correlation coefficient and κ statistics; study conclusions often did not correspond with results.
AB  - Objectives: To review the methodology of interobserver variability studies; including current practice and quality of conducting and reporting studies. Methods: Interobserver variability studies between January 2019 and January 2020 were included; extracted data comprised of study characteristics, populations, variability measures, key results, and conclusions. Risk of bias was assessed using the COSMIN tool for assessing reliability and measurement error. Results: Seventy-nine full-text studies were included covering various imaging tests and clinical areas. The median number of patients was 47 (IQR:23–88), and observers were 4 (IQR:2–7), with sample size justified in 12 (15%) studies. Most studies used static images (n = 75, 95%), where all observers interpreted images for all patients (n = 67, 85%). Intraclass correlation coefficients (ICC) (n = 41, 52%), Kappa (κ) statistics (n = 31, 39%) and percentage agreement (n = 15, 19%) were most commonly used. Interpretation of variability estimates often did not correspond with study conclusions. The COSMIN risk of bias tool gave a very good/adequate rating for 52 studies (66%) including any studies that used variability measures listed in the tool. For studies using static images, some study design standards were not applicable and did not contribute to the overall rating. Conclusions: Interobserver variability studies have diverse study designs and methods, the impact of which requires further evaluation. Sample size for patients and observers was often small without justification. Most studies report ICC and κ values, which did not always coincide with the study conclusion. High ratings were assigned to many studies using the COSMIN risk of bias tool, with certain standards scored ‘not applicable’ when static images were used. Advances in knowledge: The sample size for both patients and observers was often small without justification. For most studies, observers interpreted static images and did not evaluate the process of acquiring the imaging test, meaning it was not possible to assess many COSMIN risk of bias standards for studies with this design. Most studies reported intraclass correlation coefficient and κ statistics; study conclusions often did not correspond with results.
U2  - 10.1259/bjr.20220972
DO  - 10.1259/bjr.20220972
M3  - Review article
C2  - 37399082
VL  - 96
JO  - The British Journal of Radiology
JF  - The British Journal of Radiology
IS  - 1148
M1  - 20220972
ER  -
