Do the methods used to analyse missing data really matter? An examination of data from an observational study of Intermediate Care patients

Billingsley Kaambwa; Stirling Bryan; Lucinda Billingham

doi:10.1186/1756-0500-5-330

Research output: Contribution to journal › Article › peer-review

Abstract

Missing data is a common statistical problem in healthcare datasets from populations of older people. Some argue that arbitrarily assuming the mechanism responsible for the missingness, and therefore the method for dealing with it, is not the best option, but is this always true? This paper explores what happens when extra information suggesting that a particular mechanism is responsible for the missing data is disregarded and methods for dealing with the missing data are chosen arbitrarily. Regression models based on 2,533 intermediate care (IC) patients, drawn from the largest evaluation of IC done and published in the UK to date, were used to explain variation in costs, EQ-5D and Barthel index. Three methods for dealing with missingness were used, each assuming a different mechanism as being responsible for the missing data: complete case analysis (assuming missing completely at random, MCAR), multiple imputation (assuming missing at random, MAR) and the Heckman selection model (assuming missing not at random, MNAR). Differences in results were gauged by examining the signs of the coefficients as well as the sizes of both the coefficients and their associated standard errors.
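To make the comparison described in the abstract concrete, the following is a minimal illustrative sketch, not the authors' code or data. It assumes a simulated dataset with one covariate and an outcome that goes missing depending only on that covariate (a MAR mechanism), and contrasts complete case analysis with a simple regression-based multiple imputation pooled using Rubin's rules. All function names (`ols`, `complete_case`, `multiple_imputation`, `simulate`) and all parameter values are hypothetical choices made for illustration. The Heckman selection model is not sketched here.

```python
import numpy as np


def ols(X, y):
    """Ordinary least squares: return coefficients and their standard errors."""
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    return beta, np.sqrt(sigma2 * np.diag(xtx_inv))


def complete_case(X, y):
    """Complete case analysis: drop every row whose outcome is missing."""
    obs = ~np.isnan(y)
    return ols(X[obs], y[obs])


def multiple_imputation(X, y, m=20, seed=0):
    """Impute missing outcomes m times and pool results with Rubin's rules."""
    rng = np.random.default_rng(seed)
    obs = ~np.isnan(y)
    n_obs, k = X[obs].shape
    beta_hat, _ = ols(X[obs], y[obs])
    resid = y[obs] - X[obs] @ beta_hat
    sigma2_hat = resid @ resid / (n_obs - k)
    xtx_inv = np.linalg.inv(X[obs].T @ X[obs])
    betas, variances = [], []
    for _ in range(m):
        # Draw the residual variance and coefficients from their approximate
        # posterior so that each imputation reflects parameter uncertainty.
        sigma2 = sigma2_hat * (n_obs - k) / rng.chisquare(n_obs - k)
        beta_draw = rng.multivariate_normal(beta_hat, sigma2 * xtx_inv)
        y_imp = y.copy()
        noise = rng.normal(0.0, np.sqrt(sigma2), size=int((~obs).sum()))
        y_imp[~obs] = X[~obs] @ beta_draw + noise
        b, se = ols(X, y_imp)
        betas.append(b)
        variances.append(se ** 2)
    betas, variances = np.array(betas), np.array(variances)
    within = variances.mean(axis=0)          # average within-imputation variance
    between = betas.var(axis=0, ddof=1)      # between-imputation variance
    total = within + (1 + 1 / m) * between   # Rubin's total variance
    return betas.mean(axis=0), np.sqrt(total)


def simulate(n=2000, seed=1):
    """Simulated data (an assumption): y = 1 + 2x + noise, y missing at random."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    y = 1.0 + 2.0 * x + rng.normal(size=n)
    p_missing = 1 / (1 + np.exp(-x))  # missingness depends only on observed x
    y[rng.random(n) < p_missing] = np.nan
    return np.column_stack([np.ones(n), x]), y


if __name__ == "__main__":
    X, y = simulate()
    for name, (beta, se) in [("complete case", complete_case(X, y)),
                             ("multiple imputation", multiple_imputation(X, y))]:
        print(name, "coefficients:", beta, "standard errors:", se)
```

Printing the coefficients and standard errors side by side mirrors the comparison criteria named in the abstract: the signs of the coefficients and the sizes of the coefficients and their standard errors. In this sketch only the outcome is missing and the imputation model uses the same covariate as the analysis model, so the two methods are expected to give similar estimates; real datasets with missing covariates or auxiliary variables can behave differently.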

Original language	English
Article number	330
Journal	BMC Research Notes
Volume	5
DOIs	https://doi.org/10.1186/1756-0500-5-330
Publication status	Published - 27 Jun 2012

Access to Document

10.1186/1756-0500-5-330

Kaambwa_Methods_Missing_Data_BMC_Research_Notes_2012
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Checked July 2015
Final published version, 343 KB. Licence: Creative Commons: Attribution (CC BY)

Cite this

@article{516ecbcd45e54ff794e23fd88c0e5699,
title = "Do the methods used to analyse missing data really matter? An examination of data from an observational study of Intermediate Care patients",
author = "Billingsley Kaambwa and Stirling Bryan and Lucinda Billingham",
year = "2012",
month = jun,
day = "27",
doi = "10.1186/1756-0500-5-330",
language = "English",
volume = "5",
journal = "BMC Research Notes",
publisher = "Springer",
}

TY - JOUR
T1 - Do the methods used to analyse missing data really matter?
T2 - An examination of data from an observational study of Intermediate Care patients
AU - Kaambwa, Billingsley
AU - Bryan, Stirling
AU - Billingham, Lucinda
PY - 2012/6/27
Y1 - 2012/6/27
U2 - 10.1186/1756-0500-5-330
DO - 10.1186/1756-0500-5-330
M3 - Article
C2 - 22738344
VL - 5
JO - BMC Research Notes
JF - BMC Research Notes
M1 - 330
ER -
