OBJECTIVE: The quality of clinical care is often assessed by retrospective examination of case-notes (charts, medical records). Our objective was to determine the inter-rater reliability of case-note audit. METHODS: We conducted a systematic review of the inter-rater reliability of case-note audit. Analysis was restricted to 26 papers reporting comparisons of two or three raters making independent judgements about the quality of care. RESULTS: Sixty-six separate comparisons were possible, since some papers reported more than one measurement of reliability. Mean kappa values ranged from 0.32 to 0.70. These values may be inflated by publication bias. Measured reliability was higher for case-note reviews based on explicit, as opposed to implicit, criteria and for reviews that focused on outcome (including adverse effects) rather than process errors. We found an association between kappa and the prevalence of errors (poor quality care), which suggests that alternatives such as tetrachoric and polychoric correlation coefficients should be considered when assessing inter-rater reliability. CONCLUSIONS: Comparative studies should take into account the relationship between kappa and the prevalence of the events being measured.
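
The association between kappa and prevalence reported in the results can be illustrated with a small worked example. The sketch below is not taken from the reviewed papers: it assumes two raters making a binary judgement (error or no error) on each case-note, and the two tables of counts are hypothetical. Both tables have the same raw agreement of 90%, but errors are common in one and rare in the other.

```python
def cohen_kappa(a, b, c, d):
    """Cohen's kappa for a 2x2 table of counts from two raters.

    a: both raters flag an error; d: neither rater flags an error;
    b, c: the raters disagree.
    """
    n = a + b + c + d
    p_observed = (a + d) / n
    # Chance agreement, computed from each rater's marginal proportions.
    p_both_yes = ((a + b) / n) * ((a + c) / n)
    p_both_no = ((c + d) / n) * ((b + d) / n)
    p_expected = p_both_yes + p_both_no
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical tables of 100 case-notes, each with 90% raw agreement.
balanced = cohen_kappa(45, 5, 5, 45)  # each rater flags errors in 50% of notes
rare = cohen_kappa(5, 5, 5, 85)       # each rater flags errors in 10% of notes

print(round(balanced, 2), round(rare, 2))  # prints: 0.8 0.44
```

With identical raw agreement, kappa falls from 0.80 to about 0.44 as errors become rarer, because chance agreement rises from 0.50 to 0.82. This is why kappa values from studies with different error prevalence are difficult to compare directly.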