Benchmarking transformer-based models for medical record de-identification in a single center multi-specialty evaluation

  • Rachel Kuo*
  • Andrew A.S. Soltan
  • Ciaran O'Hanlon
  • Alan Hasanic
  • David A. Clifton
  • Gary Collins
  • Dominic Furniss
  • David W. Eyre

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Protecting patient confidentiality is central to enabling research using electronic health records. Automated text de-identification offers a scalable alternative to manual redaction. However, different approaches vary in accuracy and adaptability. We evaluated four transformer-based, task-specific models and five large language models on 3,650 clinical records spanning general and specialty datasets from a UK hospital group. Records were dual-annotated by clinicians, allowing precise comparison of performance. The Microsoft Azure de-identification service achieved the highest F1 score, approaching clinician performance, while fine-tuned AnonCAT and GPT-4-0125 with few-shot prompting also performed strongly. Smaller LLMs frequently over-redacted or produced hallucinatory content, limiting interpretability. Task-specific models demonstrated greater stability across datasets, while low-level adaptation improved performance in both model classes. These findings highlight that automated de-identification systems can provide effective support for large-scale sharing of clinical records, but success depends on careful model choice, adaptation strategies, and safeguards to ensure robust data utility and privacy.

Original language: English
Article number: 113732
Number of pages: 12
Journal: iScience
Volume: 28
Issue number: 12
Early online date: 8 Oct 2025
Publication status: Published - 19 Dec 2025

Keywords

  • Artificial intelligence
  • Health informatics

ASJC Scopus subject areas

  • General
