Automating Large-scale Health Care Service Feedback Analysis: Sentiment Analysis and Topic Modeling Study

George Alexander, Mohammed Bahja, Gibran Farook Butt

Research output: Contribution to journal › Article › peer-review


BACKGROUND: Obtaining patient feedback is an essential mechanism for health care service providers to assess their quality and effectiveness. Unlike assessments of clinical outcomes, feedback from patients offers insights into their lived experiences. The Department of Health and Social Care in England, via National Health Service Digital, operates a patient feedback web service through which patients can leave feedback on their experiences in structured and free-text report forms. Free-text feedback, compared with structured questionnaires, may be less biased by the feedback collector and, thus, more representative; however, it is harder to analyze in large quantities, and deriving meaningful, quantitative outcomes from it is challenging.

OBJECTIVE: The aim of this study is to build a novel data analysis and visualization pipeline, accessible through an interactive web application, that facilitates the interrogation of National Health Service patient feedback and provides unique insights into it.

METHODS: This study details the development of a text analysis tool that applies contemporary natural language processing and machine learning models to free-text clinical service reviews, producing a robust classification model and an interactive visualization web application. The methodology follows the design science research paradigm and was conducted in three iterations: sentiment analysis of the patient feedback corpus in the first; topic modeling (unigram and bigram) for topic identification in the second; and nested topic modeling, which combines the sentiment analysis and topic modeling methods, in the third. An interactive data visualization web application for use by the general public was then created; it presents the data on a geographic representation of the country, making it easily accessible.
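
The nesting idea in the third iteration (label each review by sentiment first, then look for themes within each sentiment group) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the tiny word lists, the sample reviews, and the function names are hypothetical, and plain unigram and bigram frequency counts stand in for the latent Dirichlet allocation topic model named in the keywords. It is not the authors' implementation.

```python
import re
from collections import Counter

# Hypothetical mini-lexicons and stop words, for illustration only.
# The study itself uses trained NLP models, not word lists like these.
POSITIVE = {"friendly", "helpful", "excellent", "caring", "quick"}
NEGATIVE = {"rude", "long", "dirty", "ignored", "poor"}
STOPWORDS = {"the", "a", "was", "were", "and", "i", "to", "very",
             "for", "my", "of", "in"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stop words."""
    return [t for t in re.findall(r"[a-z']+", text.lower())
            if t not in STOPWORDS]


def sentiment(tokens):
    """Iteration 1 stand-in: label a review by counting lexicon hits."""
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def unigrams_and_bigrams(tokens):
    """Iteration 2 stand-in: the unigram and bigram features of one review."""
    bigrams = [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    return tokens + bigrams


def nested_terms(reviews, top_n=3):
    """Iteration 3 stand-in: most frequent terms within each sentiment group."""
    groups = {}
    for review in reviews:
        tokens = tokenize(review)
        label = sentiment(tokens)
        groups.setdefault(label, Counter()).update(unigrams_and_bigrams(tokens))
    return {label: counts.most_common(top_n) for label, counts in groups.items()}


# Invented sample reviews, not taken from the study corpus.
reviews = [
    "The staff were friendly and helpful",
    "Friendly staff and a quick appointment",
    "Waiting time was long and the receptionist was rude",
    "Long waiting time for my appointment",
]

for label, terms in nested_terms(reviews).items():
    print(label, terms)
```

Grouping by sentiment before extracting terms is what keeps a phrase such as "waiting time" attached to the negative reviews in which it occurs, rather than diluted across the whole corpus. A fuller version would swap a trained sentiment classifier and a topic model into the two stand-in functions.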

RESULTS: Of the 11,103 possible clinical services that could be reviewed across England, 2030 (18.28%) different services received a combined total of 51,845 reviews between October 1, 2017, and September 30, 2019. Dominant topics were identified for the entire corpus followed by negative- and positive-sentiment topics in turn. Reviews containing high- and low-sentiment topics occurred more frequently than reviews containing less polarized topics. Time-series analysis identified trends in topic and sentiment occurrence frequency across the study period.
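
As a quick check of the headline figures, the short Python sketch below recomputes the share of services reviewed from the counts reported above. The mean number of reviews per reviewed service is a derived figure that is not stated in the abstract, and it says nothing about how unevenly reviews may be spread across services.

```python
# Counts as reported in the RESULTS paragraph.
total_services = 11_103
reviewed_services = 2_030
total_reviews = 51_845

# Share of all reviewable services that received at least one review.
coverage_pct = 100 * reviewed_services / total_services

# Derived figure (not in the abstract): average reviews per reviewed service.
mean_reviews = total_reviews / reviewed_services

print(f"{coverage_pct:.2f}% of services were reviewed")
print(f"{mean_reviews:.1f} reviews per reviewed service on average")
```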

CONCLUSIONS: Using contemporary natural language processing techniques, unstructured text data were effectively characterized for further analysis and visualization. An efficient pipeline was successfully combined with a web application, making automated analysis and dissemination of large volumes of information accessible. This study represents a significant step in efforts to generate and visualize useful, actionable, and unique information from free-text patient reviews.

Original language: English
Article number: e29385
Journal: JMIR Medical Informatics
Issue number: 4
Publication status: Published - 11 Apr 2022

Bibliographical note

© George Alexander, Mohammed Bahja, Gibran Farook Butt. Originally published in JMIR Medical Informatics, 11.04.2022.


Keywords

  • National Health Service
  • automated solutions
  • free-text
  • large-scale health service
  • latent Dirichlet allocation
  • natural language processing
  • patient feedback
  • reviews
  • topic modeling
  • unstructured data

ASJC Scopus subject areas

  • Health Informatics
  • Health Information Management

