Automating Large-scale Health Care Service Feedback Analysis: Sentiment Analysis and Topic Modeling Study

George Alexander; Mohammed Bahja; Gibran Farook Butt

doi:10.2196/29385

Research output: Contribution to journal › Article › peer-review

Abstract

BACKGROUND: Obtaining patient feedback is an essential mechanism for health care service providers to assess their quality and effectiveness. Unlike assessments of clinical outcomes, feedback from patients offers insights into their lived experiences. The Department of Health and Social Care in England, via National Health Service Digital, operates a patient feedback web service through which patients can leave feedback on their experiences in structured and free-text report forms. Free-text feedback, compared with structured questionnaires, may be less biased by the feedback collector and, thus, more representative; however, it is harder to analyze in large quantities, and it is challenging to derive meaningful, quantitative outcomes from it.

OBJECTIVE: The aim of this study is to build a novel data analysis and interactive visualization pipeline accessible through an interactive web application to facilitate the interrogation of and provide unique insights into National Health Service patient feedback.

METHODS: This study details the development of a text analysis tool that uses contemporary natural language processing and machine learning models to analyze free-text clinical service reviews to develop a robust classification model and interactive visualization web application. The methodology is based on the design science research paradigm and was conducted in three iterations: a sentiment analysis of the patient feedback corpus in the first iteration, topic modeling (unigram and bigram)-based analysis for topic identification in the second iteration, and nested topic modeling in the third iteration that combines sentiment analysis and topic modeling methods. An interactive data visualization web application for use by the general public was then created, presenting the data on a geographic representation of the country, making it easily accessible.
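The three iterations described above (sentiment analysis, unigram and bigram topic modeling, then topic modeling nested within each sentiment group) can be illustrated with a minimal sketch. It is not the study's implementation: the abstract does not name the sentiment model or the software used, so this sketch swaps in a toy word-list scorer and a small collapsed Gibbs sampler for latent Dirichlet allocation (the method listed in the keywords). The lexicon, the function names, and all parameter values are hypothetical.

```python
import random
from collections import Counter

# Hypothetical toy lexicon; the study's actual sentiment model is not given here.
POSITIVE = {"friendly", "helpful", "excellent", "caring"}
NEGATIVE = {"rude", "late", "waiting", "dirty"}


def tokenize(text):
    """Return lowercase unigrams followed by adjacent-word bigrams."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    words = [w for w in words if w]
    bigrams = [f"{a}_{b}" for a, b in zip(words, words[1:])]
    return words + bigrams


def sentiment(text):
    """Label a review by counting lexicon hits; ties are neutral."""
    words = {w.strip(".,!?").lower() for w in text.split()}
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"


def lda_gibbs(docs, n_topics=2, n_iter=50, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns one term Counter per topic."""
    rng = random.Random(seed)
    vocab_size = len({term for doc in docs for term in doc})
    topic_term = [Counter() for _ in range(n_topics)]
    topic_total = [0] * n_topics
    doc_topic = [[0] * n_topics for _ in docs]
    assign = []
    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for term in doc:
            z = rng.randrange(n_topics)
            zs.append(z)
            topic_term[z][term] += 1
            topic_total[z] += 1
            doc_topic[d][z] += 1
        assign.append(zs)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, term in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assign[d][i]
                topic_term[z][term] -= 1
                topic_total[z] -= 1
                doc_topic[d][z] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_term[k][term] + beta)
                    / (topic_total[k] + beta * vocab_size)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = z
                topic_term[z][term] += 1
                topic_total[z] += 1
                doc_topic[d][z] += 1
    return topic_term


def nested_topics(reviews, **lda_kwargs):
    """Group reviews by sentiment label, then fit topics within each group."""
    groups = {}
    for review in reviews:
        groups.setdefault(sentiment(review), []).append(tokenize(review))
    return {label: lda_gibbs(docs, **lda_kwargs) for label, docs in groups.items()}
```

Calling `nested_topics(reviews, n_topics=2)` on a list of review strings returns, for each sentiment label present, a list of per-topic term counts; `topic.most_common(5)` then lists the highest-count unigrams and bigrams in a topic. A production pipeline would more plausibly rely on an established NLP library and a validated sentiment model than on this toy sampler and word list.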

RESULTS: Of the 11,103 possible clinical services that could be reviewed across England, 2030 (18.28%) different services received a combined total of 51,845 reviews between October 1, 2017, and September 30, 2019. Dominant topics were identified for the entire corpus followed by negative- and positive-sentiment topics in turn. Reviews containing high- and low-sentiment topics occurred more frequently than reviews containing less polarized topics. Time-series analysis identified trends in topic and sentiment occurrence frequency across the study period.

CONCLUSIONS: Using contemporary natural language processing techniques, unstructured text data were effectively characterized for further analysis and visualization. An efficient pipeline was successfully combined with a web application, making automated analysis and dissemination of large volumes of information accessible. This study represents a significant step in efforts to generate and visualize useful, actionable, and unique information from free-text patient reviews.

Original language	English
Article number	e29385
Journal	JMIR Medical Informatics
Volume	10
Issue number	4
DOIs	https://doi.org/10.2196/29385
Publication status	Published - 11 Apr 2022

Bibliographical note

©George Alexander, Mohammed Bahja, Gibran Farook Butt. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 11.04.2022.

Keywords

National Health Service
automated solutions
free-text
large-scale health service
latent Dirichlet allocation
natural language processing
patient feedback
reviews
topic modeling
unstructured data

ASJC Scopus subject areas

Health Informatics
Health Information Management

Access to Document

10.2196/29385, Licence: Creative Commons: Attribution (CC BY)

Alexander_et_al_2022_Automating_large-scale_health_care_service_JMIR_Medical_Informatics: Final published version, 1.55 MB, Licence: Creative Commons: Attribution (CC BY)

Cite this

@article{b59a9575d3a04177b92f743f940ca5a3,

title = "Automating Large-scale Health Care Service Feedback Analysis: Sentiment Analysis and Topic Modeling Study",

abstract = "BACKGROUND: Obtaining patient feedback is an essential mechanism for health care service providers to assess their quality and effectiveness. Unlike assessments of clinical outcomes, feedback from patients offers insights into their lived experiences. The Department of Health and Social Care in England via National Health Service Digital operates a patient feedback web service through which patients can leave feedback of their experiences in structured and free-text report forms. Free-text feedback, compared with structured questionnaires, may be less biased by the feedback collector and, thus, more representative; however, it is harder to analyze in large quantities and challenging to derive meaningful, quantitative outcomes. OBJECTIVE: The aim of this study is to build a novel data analysis and interactive visualization pipeline accessible through an interactive web application to facilitate the interrogation of and provide unique insights into National Health Service patient feedback. METHODS: This study details the development of a text analysis tool that uses contemporary natural language processing and machine learning models to analyze free-text clinical service reviews to develop a robust classification model and interactive visualization web application. The methodology is based on the design science research paradigm and was conducted in three iterations: a sentiment analysis of the patient feedback corpus in the first iteration, topic modeling (unigram and bigram)-based analysis for topic identification in the second iteration, and nested topic modeling in the third iteration that combines sentiment analysis and topic modeling methods. An interactive data visualization web application for use by the general public was then created, presenting the data on a geographic representation of the country, making it easily accessible. RESULTS: Of the 11,103 possible clinical services that could be reviewed across England, 2030 (18.28%) different services received a combined total of 51,845 reviews between October 1, 2017, and September 30, 2019. Dominant topics were identified for the entire corpus followed by negative- and positive-sentiment topics in turn. Reviews containing high- and low-sentiment topics occurred more frequently than reviews containing less polarized topics. Time-series analysis identified trends in topic and sentiment occurrence frequency across the study period. CONCLUSIONS: Using contemporary natural language processing techniques, unstructured text data were effectively characterized for further analysis and visualization. An efficient pipeline was successfully combined with a web application, making automated analysis and dissemination of large volumes of information accessible. This study represents a significant step in efforts to generate and visualize useful, actionable, and unique information from free-text patient reviews.",

keywords = "National Health Service, automated solutions, free-text, large-scale health service, latent Dirichlet allocation, natural language processing, patient feedback, reviews, topic modeling, unstructured data",

author = "George Alexander and Mohammed Bahja and Butt, {Gibran Farook}",

note = "{\textcopyright}George Alexander, Mohammed Bahja, Gibran Farook Butt. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 11.04.2022.",

year = "2022",

month = apr,

day = "11",

doi = "10.2196/29385",

language = "English",

volume = "10",

journal = "JMIR Medical Informatics",

issn = "2291-9694",

publisher = "JMIR Publications",

number = "4",

}

TY - JOUR

T1 - Automating Large-scale Health Care Service Feedback Analysis

T2 - Sentiment Analysis and Topic Modeling Study

AU - Alexander, George

AU - Bahja, Mohammed

AU - Butt, Gibran Farook

N1 - ©George Alexander, Mohammed Bahja, Gibran Farook Butt. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 11.04.2022.

PY - 2022/4/11

Y1 - 2022/4/11

N2 - BACKGROUND: Obtaining patient feedback is an essential mechanism for health care service providers to assess their quality and effectiveness. Unlike assessments of clinical outcomes, feedback from patients offers insights into their lived experiences. The Department of Health and Social Care in England via National Health Service Digital operates a patient feedback web service through which patients can leave feedback of their experiences in structured and free-text report forms. Free-text feedback, compared with structured questionnaires, may be less biased by the feedback collector and, thus, more representative; however, it is harder to analyze in large quantities and challenging to derive meaningful, quantitative outcomes. OBJECTIVE: The aim of this study is to build a novel data analysis and interactive visualization pipeline accessible through an interactive web application to facilitate the interrogation of and provide unique insights into National Health Service patient feedback. METHODS: This study details the development of a text analysis tool that uses contemporary natural language processing and machine learning models to analyze free-text clinical service reviews to develop a robust classification model and interactive visualization web application. The methodology is based on the design science research paradigm and was conducted in three iterations: a sentiment analysis of the patient feedback corpus in the first iteration, topic modeling (unigram and bigram)-based analysis for topic identification in the second iteration, and nested topic modeling in the third iteration that combines sentiment analysis and topic modeling methods. An interactive data visualization web application for use by the general public was then created, presenting the data on a geographic representation of the country, making it easily accessible. RESULTS: Of the 11,103 possible clinical services that could be reviewed across England, 2030 (18.28%) different services received a combined total of 51,845 reviews between October 1, 2017, and September 30, 2019. Dominant topics were identified for the entire corpus followed by negative- and positive-sentiment topics in turn. Reviews containing high- and low-sentiment topics occurred more frequently than reviews containing less polarized topics. Time-series analysis identified trends in topic and sentiment occurrence frequency across the study period. CONCLUSIONS: Using contemporary natural language processing techniques, unstructured text data were effectively characterized for further analysis and visualization. An efficient pipeline was successfully combined with a web application, making automated analysis and dissemination of large volumes of information accessible. This study represents a significant step in efforts to generate and visualize useful, actionable, and unique information from free-text patient reviews.

KW - National Health Service

KW - automated solutions

KW - free-text

KW - large-scale health service

KW - latent Dirichlet allocation

KW - natural language processing

KW - patient feedback

KW - reviews

KW - topic modeling

KW - unstructured data

UR - http://www.scopus.com/inward/record.url?scp=85128458371&partnerID=8YFLogxK

U2 - 10.2196/29385

DO - 10.2196/29385

M3 - Article

C2 - 35404254

SN - 2291-9694

VL - 10

JO - JMIR Medical Informatics

JF - JMIR Medical Informatics

IS - 4

M1 - e29385

ER -
