Evaluating semantic similarity methods for comparison of text-derived phenotype profiles

Luke T Slater; Sophie Russell; Silver Makepeace; Alexander Carberry; Andreas Karwath; John A Williams; Hilary Fanning; Simon Ball; Robert Hoehndorf; Georgios V Gkoutos

doi:10.1101/2021.08.08.21261762

Research output: Working paper/Preprint › Preprint

Abstract

Semantic similarity is a valuable tool for analysis in biomedicine. When applied to phenotype profiles derived from clinical text, it has the capacity to enable and enhance ‘patient-like me’ analyses, automated coding, differential diagnosis, and outcome prediction, by leveraging the wealth of background knowledge provided by biomedical ontologies. While a large body of work exists exploring the use of semantic similarity for multiple tasks, including protein interaction prediction and rare disease differential diagnosis, there is less work exploring comparison of patient phenotype profiles for clinical tasks. Moreover, there are no experimental explorations of optimal parameters or methods in the area. In this work, we develop a reproducible platform for benchmarking experimental conditions for patient phenotype similarity. Using the platform, we evaluate the task of ranking shared primary diagnosis from uncurated phenotype profiles derived from text narrative associated with admissions in MIMIC-III. In doing this, we identify and interpret the performance of a large number of semantic similarity measures for this task, and provide a basis for further research on related tasks in the area.

Original language	English
Publisher	medRxiv
Number of pages	12
DOIs	https://doi.org/10.1101/2021.08.08.21261762
Publication status	Published - 9 Aug 2021

Keywords

health informatics

Access to Document

10.1101/2021.08.08.21261762

Licence: Creative Commons: Attribution-NonCommercial-NoDerivs (CC BY-NC-ND)

Cite this

@techreport{956d3d9dcf404ff19fd145b572ad35ee,

title = "Evaluating semantic similarity methods for comparison of text-derived phenotype profiles",

abstract = "Semantic similarity is a valuable tool for analysis in biomedicine. When applied to phenotype profiles derived from clinical text, it has the capacity to enable and enhance {\textquoteleft}patient-like me{\textquoteright} analyses, automated coding, differential diagnosis, and outcome prediction, by leveraging the wealth of background knowledge provided by biomedical ontologies. While a large body of work exists exploring the use of semantic similarity for multiple tasks, including protein interaction prediction and rare disease differential diagnosis, there is less work exploring comparison of patient phenotype profiles for clinical tasks. Moreover, there are no experimental explorations of optimal parameters or methods in the area. In this work, we develop a reproducible platform for benchmarking experimental conditions for patient phenotype similarity. Using the platform, we evaluate the task of ranking shared primary diagnosis from uncurated phenotype profiles derived from text narrative associated with admissions in MIMIC-III. In doing this, we identify and interpret the performance of a large number of semantic similarity measures for this task, and provide a basis for further research on related tasks in the area.",

keywords = "health informatics",

author = "Slater, {Luke T} and Sophie Russell and Silver Makepeace and Alexander Carberry and Andreas Karwath and Williams, {John A} and Hilary Fanning and Simon Ball and Robert Hoehndorf and Gkoutos, {Georgios V}",

year = "2021",

month = aug,

day = "9",

doi = "10.1101/2021.08.08.21261762",

language = "English",

publisher = "medRxiv",

type = "WorkingPaper",

institution = "medRxiv",

}

TY - UNPB

T1 - Evaluating semantic similarity methods for comparison of text-derived phenotype profiles

AU - Slater, Luke T

AU - Russell, Sophie

AU - Makepeace, Silver

AU - Carberry, Alexander

AU - Karwath, Andreas

AU - Williams, John A

AU - Fanning, Hilary

AU - Ball, Simon

AU - Hoehndorf, Robert

AU - Gkoutos, Georgios V

PY - 2021/8/9

Y1 - 2021/8/9

N2 - Semantic similarity is a valuable tool for analysis in biomedicine. When applied to phenotype profiles derived from clinical text, it has the capacity to enable and enhance ‘patient-like me’ analyses, automated coding, differential diagnosis, and outcome prediction, by leveraging the wealth of background knowledge provided by biomedical ontologies. While a large body of work exists exploring the use of semantic similarity for multiple tasks, including protein interaction prediction and rare disease differential diagnosis, there is less work exploring comparison of patient phenotype profiles for clinical tasks. Moreover, there are no experimental explorations of optimal parameters or methods in the area. In this work, we develop a reproducible platform for benchmarking experimental conditions for patient phenotype similarity. Using the platform, we evaluate the task of ranking shared primary diagnosis from uncurated phenotype profiles derived from text narrative associated with admissions in MIMIC-III. In doing this, we identify and interpret the performance of a large number of semantic similarity measures for this task, and provide a basis for further research on related tasks in the area.

AB - Semantic similarity is a valuable tool for analysis in biomedicine. When applied to phenotype profiles derived from clinical text, it has the capacity to enable and enhance ‘patient-like me’ analyses, automated coding, differential diagnosis, and outcome prediction, by leveraging the wealth of background knowledge provided by biomedical ontologies. While a large body of work exists exploring the use of semantic similarity for multiple tasks, including protein interaction prediction and rare disease differential diagnosis, there is less work exploring comparison of patient phenotype profiles for clinical tasks. Moreover, there are no experimental explorations of optimal parameters or methods in the area. In this work, we develop a reproducible platform for benchmarking experimental conditions for patient phenotype similarity. Using the platform, we evaluate the task of ranking shared primary diagnosis from uncurated phenotype profiles derived from text narrative associated with admissions in MIMIC-III. In doing this, we identify and interpret the performance of a large number of semantic similarity measures for this task, and provide a basis for further research on related tasks in the area.

KW - health informatics

UR - https://doi.org/10.1101/2021.08.08.21261762

U2 - 10.1101/2021.08.08.21261762

DO - 10.1101/2021.08.08.21261762

M3 - Preprint

BT - Evaluating semantic similarity methods for comparison of text-derived phenotype profiles

PB - medRxiv

ER -
