Evaluating semantic similarity methods for comparison of text-derived phenotype profiles

Luke T Slater, Sophie Russell, Silver Makepeace, Alexander Carberry, Andreas Karwath, John A Williams, Hilary Fanning, Simon Ball, Robert Hoehndorf, Georgios V Gkoutos

Research output: Working paper/Preprint


Semantic similarity is a valuable tool for analysis in biomedicine. When applied to phenotype profiles derived from clinical text, semantic similarity measures have the capacity to enable and enhance ‘patient-like me’ analyses, automated coding, differential diagnosis, and outcome prediction, by leveraging the wealth of background knowledge provided by biomedical ontologies. While a large body of work explores the use of semantic similarity for multiple tasks, including protein interaction prediction and rare disease differential diagnosis, less work explores the comparison of patient phenotype profiles for clinical tasks. Moreover, there are no experimental explorations of optimal parameters or methods in the area. In this work, we develop a reproducible platform for benchmarking experimental conditions for patient phenotype similarity. Using the platform, we evaluate the task of ranking shared primary diagnosis from uncurated phenotype profiles derived from text narrative associated with admissions in MIMIC-III. In doing so, we identify and interpret the performance of a large number of semantic similarity measures for this task, and provide a basis for further research on related tasks in the area.
Original language: English
Number of pages: 12
Publication status: Published - 9 Aug 2021


  • health informatics


