Improved characterisation of clinical text through ontology-based vocabulary expansion

Luke T Slater; William Bradlow; Simon Ball; Robert Hoehndorf; Georgios V Gkoutos

doi:10.1186/s13326-021-00241-5

Research output: Contribution to journal › Article › peer-review

Abstract

BACKGROUND: Biomedical ontologies contain a wealth of metadata that constitutes a fundamental infrastructural resource for text mining. For several reasons, redundancies exist in the ontology ecosystem, which lead to the same entities being described by several concepts in the same or similar contexts across several ontologies. While these concepts describe the same entities, they contain different sets of complementary metadata. Linking these definitions to make use of their combined metadata could lead to improved performance in ontology-based information retrieval, extraction, and analysis tasks.

RESULTS: We develop and present an algorithm that expands the set of labels associated with an ontology class using a combination of strict lexical matching and cross-ontology reasoner-enabled equivalency queries. Across all disease terms in the Disease Ontology, the approach found 51,362 additional labels, more than tripling the number defined by the ontology itself. Manual validation by a clinical expert on a random sample of expanded synonyms over the Human Phenotype Ontology yielded a precision of 0.912. Furthermore, we found that annotating patient visits in MIMIC-III with an extended set of Disease Ontology labels led to the semantic similarity score derived from those labels being a significantly better predictor of a matching first diagnosis, with a mean average precision of 0.88 for the unexpanded set of annotations and 0.913 for the expanded set.
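The label-expansion step described in the RESULTS paragraph can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class identifiers, the `labels` and `equivalents` dictionaries, and the helper names are hypothetical, and the reasoner-enabled equivalency queries are replaced by a precomputed mapping from each class to the classes a reasoner would report as equivalent. "Strict lexical matching" is assumed here to mean exact comparison after lower-casing and whitespace normalisation.

```python
from typing import Dict, List, Set


def normalise(label: str) -> str:
    """Strict lexical key (assumed): lower-case and collapse internal whitespace."""
    return " ".join(label.lower().split())


def expand_labels(
    target: str,
    labels: Dict[str, Set[str]],
    equivalents: Dict[str, Set[str]],
) -> List[str]:
    """Return labels from matched classes that the target class does not already have.

    labels:      hypothetical map of class identifier -> labels/synonyms, pooled
                 across several ontologies
    equivalents: hypothetical precomputed map of class identifier -> classes a
                 reasoner would report as equivalent (stand-in for live queries)
    """
    own_keys = {normalise(label) for label in labels.get(target, set())}

    # Step 1: strict lexical matching against classes from any ontology.
    lexical = {
        cls
        for cls, cls_labels in labels.items()
        if cls != target and own_keys & {normalise(label) for label in cls_labels}
    }

    # Step 2: equivalency lookups from the target and from each lexical match.
    matched = set(lexical)
    for cls in {target} | lexical:
        matched |= equivalents.get(cls, set())
    matched.discard(target)

    # Step 3: pool labels from matched classes, keeping only genuinely new ones.
    new: Dict[str, str] = {}
    for cls in matched:
        for label in labels.get(cls, set()):
            key = normalise(label)
            if key not in own_keys and key not in new:
                new[key] = label
    return sorted(new.values())


# Hypothetical toy data: "A:1" and "B:7" share a label; "B:7" is equivalent to "C:3".
labels = {
    "A:1": {"type 2 diabetes mellitus"},
    "B:7": {"Type 2 Diabetes Mellitus", "NIDDM"},
    "C:3": {"adult-onset diabetes"},
    "D:9": {"asthma"},
}
equivalents = {"B:7": {"C:3"}}
print(expand_labels("A:1", labels, equivalents))
```

The sketch performs a single round of matching and does not follow matches transitively; this is a choice made here so that each new label can be traced to one matched class, which also makes the output easier to hand to a domain expert as a list of candidate synonyms.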

CONCLUSIONS: Inter-ontology synonym expansion can lead to a vast increase in the scale of vocabulary available for text mining applications. While the accuracy of the extended vocabulary is not perfect, it nevertheless led to a significantly improved ontology-based characterisation of patients from text in one setting. Furthermore, where run-on error is not acceptable, the technique can be used to provide candidate synonyms which can be checked by a domain expert.
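The MIMIC-III evaluation summarised above ranks patient visits by the semantic similarity of their annotations and scores the ranking by mean average precision. The Python sketch below shows how such a figure can be computed; it illustrates the metric under stated assumptions rather than reproducing the paper's protocol. Jaccard overlap of annotation sets is swapped in for the ontology-based semantic similarity measure, a retrieved visit is assumed to be relevant when it shares the query visit's first diagnosis, queries with no relevant visits are skipped, and the visit identifiers and data are hypothetical.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Set overlap; a simple stand-in for an ontology-based semantic similarity."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def average_precision(ranked_relevance: List[bool]) -> float:
    """Mean of precision@k taken at each rank k where a relevant item appears."""
    hits, total = 0, 0.0
    for k, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            total += hits / k
    return total / hits if hits else 0.0


def mean_average_precision(
    annotations: Dict[str, Set[str]], first_diagnosis: Dict[str, str]
) -> float:
    """Use each visit as a query, rank all other visits by similarity, average the APs."""
    scores = []
    for query in annotations:
        others = [visit for visit in annotations if visit != query]
        # Stable sort: visits with equal similarity keep their input order.
        others.sort(key=lambda v: jaccard(annotations[query], annotations[v]), reverse=True)
        relevance = [first_diagnosis[v] == first_diagnosis[query] for v in others]
        if any(relevance):  # assumption: skip queries with nothing relevant to find
            scores.append(average_precision(relevance))
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical visits annotated with ontology class identifiers.
annotations = {"v1": {"a", "b"}, "v2": {"a", "b", "c"}, "v3": {"d"}, "v4": {"d", "e"}}
first_diagnosis = {"v1": "X", "v2": "X", "v3": "Y", "v4": "Y"}
print(mean_average_precision(annotations, first_diagnosis))
```

Under this framing, richer annotations (for example, from an expanded label set) help only insofar as they move visits with the same first diagnosis higher in each ranking.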

Original language	English
Article number	7
Number of pages	9
Journal	Journal of Biomedical Semantics
Volume	12
Issue number	1
DOIs	https://doi.org/10.1186/s13326-021-00241-5
Publication status	Published - 12 Apr 2021

Access to Document

10.1186/s13326-021-00241-5 (Licence: Creative Commons Attribution (CC BY))

SlaterLT2021Improved: final published version, 580 KB (Licence: Creative Commons Attribution (CC BY))

Cite this

@article{04eb841564454a0a80e0a8043c0894c3,

title = "Improved characterisation of clinical text through ontology-based vocabulary expansion",

abstract = "BACKGROUND: Biomedical ontologies contain a wealth of metadata that constitutes a fundamental infrastructural resource for text mining. For several reasons, redundancies exist in the ontology ecosystem, which lead to the same entities being described by several concepts in the same or similar contexts across several ontologies. While these concepts describe the same entities, they contain different sets of complementary metadata. Linking these definitions to make use of their combined metadata could lead to improved performance in ontology-based information retrieval, extraction, and analysis tasks. RESULTS: We develop and present an algorithm that expands the set of labels associated with an ontology class using a combination of strict lexical matching and cross-ontology reasoner-enabled equivalency queries. Across all disease terms in the Disease Ontology, the approach found 51,362 additional labels, more than tripling the number defined by the ontology itself. Manual validation by a clinical expert on a random sampling of expanded synonyms over the Human Phenotype Ontology yielded a precision of 0.912. Furthermore, we found that annotating patient visits in MIMIC-III with an extended set of Disease Ontology labels led to semantic similarity score derived from those labels being a significantly better predictor of matching first diagnosis, with a mean average precision of 0.88 for the unexpanded set of annotations, and 0.913 for the expanded set. CONCLUSIONS: Inter-ontology synonym expansion can lead to a vast increase in the scale of vocabulary available for text mining applications. While the accuracy of the extended vocabulary is not perfect, it nevertheless led to a significantly improved ontology-based characterisation of patients from text in one setting. Furthermore, where run-on error is not acceptable, the technique can be used to provide candidate synonyms which can be checked by a domain expert.",

author = "Slater, {Luke T} and William Bradlow and Simon Ball and Robert Hoehndorf and Gkoutos, {Georgios V}",

year = "2021",

month = apr,

day = "12",

doi = "10.1186/s13326-021-00241-5",

language = "English",

volume = "12",

journal = "Journal of Biomedical Semantics",

issn = "2041-1480",

publisher = "BioMed Central Ltd",

number = "1",

}

TY  - JOUR
T1  - Improved characterisation of clinical text through ontology-based vocabulary expansion
AU  - Slater, Luke T
AU  - Bradlow, William
AU  - Ball, Simon
AU  - Hoehndorf, Robert
AU  - Gkoutos, Georgios V
PY  - 2021/4/12
Y1  - 2021/4/12
N2  - BACKGROUND: Biomedical ontologies contain a wealth of metadata that constitutes a fundamental infrastructural resource for text mining. For several reasons, redundancies exist in the ontology ecosystem, which lead to the same entities being described by several concepts in the same or similar contexts across several ontologies. While these concepts describe the same entities, they contain different sets of complementary metadata. Linking these definitions to make use of their combined metadata could lead to improved performance in ontology-based information retrieval, extraction, and analysis tasks. RESULTS: We develop and present an algorithm that expands the set of labels associated with an ontology class using a combination of strict lexical matching and cross-ontology reasoner-enabled equivalency queries. Across all disease terms in the Disease Ontology, the approach found 51,362 additional labels, more than tripling the number defined by the ontology itself. Manual validation by a clinical expert on a random sampling of expanded synonyms over the Human Phenotype Ontology yielded a precision of 0.912. Furthermore, we found that annotating patient visits in MIMIC-III with an extended set of Disease Ontology labels led to semantic similarity score derived from those labels being a significantly better predictor of matching first diagnosis, with a mean average precision of 0.88 for the unexpanded set of annotations, and 0.913 for the expanded set. CONCLUSIONS: Inter-ontology synonym expansion can lead to a vast increase in the scale of vocabulary available for text mining applications. While the accuracy of the extended vocabulary is not perfect, it nevertheless led to a significantly improved ontology-based characterisation of patients from text in one setting. Furthermore, where run-on error is not acceptable, the technique can be used to provide candidate synonyms which can be checked by a domain expert.
AB  - BACKGROUND: Biomedical ontologies contain a wealth of metadata that constitutes a fundamental infrastructural resource for text mining. For several reasons, redundancies exist in the ontology ecosystem, which lead to the same entities being described by several concepts in the same or similar contexts across several ontologies. While these concepts describe the same entities, they contain different sets of complementary metadata. Linking these definitions to make use of their combined metadata could lead to improved performance in ontology-based information retrieval, extraction, and analysis tasks. RESULTS: We develop and present an algorithm that expands the set of labels associated with an ontology class using a combination of strict lexical matching and cross-ontology reasoner-enabled equivalency queries. Across all disease terms in the Disease Ontology, the approach found 51,362 additional labels, more than tripling the number defined by the ontology itself. Manual validation by a clinical expert on a random sampling of expanded synonyms over the Human Phenotype Ontology yielded a precision of 0.912. Furthermore, we found that annotating patient visits in MIMIC-III with an extended set of Disease Ontology labels led to semantic similarity score derived from those labels being a significantly better predictor of matching first diagnosis, with a mean average precision of 0.88 for the unexpanded set of annotations, and 0.913 for the expanded set. CONCLUSIONS: Inter-ontology synonym expansion can lead to a vast increase in the scale of vocabulary available for text mining applications. While the accuracy of the extended vocabulary is not perfect, it nevertheless led to a significantly improved ontology-based characterisation of patients from text in one setting. Furthermore, where run-on error is not acceptable, the technique can be used to provide candidate synonyms which can be checked by a domain expert.
UR  - http://www.scopus.com/inward/record.url?scp=85104367139&partnerID=8YFLogxK
U2  - 10.1186/s13326-021-00241-5
DO  - 10.1186/s13326-021-00241-5
M3  - Article
C2  - 33845909
SN  - 2041-1480
VL  - 12
JO  - Journal of Biomedical Semantics
JF  - Journal of Biomedical Semantics
IS  - 1
M1  - 7
ER  - 
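The RIS record above is line-oriented: each line carries a two-character tag (such as `AU` or `T1`), a `-` separator and a value, tags such as `AU` may repeat, and `ER` closes the record. The specification spaces the separator as two spaces, a hyphen and a space; the minimal Python sketch below, whose function and variable names are hypothetical, also tolerates collapsed whitespace, as seen in text copied from web pages. It reads a single record into a tag-to-values mapping and is not a full RIS implementation (no multi-record files or continuation lines).

```python
import re
from typing import Dict, List

# Two-character tag, whitespace, hyphen, optional space, value.
# Accepts both "TY  - JOUR" (specification spacing) and "TY - JOUR" (collapsed).
RIS_LINE = re.compile(r"^([A-Z][A-Z0-9])\s+-\s?(.*)$")


def parse_ris(text: str) -> Dict[str, List[str]]:
    """Parse one RIS record into tag -> list of values, stopping at ER."""
    record: Dict[str, List[str]] = {}
    for line in text.splitlines():
        match = RIS_LINE.match(line.strip())
        if not match:
            continue  # blank lines and non-RIS text are skipped
        tag, value = match.group(1), match.group(2).strip()
        if tag == "ER":
            break  # end of record
        record.setdefault(tag, []).append(value)
    return record


sample = "TY  - JOUR\nAU  - Slater, Luke T\nAU  - Bradlow, William\nVL  - 12\nER  - \n"
print(parse_ris(sample)["AU"])
```

Values are kept as lists because authors and keywords repeat; a caller wanting a single value for a tag such as `DO` can take the first element.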