TY - JOUR
T1 - A framework for evaluating the performance of SMLM data clustering algorithms
AU - Nieves, Daniel
AU - Pike, Jeremy
AU - Levet, Florian
AU - Williamson, David J.
AU - Baragilly, Mohammed
AU - Oloketuyi, Sandra
AU - de Marco, Ario
AU - Griffié, Juliette
AU - Sage, Daniel
AU - Cohen, Edward A.K.
AU - Sibarita, Jean-Baptiste
AU - Heilemann, Mike
AU - Owen, Dylan
N1 - D.M.O. acknowledges funding from BBSRC grant BB/R007365/1. M.H. acknowledges funding by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation, Project-ID 259130777, SFB 1177; GRK 2566). D.M.O. and M.B. acknowledge funding from the Alan Turing Institute.
PY - 2023/2/10
Y1 - 2023/2/10
N2 - Single-molecule localization microscopy (SMLM) generates data in the form of coordinates of localized fluorophores. Cluster analysis is an attractive route for extracting biologically meaningful information from such data and has been widely applied. Despite a range of cluster analysis algorithms, there exists no consensus framework for the evaluation of their performance. Here, we use a systematic approach based on two metrics to score the success of clustering algorithms in simulated conditions mimicking experimental data. We demonstrate the framework using seven diverse analysis algorithms: DBSCAN, ToMATo, KDE, FOCAL, CAML, ClusterViSu and SR-Tesseler. Given that the best performer depended on the underlying distribution of localizations, we demonstrate an analysis pipeline based on statistical similarity measures that enables the selection of the most appropriate algorithm, and the optimized analysis parameters for real SMLM data. We propose that these standard simulated conditions, metrics and analysis pipeline become the basis for future analysis algorithm development and evaluation.
AB - Single-molecule localization microscopy (SMLM) generates data in the form of coordinates of localized fluorophores. Cluster analysis is an attractive route for extracting biologically meaningful information from such data and has been widely applied. Despite a range of cluster analysis algorithms, there exists no consensus framework for the evaluation of their performance. Here, we use a systematic approach based on two metrics to score the success of clustering algorithms in simulated conditions mimicking experimental data. We demonstrate the framework using seven diverse analysis algorithms: DBSCAN, ToMATo, KDE, FOCAL, CAML, ClusterViSu and SR-Tesseler. Given that the best performer depended on the underlying distribution of localizations, we demonstrate an analysis pipeline based on statistical similarity measures that enables the selection of the most appropriate algorithm, and the optimized analysis parameters for real SMLM data. We propose that these standard simulated conditions, metrics and analysis pipeline become the basis for future analysis algorithm development and evaluation.
U2 - 10.1038/s41592-022-01750-6
DO - 10.1038/s41592-022-01750-6
M3 - Article
SN - 1548-7091
VL - 20
SP - 259
EP - 267
JO - Nature Methods
JF - Nature Methods
ER -