Biases with the Generalized Euclidean Distance measure in disparity analyses with high levels of missing data
Research output: Contribution to journal › Article › peer-review
Colleges, School and Institutes
- University of Leeds
The Generalized Euclidean Distance (GED) measure has been extensively used to conduct morphological disparity analyses based on palaeontological matrices of discrete characters. This is in part because some implementations allow the use of morphological matrices with high percentages of missing data without needing to prune taxa before a subsequent ordination of the data set. Previous studies have suggested that this way of using the GED may generate a bias in the resulting morphospace, but a detailed study of this possible effect has been lacking. Here, we test whether the percentage of missing data for a taxon artificially influences its position in the morphospace, and whether missing data affect pre- and post-ordination disparity measures. We find that this use of the GED creates a systematic bias, whereby taxa with higher percentages of missing data are placed closer to the centre of the morphospace than those with more complete scorings. This bias extends into pre- and post-ordination calculations of disparity measures and can lead to erroneous interpretations of disparity patterns, especially if specimens present in a particular time interval or clade have distinct proportions of missing information. We suggest that this implementation of the GED should be used with caution, especially in cases with high percentages of missing data. Results recovered using an alternative distance measure, Maximum Observed Rescaled Distance (MORD), are more robust to missing data. Consequently, we suggest that MORD is a more appropriate distance measure than GED when analysing data sets with large amounts of missing data.
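As a minimal sketch of how missing data can enter the two measures, the Python below compares a simplified GED with MORD on a hypothetical three-taxon matrix of binary characters. It assumes that the GED implementation fills each non-comparable character difference with the mean difference over all comparable scorings in the matrix; actual implementations differ in how this fill value is obtained. All function names and data are illustrative and are not taken from the study.

```python
from itertools import combinations
from math import sqrt

# Hypothetical binary character matrix; None marks a missing scoring.
matrix = {
    "A": [0, 0, 0, 0],
    "B": [1, 1, 1, 1],
    "C": [1, 1, None, None],  # matches B wherever it is scored
}

def pair_diffs(a, b):
    """Absolute differences for characters scored in both taxa."""
    return [abs(x - y) for x, y in zip(a, b) if x is not None and y is not None]

def global_mean_diff(rows):
    """Mean difference over all comparable scorings in the matrix (assumed fill value)."""
    diffs = []
    for a, b in combinations(rows, 2):
        diffs.extend(pair_diffs(a, b))
    return sum(diffs) / len(diffs)

def ged(a, b, fill):
    """Simplified GED: non-comparable differences are replaced by `fill`."""
    total = 0.0
    for x, y in zip(a, b):
        d = fill if x is None or y is None else abs(x - y)
        total += d * d
    return sqrt(total)

def mord(a, b):
    """MORD for binary characters: observed differences divided by the
    maximum possible difference (1 per comparable character)."""
    diffs = pair_diffs(a, b)
    return sum(diffs) / len(diffs) if diffs else None

fill = global_mean_diff(list(matrix.values()))
for s, t in combinations(matrix, 2):
    print(s, t, "GED:", round(ged(matrix[s], matrix[t], fill), 3),
          "MORD:", mord(matrix[s], matrix[t]))
```

In this toy matrix, taxon C matches B on every character scored for both, so MORD returns 0 for that pair, while the filled GED gives a non-zero distance because each of the two missing characters contributes the fill value.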
| Field | Value |
|---|---|
| Number of pages | 13 |
| Early online date | 17 May 2019 |
| Publication status | Published - 29 Aug 2019 |