Synthetic image generation can efficiently produce large quantities of labeled data, addressing both the data volume requirements of state-of-the-art vision systems and the expense of manual labeling. However, systems trained on synthetic data typically underperform systems trained on realistic data because of the mismatch between the synthetic and realistic data distributions. Domain Randomization (DR) broadens a synthetic data distribution so that it encompasses the realistic data distribution, and it provides better performance when the exact characteristics of the realistic data distribution are unknown or cannot be simulated. However, there is no consensus in the literature on the best method of performing DR. We propose a novel method of ranking DR methods by directly measuring the difference between the realistic and DR data distributions. This avoids the need to measure task-specific performance and the associated expense of training and evaluation. We compare different methods for measuring distribution differences, including the Wasserstein and Fréchet Inception distances. We also examine the effect of performing this evaluation on images directly and on features generated by an image classification backbone. Finally, we show that the ranking generated by our method is reflected in actual task performance.
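As an illustration of the ranking idea described in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes that features have already been extracted by some backbone and uses randomly generated Gaussian vectors as stand-ins. The function names `frechet_distance` and `rank_methods`, the candidate labels `dr_close` and `dr_far`, and the feature dimension are all hypothetical. The distance is the standard Fréchet distance between Gaussians fitted to the two feature sets, which is the quantity underlying the Fréchet Inception distance.

```python
import numpy as np


def frechet_distance(feats_a, feats_b):
    """Fréchet distance between Gaussians fitted to two feature sets.

    Each input is a 2-D array with one row per sample and one column
    per feature dimension.
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Tr((cov_a cov_b)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_a @ cov_b; clip tiny negatives from round-off.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)


def rank_methods(real_feats, candidates):
    """Sort candidate DR feature sets by distance to the real features."""
    scores = {name: frechet_distance(real_feats, f) for name, f in candidates.items()}
    return sorted(scores.items(), key=lambda kv: kv[1])


# Assumption: synthetic Gaussian vectors stand in for backbone features.
rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(500, 8))
candidates = {
    "dr_close": rng.normal(0.1, 1.0, size=(500, 8)),  # hypothetical DR method A
    "dr_far": rng.normal(2.0, 1.0, size=(500, 8)),    # hypothetical DR method B
}
ranking = rank_methods(real, candidates)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

A smaller distance indicates a DR distribution closer to the real one, so the first entry of `ranking` is the candidate this measure prefers. No task model is trained at any point, which is the cost saving the abstract refers to.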
|Title of host publication||25th International Conference on Pattern Recognition (ICPR 2020)|
|Publication status||Accepted/In press - 10 Dec 2020|
|Event||25th International Conference on Pattern Recognition - Virtual, Milan, Italy; Duration: 10 Jan 2021 → 15 Jan 2021|
|Conference||25th International Conference on Pattern Recognition|
|Abbreviated title||ICPR 2020|
|Period||10/01/21 → 15/01/21|