Here we examine the plausibility of deep convolutional neural networks (CNNs) as a theoretical framework for understanding biological vision in the context of image classification. Recent work on object recognition in human vision has shown that both global, and local, shape information is computed, and integrated, early during perceptual processing. Our goal was to compare the similarity in how object shape information is processed by CNNs and human observers. We tested the hypothesis that, unlike the human system, CNNs do not compute representations of global and local object geometry during image classification. To do so, we trained and tested six CNNs (AlexNet, VGG-11, VGG-16, ResNet-18, ResNet-50, GoogLeNet), and human observers, to discriminate geometrically possible and impossible objects. The ability to complete this task requires computation of a representational structure of shape that encodes both global and local object geometry because the detection of impossibility derives from an incongruity between well-formed local feature conjunctions and their integration into a geometrically well-formed 3D global shape. Unlike human observers, none of the tested CNNs could reliably discriminate between possible and impossible objects. Detailed analyses using gradient-weighted class activation mapping (GradCam) of CNN image feature processing showed that network classification performance was not constrained by object geometry. In contrast, if classification could be made based solely on local feature information in line drawings the CNNs were highly accurate. We argue that these findings reflect fundamental differences between CNNs and human vision in terms of underlying image processing structure. Notably, unlike human vision, CNNs do not compute representations of object geometry. The results challenge the plausibility of CNNs as a framework for understanding image classification in biological vision systems.
Bibliographical noteFunding Information:
D.H.’s work was supported by a grant from the UK-ESRC ES/T002409/1.
© 2021 The Authors
- Convolutional neural networks
- Shape processing
- Visual processing
ASJC Scopus subject areas
- Sensory Systems