A failure to learn object shape geometry: implications for convolutional neural networks as plausible models of biological vision

Dietmar Heinke*, Peter Wachman, Wieske van Zoest, E. Charles Leek

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review


Abstract

Here we examine the plausibility of deep convolutional neural networks (CNNs) as a theoretical framework for understanding biological vision in the context of image classification. Recent work on object recognition in human vision has shown that both global and local shape information are computed and integrated early in perceptual processing. Our goal was to compare how object shape information is processed by CNNs and by human observers. We tested the hypothesis that, unlike the human system, CNNs do not compute representations of global and local object geometry during image classification. To do so, we trained and tested six CNNs (AlexNet, VGG-11, VGG-16, ResNet-18, ResNet-50, GoogLeNet), and human observers, to discriminate geometrically possible and impossible objects. Completing this task requires a representational structure of shape that encodes both global and local object geometry, because impossibility arises from an incongruity between well-formed local feature conjunctions and their integration into a geometrically well-formed 3D global shape. Unlike human observers, none of the tested CNNs could reliably discriminate between possible and impossible objects. Detailed analyses of CNN image feature processing using gradient-weighted class activation mapping (GradCam) showed that network classification performance was not constrained by object geometry. In contrast, when classification could be based solely on local feature information in line drawings, the CNNs were highly accurate. We argue that these findings reflect fundamental differences between CNNs and human vision in terms of underlying image processing structure. Notably, unlike human vision, CNNs do not compute representations of object geometry. The results challenge the plausibility of CNNs as a framework for understanding image classification in biological vision systems.
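For readers unfamiliar with GradCam, the core computation can be sketched in a few lines. This is a minimal illustration of the standard gradient-weighted class activation mapping recipe, not the authors' analysis code: it assumes the feature maps of one convolutional layer and the gradients of a class score with respect to those maps have already been extracted (in practice via hooks in a deep-learning framework), and passes them in as plain nested lists. The function name `grad_cam` and its argument layout are hypothetical. Each channel is weighted by its spatially averaged gradient, the weighted maps are summed, and a ReLU keeps only locations that push the class score up.

```python
from typing import List

Map = List[List[float]]  # one H x W feature map (or gradient map)


def grad_cam(activations: List[Map], gradients: List[Map]) -> Map:
    """Hypothetical helper: GradCam heatmap for one class.

    activations[k] is feature map A^k of a chosen conv layer;
    gradients[k] is d(class score)/d(A^k), same shape.
    """
    if len(activations) != len(gradients):
        raise ValueError("need one gradient map per activation map")
    h, w = len(activations[0]), len(activations[0][0])
    cam = [[0.0] * w for _ in range(h)]
    for a_k, g_k in zip(activations, gradients):
        # alpha_k: global-average-pooled gradient = importance of channel k
        alpha_k = sum(sum(row) for row in g_k) / (h * w)
        for i in range(h):
            for j in range(w):
                cam[i][j] += alpha_k * a_k[i][j]
    # ReLU: keep only locations with a positive influence on the class score
    cam = [[max(v, 0.0) for v in row] for row in cam]
    # Normalise to [0, 1] for display; an all-zero map is returned unchanged
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]
    return cam


# Usage: channel 1 supports the class (positive gradients), channel 2 opposes it.
a1, g1 = [[1.0, 0.0], [0.0, 0.0]], [[1.0, 1.0], [1.0, 1.0]]
a2, g2 = [[0.0, 0.0], [0.0, 2.0]], [[-1.0, -1.0], [-1.0, -1.0]]
print(grad_cam([a1, a2], [g1, g2]))  # [[1.0, 0.0], [0.0, 0.0]]
```

The heatmap highlights only the top-left location, where the class-supporting channel is active; in an analysis like the one described above, such maps show which image regions drove a classification.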

Original language: English
Pages (from-to): 81-92
Number of pages: 12
Journal: Vision Research
Volume: 189
Early online date: 8 Oct 2021
Publication status: Published - Dec 2021

Bibliographical note

Funding Information:
D.H.’s work was supported by a grant from the UK-ESRC ES/T002409/1.

Publisher Copyright:
© 2021 The Authors

Keywords

  • Convolutional neural networks
  • Shape processing
  • Visual processing

ASJC Scopus subject areas

  • Ophthalmology
  • Sensory Systems

