A failure to learn object shape geometry: implications for convolutional neural networks as plausible models of biological vision

Dietmar Heinke; Peter Wachman; Wieske van Zoest; E. Charles Leek

doi:10.1016/j.visres.2021.09.004

Dietmar Heinke*, Peter Wachman, Wieske van Zoest, E. Charles Leek

*Corresponding author for this work

Psychology

Research output: Contribution to journal › Article › peer-review

Abstract

Here we examine the plausibility of deep convolutional neural networks (CNNs) as a theoretical framework for understanding biological vision in the context of image classification. Recent work on object recognition in human vision has shown that both global, and local, shape information is computed, and integrated, early during perceptual processing. Our goal was to compare the similarity in how object shape information is processed by CNNs and human observers. We tested the hypothesis that, unlike the human system, CNNs do not compute representations of global and local object geometry during image classification. To do so, we trained and tested six CNNs (AlexNet, VGG-11, VGG-16, ResNet-18, ResNet-50, GoogLeNet), and human observers, to discriminate geometrically possible and impossible objects. The ability to complete this task requires computation of a representational structure of shape that encodes both global and local object geometry because the detection of impossibility derives from an incongruity between well-formed local feature conjunctions and their integration into a geometrically well-formed 3D global shape. Unlike human observers, none of the tested CNNs could reliably discriminate between possible and impossible objects. Detailed analyses using gradient-weighted class activation mapping (GradCam) of CNN image feature processing showed that network classification performance was not constrained by object geometry. In contrast, if classification could be made based solely on local feature information in line drawings the CNNs were highly accurate. We argue that these findings reflect fundamental differences between CNNs and human vision in terms of underlying image processing structure. Notably, unlike human vision, CNNs do not compute representations of object geometry. The results challenge the plausibility of CNNs as a framework for understanding image classification in biological vision systems.

Original language	English
Pages (from-to)	81-92
Number of pages	12
Journal	Vision Research
Volume	189
Early online date	8 Oct 2021
DOIs	https://doi.org/10.1016/j.visres.2021.09.004
Publication status	Published - Dec 2021

Bibliographical note

Funding Information:
D.H.’s work was supported by a grant from the UK-ESRC ES/T002409/1.

Publisher Copyright:
© 2021 The Authors

Keywords

Convolutional neural networks
Shape processing
Visual processing

ASJC Scopus subject areas

Ophthalmology
Sensory Systems

Access to Document

10.1016/j.visres.2021.09.004
Licence: Creative Commons: Attribution-NonCommercial-NoDerivs (CC BY-NC-ND)

HeinkeD2021failure
Final published version, 10.3 MB
Licence: Creative Commons: Attribution-NonCommercial-NoDerivs (CC BY-NC-ND)

Cite this

@article{26e40e352c304e2ca53eb0f90a7c3c97,

title = "A failure to learn object shape geometry: implications for convolutional neural networks as plausible models of biological vision",

abstract = "Here we examine the plausibility of deep convolutional neural networks (CNNs) as a theoretical framework for understanding biological vision in the context of image classification. Recent work on object recognition in human vision has shown that both global, and local, shape information is computed, and integrated, early during perceptual processing. Our goal was to compare the similarity in how object shape information is processed by CNNs and human observers. We tested the hypothesis that, unlike the human system, CNNs do not compute representations of global and local object geometry during image classification. To do so, we trained and tested six CNNs (AlexNet, VGG-11, VGG-16, ResNet-18, ResNet-50, GoogLeNet), and human observers, to discriminate geometrically possible and impossible objects. The ability to complete this task requires computation of a representational structure of shape that encodes both global and local object geometry because the detection of impossibility derives from an incongruity between well-formed local feature conjunctions and their integration into a geometrically well-formed 3D global shape. Unlike human observers, none of the tested CNNs could reliably discriminate between possible and impossible objects. Detailed analyses using gradient-weighted class activation mapping (GradCam) of CNN image feature processing showed that network classification performance was not constrained by object geometry. In contrast, if classification could be made based solely on local feature information in line drawings the CNNs were highly accurate. We argue that these findings reflect fundamental differences between CNNs and human vision in terms of underlying image processing structure. Notably, unlike human vision, CNNs do not compute representations of object geometry. The results challenge the plausibility of CNNs as a framework for understanding image classification in biological vision systems.",

keywords = "Convolutional neural networks, Shape processing, Visual processing",

author = "Dietmar Heinke and Peter Wachman and {van Zoest}, Wieske and Leek, {E. Charles}",

note = "Funding Information: D.H.{\textquoteright}s work was supported by a grant from the UK-ESRC ES/T002409/1. Publisher Copyright: {\textcopyright} 2021 The Authors",

year = "2021",

month = dec,

doi = "10.1016/j.visres.2021.09.004",

language = "English",

volume = "189",

pages = "81--92",

journal = "Vision Research",

issn = "0042-6989",

publisher = "Elsevier",

}

TY - JOUR

T1 - A failure to learn object shape geometry

T2 - implications for convolutional neural networks as plausible models of biological vision

AU - Heinke, Dietmar

AU - Wachman, Peter

AU - van Zoest, Wieske

AU - Leek, E. Charles

PY - 2021/12

Y1 - 2021/12

AB - Here we examine the plausibility of deep convolutional neural networks (CNNs) as a theoretical framework for understanding biological vision in the context of image classification. Recent work on object recognition in human vision has shown that both global, and local, shape information is computed, and integrated, early during perceptual processing. Our goal was to compare the similarity in how object shape information is processed by CNNs and human observers. We tested the hypothesis that, unlike the human system, CNNs do not compute representations of global and local object geometry during image classification. To do so, we trained and tested six CNNs (AlexNet, VGG-11, VGG-16, ResNet-18, ResNet-50, GoogLeNet), and human observers, to discriminate geometrically possible and impossible objects. The ability to complete this task requires computation of a representational structure of shape that encodes both global and local object geometry because the detection of impossibility derives from an incongruity between well-formed local feature conjunctions and their integration into a geometrically well-formed 3D global shape. Unlike human observers, none of the tested CNNs could reliably discriminate between possible and impossible objects. Detailed analyses using gradient-weighted class activation mapping (GradCam) of CNN image feature processing showed that network classification performance was not constrained by object geometry. In contrast, if classification could be made based solely on local feature information in line drawings the CNNs were highly accurate. We argue that these findings reflect fundamental differences between CNNs and human vision in terms of underlying image processing structure. Notably, unlike human vision, CNNs do not compute representations of object geometry. The results challenge the plausibility of CNNs as a framework for understanding image classification in biological vision systems.

KW - Convolutional neural networks

KW - Shape processing

KW - Visual processing

UR - http://www.scopus.com/inward/record.url?scp=85116613044&partnerID=8YFLogxK

U2 - 10.1016/j.visres.2021.09.004

DO - 10.1016/j.visres.2021.09.004

M3 - Article

C2 - 34634753

AN - SCOPUS:85116613044

SN - 0042-6989

VL - 189

SP - 81

EP - 92

JO - Vision Research

JF - Vision Research

ER -
