Abstract
A central challenge facing natural human-computer interaction is understanding how neural circuits process visual perceptual information so that users can operate effectively in complex tasks. Visual coding models exploit the biological characteristics of retinal ganglion cells to provide quantitative predictions of responses to a range of visual stimuli. However, existing visual coding models lack adaptability in natural, complex scenes. This paper therefore proposes an enhanced visual coding model based on collaborative perception. The model first extracts multi-modal spatiotemporal features of the input video to adaptively simulate the response characteristics of the retina. It then uses basis functions to compile the input stimulus into a multi-modal stimulus matrix. Finally, upstream and downstream filters transform the stimulus matrix to generate the spike sequence. Experiments show that the proposed model reproduces the physiological characteristics of ganglion cells in the biological retina, achieving higher accuracy, better adaptability, and stronger biological interpretability than competing models.
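As a rough illustration of the pipeline the abstract outlines (compiling the stimulus into a matrix via basis functions, filtering it, and generating spikes), the sketch below implements a generic linear-nonlinear-Poisson encoder with a raised-cosine temporal basis, a standard formulation in ganglion-cell modeling. This is not the paper's actual model: the functions `raised_cosine_basis`, `compile_stimulus`, and `simulate_spikes`, the softplus nonlinearity, and all parameter values are hypothetical stand-ins chosen only to make the stages of the pipeline concrete.

```python
import numpy as np

rng = np.random.default_rng(0)

def raised_cosine_basis(n_basis, n_lags):
    """Raised-cosine temporal basis, a common choice for encoding
    stimulus history (hypothetical stand-in for the paper's basis
    functions). Returns an array of shape (n_basis, n_lags)."""
    t = np.arange(n_lags)
    centers = np.linspace(0, n_lags - 1, n_basis)
    width = (centers[1] - centers[0]) * 2
    phi = np.pi * (t[None, :] - centers[:, None]) / width
    return 0.5 * (1 + np.cos(np.clip(phi, -np.pi, np.pi)))

def compile_stimulus(stimulus, basis):
    """Convolve the stimulus with each basis function, producing a
    compact 'stimulus matrix' of shape (T, n_basis)."""
    n_basis, _ = basis.shape
    T = len(stimulus)
    X = np.zeros((T, n_basis))
    for k in range(n_basis):
        X[:, k] = np.convolve(stimulus, basis[k], mode="full")[:T]
    return X

def simulate_spikes(X, weights, dt=0.001):
    """Linear filter -> softplus nonlinearity -> Poisson spiking."""
    drive = X @ weights
    rate = np.log1p(np.exp(drive))   # nonnegative firing rate (Hz)
    return rng.poisson(rate * dt)    # spike counts per time bin

# Toy usage: white-noise stimulus, random filter weights.
stim = rng.standard_normal(2000)
basis = raised_cosine_basis(n_basis=6, n_lags=50)
X = compile_stimulus(stim, basis)
w = rng.standard_normal(6) * 0.5
spikes = simulate_spikes(X, w)
print(f"{spikes.sum()} spikes over {len(stim)} bins")
```

In a model of the kind the abstract describes, the single random filter `w` would be replaced by learned upstream and downstream filters acting on a multi-modal stimulus matrix; this sketch only shows the shared structure of basis-function compilation followed by filtering and stochastic spike generation.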
| Original language | English |
|---|---|
| Journal | IEEE Transactions on Cognitive and Developmental Systems |
| Early online date | 1 Sept 2022 |
| DOIs | |
| Publication status | E-pub ahead of print - 1 Sept 2022 |
Bibliographical note
Funding: This work was supported by the National Natural Science Foundation of China (Grant Nos. 62072355 and 62072354), the Key Research and Development Program of Shaanxi Province of China (Grant No. 2022KWZ-10), and the Natural Science Foundation of Guangdong Province of China (Grant No. 2022A1515011424).
Keywords
- Visual Coding
- Multi-modal Stimulus
- Feature Compilation
- Nonlinearity