Biologically Plausible Variational Policy Gradient with Spiking Recurrent Winner-Take-All Networks

Zhile Yang, Shangqi Guo*, Ying Fang, Jian K. Liu*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

One stream of reinforcement learning research explores biologically plausible models and algorithms, both to simulate biological intelligence and to suit neuromorphic hardware. Among them, reward-modulated spike-timing-dependent plasticity (R-STDP) is a recent branch with good potential for energy efficiency. However, current R-STDP methods rely on heuristically designed local learning rules and thus require task-specific expert knowledge. In this paper, we consider a spiking recurrent winner-take-all network and propose a new R-STDP method, spiking variational policy gradient (SVPG), whose local learning rules are derived from the global policy gradient, eliminating the need for heuristic designs. In experiments on MNIST classification and Gym InvertedPendulum, SVPG achieves good training performance and is more robust to various kinds of noise than conventional methods.
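To make the "local rules from a global policy gradient" idea concrete, here is a minimal, hypothetical sketch of a generic three-factor REINFORCE-style update for a single layer of stochastic Bernoulli spiking units: each synapse keeps a purely local eligibility term (the gradient of the log firing probability), and a global scalar reward modulates the weight change. All names and the toy task below are illustrative assumptions; this is not the paper's SVPG, which uses a recurrent winner-take-all spiking network.

```python
import numpy as np

rng = np.random.default_rng(0)

n_in, n_out = 4, 2
W = rng.normal(0.0, 0.1, size=(n_out, n_in))  # synaptic weights
lr = 0.1

def step(x, W, rng):
    """One stochastic forward pass: spikes plus a local eligibility term."""
    p = 1.0 / (1.0 + np.exp(-W @ x))            # firing probability per neuron
    s = (rng.random(n_out) < p).astype(float)   # stochastic binary spikes
    # d/dW log P(s | x) for Bernoulli units is (s - p) outer x:
    # computable from pre- and post-synaptic activity alone (local).
    elig = np.outer(s - p, x)
    return s, elig

# Toy task (assumed for illustration): reward +1 when neuron 0 spikes
# and neuron 1 stays silent, -1 otherwise.
for _ in range(5000):
    x = rng.random(n_in)
    s, elig = step(x, W, rng)
    r = 1.0 if (s[0] == 1 and s[1] == 0) else -1.0
    W += lr * r * elig   # three-factor update: local trace x global reward
```

The point of the sketch is that no per-synapse error signal is backpropagated: the only global quantity is the scalar reward, which is the general structure R-STDP methods share.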
Original language: English
Title of host publication: 33rd British Machine Vision Conference 2022, BMVC 2022, London, UK, November 21-24, 2022
Publisher: British Machine Vision Association
Number of pages: 13
Publication status: Published - 25 Nov 2022
Event: The 33rd British Machine Vision Conference - The Kia Oval, London, United Kingdom
Duration: 21 Nov 2022 - 24 Nov 2022
Internet address: https://bmvc2022.org/

Conference

Conference: The 33rd British Machine Vision Conference
Abbreviated title: BMVC
Country/Territory: United Kingdom
City: London
Period: 21/11/22 - 24/11/22
