Increasingly cautious optimism for practical PAC-MDP exploration

Liangpeng Zhang, Ke Tang*, Xin Yao

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Citations (Scopus)

Abstract

The exploration strategy is an essential part of a learning agent in model-based reinforcement learning. R-MAX and V-MAX are PAC-MDP strategies that are proved to have polynomial sample complexity; in practice, however, their exploration behavior tends to be overly cautious. We propose the principle of Increasingly Cautious Optimism (ICO) to automatically cut off unnecessarily cautious exploration, and we apply ICO to R-MAX and V-MAX, yielding two new strategies: Increasingly Cautious R-MAX (ICR) and Increasingly Cautious V-MAX (ICV). We prove that both ICR and ICV are PAC-MDP, and we show that their improvement is guaranteed by a tighter upper bound on sample complexity. Finally, we demonstrate their significantly improved performance through empirical results.
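As background for the abstract, the sketch below shows the standard R-MAX form of optimism that ICO modifies: a state-action pair visited fewer than `m` times is "unknown" and valued at the maximum possible return `r_max / (1 - gamma)`, while known pairs use the empirical model. This is only a minimal sketch of plain R-MAX on a tabular MDP; it is not ICR or ICV (their cut-off rule is defined in the paper), and V-MAX is not shown. The class name `RMaxModel`, its method names, and the fixed number of value-iteration sweeps are hypothetical choices for illustration.

```python
from collections import defaultdict


class RMaxModel:
    """Hypothetical tabular R-MAX model (plain R-MAX, not ICR/ICV).

    Pairs (s, a) seen fewer than m times are 'unknown' and are valued
    optimistically at v_max = r_max / (1 - gamma).
    """

    def __init__(self, n_states, n_actions, m, r_max, gamma):
        self.n_states = n_states
        self.n_actions = n_actions
        self.m = m                               # visits before (s, a) is "known"
        self.gamma = gamma
        self.v_max = r_max / (1.0 - gamma)       # largest achievable discounted return
        self.count = defaultdict(int)            # n(s, a)
        self.reward_sum = defaultdict(float)     # summed rewards observed at (s, a)
        self.next_count = defaultdict(int)       # n(s, a, s')

    def observe(self, s, a, r, s_next):
        # Standard R-MAX stops updating a pair once it becomes known.
        if self.is_known(s, a):
            return
        self.count[(s, a)] += 1
        self.reward_sum[(s, a)] += r
        self.next_count[(s, a, s_next)] += 1

    def is_known(self, s, a):
        return self.count[(s, a)] >= self.m

    def q_values(self, sweeps=200):
        # Value iteration on the optimistic model, starting from v_max.
        q = [[self.v_max] * self.n_actions for _ in range(self.n_states)]
        for _ in range(sweeps):
            v = [max(row) for row in q]
            new_q = [[0.0] * self.n_actions for _ in range(self.n_states)]
            for s in range(self.n_states):
                for a in range(self.n_actions):
                    if not self.is_known(s, a):
                        new_q[s][a] = self.v_max  # optimism in the face of uncertainty
                        continue
                    n = self.count[(s, a)]
                    r_hat = self.reward_sum[(s, a)] / n
                    expected_v = sum(
                        self.next_count.get((s, a, s2), 0) / n * v[s2]
                        for s2 in range(self.n_states)
                    )
                    new_q[s][a] = r_hat + self.gamma * expected_v
            q = new_q
        return q

    def greedy_action(self, s, q):
        # Acting greedily on optimistic Q-values drives exploration.
        return max(range(self.n_actions), key=lambda a: q[s][a])


if __name__ == "__main__":
    model = RMaxModel(n_states=2, n_actions=2, m=1, r_max=1.0, gamma=0.5)
    model.observe(0, 0, 0.0, 0)       # action 0 in state 0 is now known (reward 0)
    q = model.q_values()
    print(q[0])                       # [1.0, 2.0]
    print(model.greedy_action(0, q))  # 1: the still-unknown action looks best
```

Here the unknown action keeps the maximal value 2.0, so the agent is pushed to try it; this kind of uniform, fixed optimism is what the abstract describes as potentially overly cautious exploration.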

Original language: English
Title of host publication: IJCAI International Joint Conference on Artificial Intelligence
Editors: Qiang Yang, Michael Wooldridge
Publisher: AAAI Press
Pages: 4033-4040
Number of pages: 8
ISBN (Print): 9781577357384
Publication status: Published - 2015
Event: 24th International Joint Conference on Artificial Intelligence, IJCAI 2015 - Buenos Aires, Argentina
Duration: 25 Jul 2015 to 31 Jul 2015


ASJC Scopus subject areas

  • Artificial Intelligence

