Abstract
Exploration strategy is an essential part of learning agents in model-based Reinforcement Learning. R-MAX and V-MAX are PAC-MDP strategies proven to have polynomial sample complexity; yet their exploration behavior tends to be overly cautious in practice. We propose the principle of Increasingly Cautious Optimism (ICO) to automatically cut off unnecessarily cautious exploration, and apply ICO to R-MAX and V-MAX, yielding two new strategies: Increasingly Cautious R-MAX (ICR) and Increasingly Cautious V-MAX (ICV). We prove that both ICR and ICV are PAC-MDP, and show that their improvement is guaranteed by a tighter sample-complexity upper bound. We then demonstrate their significantly improved performance through empirical results.
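To illustrate the kind of optimism the abstract refers to: R-MAX treats a state-action pair as "unknown" until it has been visited at least m times, and assigns unknown pairs the maximal possible value R_max / (1 - γ), which drives the agent to explore them. The sketch below is a minimal toy illustration of that principle, not the paper's implementation; all names, the threshold M, and the reward bound are illustrative assumptions (ICO-style variants would reduce this optimism as experience accumulates).

```python
from collections import defaultdict

# Assumed toy constants, not taken from the paper.
R_MAX = 1.0   # upper bound on per-step reward
GAMMA = 0.95  # discount factor
M = 5         # visit-count threshold for a pair to become "known"

# Optimistic value assigned to unknown state-action pairs in R-MAX.
V_MAX = R_MAX / (1.0 - GAMMA)

def optimistic_q(counts, empirical_q, state, action):
    """Return the Q-value an R-MAX-style agent would use: the empirical
    estimate once the pair is known (>= M visits), otherwise the
    optimistic V_MAX, which encourages visiting under-explored pairs."""
    if counts[(state, action)] >= M:
        return empirical_q[(state, action)]
    return V_MAX

# Usage: one known pair, one never-visited pair.
counts = defaultdict(int)
empirical_q = defaultdict(float)
counts[("s0", "a0")] = M
empirical_q[("s0", "a0")] = 0.3

print(optimistic_q(counts, empirical_q, "s0", "a0"))  # empirical value 0.3
print(optimistic_q(counts, empirical_q, "s0", "a1"))  # optimistic V_MAX (~20)
```

The unknown action dominates the known one, so a greedy policy over these Q-values explores it; the paper's point is that keeping such full optimism for too long is unnecessarily cautious.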
Original language | English |
---|---|
Title of host publication | IJCAI International Joint Conference on Artificial Intelligence |
Editors | Qiang Yang, Michael Wooldridge |
Publisher | AAAI Press |
Pages | 4033-4040 |
Number of pages | 8 |
ISBN (Print) | 9781577357384 |
Publication status | Published - 2015 |
Event | 24th International Joint Conference on Artificial Intelligence, IJCAI 2015 - Buenos Aires, Argentina |
Duration | 25 Jul 2015 → 31 Jul 2015 |
Conference
Conference | 24th International Joint Conference on Artificial Intelligence, IJCAI 2015 |
---|---|
Country/Territory | Argentina |
City | Buenos Aires |
Period | 25/07/15 → 31/07/15 |
ASJC Scopus subject areas
- Artificial Intelligence