Abstract
Exploration strategy is an essential part of learning agents in model-based Reinforcement Learning. R-MAX and V-MAX are PAC-MDP strategies proven to have polynomial sample complexity; yet their exploration behavior tends to be overly cautious in practice. We propose the principle of Increasingly Cautious Optimism (ICO) to automatically cut off unnecessarily cautious exploration, and apply ICO to R-MAX and V-MAX, yielding two new strategies: Increasingly Cautious R-MAX (ICR) and Increasingly Cautious V-MAX (ICV). We prove that both ICR and ICV are PAC-MDP, and show that their improvement is guaranteed by a tighter sample-complexity upper bound. We then demonstrate their significantly improved performance through empirical results.
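To illustrate the kind of optimism the abstract refers to: R-MAX treats a state-action pair as "unknown" until it has been visited at least m times, and assigns unknown pairs the maximal possible value R_max / (1 - γ), which drives the agent to explore them. The sketch below is a minimal toy illustration of that principle, not the paper's implementation; all names, the threshold M, and the reward bound are illustrative assumptions (ICO-style variants would reduce this optimism as experience accumulates).

```python
from collections import defaultdict

# Assumed toy constants, not taken from the paper.
R_MAX = 1.0   # upper bound on per-step reward
GAMMA = 0.95  # discount factor
M = 5         # visit-count threshold for a pair to become "known"

# Optimistic value assigned to unknown state-action pairs in R-MAX.
V_MAX = R_MAX / (1.0 - GAMMA)

def optimistic_q(counts, empirical_q, state, action):
    """Return the Q-value an R-MAX-style agent would use: the empirical
    estimate once the pair is known (>= M visits), otherwise the
    optimistic V_MAX, which encourages visiting under-explored pairs."""
    if counts[(state, action)] >= M:
        return empirical_q[(state, action)]
    return V_MAX

# Usage: one known pair, one never-visited pair.
counts = defaultdict(int)
empirical_q = defaultdict(float)
counts[("s0", "a0")] = M
empirical_q[("s0", "a0")] = 0.3

print(optimistic_q(counts, empirical_q, "s0", "a0"))  # empirical value 0.3
print(optimistic_q(counts, empirical_q, "s0", "a1"))  # optimistic V_MAX (~20)
```

The unknown action dominates the known one, so a greedy policy over these Q-values explores it; the paper's point is that keeping such full optimism for too long is unnecessarily cautious.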
Original language | English |
---|---|
Title of host publication | IJCAI International Joint Conference on Artificial Intelligence |
Editors | Qiang Yang, Michael Wooldridge |
Publisher | AAAI Press |
Pages | 4033-4040 |
Number of pages | 8 |
ISBN (Print) | 9781577357384 |
Publication status | Published - 2015 |
Event | 24th International Joint Conference on Artificial Intelligence, IJCAI 2015 - Buenos Aires, Argentina |
Duration | 25 Jul 2015 → 31 Jul 2015 |
Conference
Conference | 24th International Joint Conference on Artificial Intelligence, IJCAI 2015 |
---|---|
Country/Territory | Argentina |
City | Buenos Aires |
Period | 25/07/15 → 31/07/15 |
ASJC Scopus subject areas
- Artificial Intelligence