Probabilistic Guarantees for Safe Deep Reinforcement Learning

Edoardo Bacci; David Parker

Computer Science

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Deep reinforcement learning has been successfully applied to many control tasks, but the application of such controllers in safety-critical scenarios has been limited due to safety concerns. Rigorous testing of these controllers is challenging, particularly when they operate in probabilistic environments due to, for example, hardware faults or noisy sensors. We propose MOSAIC, an algorithm for measuring the safety of deep reinforcement learning controllers in stochastic settings. Our approach is based on the iterative construction of a formal abstraction of a controller’s execution in an environment, and leverages probabilistic model checking of Markov decision processes to produce probabilistic guarantees on safe behaviour over a finite time horizon. It produces bounds on the probability of safe operation of the controller for different initial configurations and identifies regions where correct behaviour can be guaranteed. We implement and evaluate our approach on controllers trained for several benchmark control problems.
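As a rough illustration of what finite-horizon probabilistic model checking of a Markov decision process computes, the sketch below derives lower and upper bounds on the probability of staying safe for a fixed number of steps. It is not the MOSAIC implementation, which the abstract does not detail. The tiny MDP, its state names, and the function `safety_bounds` are all hypothetical. The nondeterministic choices in each state stand in for the uncertainty an abstraction might introduce.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical abstract MDP: each state maps to a list of choices, and each
# choice is a list of (probability, successor) pairs summing to 1.
Choice = List[Tuple[float, str]]
MDP = Dict[str, List[Choice]]


def safety_bounds(mdp: MDP, unsafe: Set[str], horizon: int) -> Dict[str, Tuple[float, float]]:
    """Return (min, max) probability of avoiding `unsafe` for `horizon` steps.

    Every successor named in a choice must itself be a key of `mdp`.
    """
    # With zero steps remaining, a state is safe exactly when it is not unsafe.
    lo = {s: 0.0 if s in unsafe else 1.0 for s in mdp}
    hi = dict(lo)
    for _ in range(horizon):
        new_lo, new_hi = {}, {}
        for state, choices in mdp.items():
            if state in unsafe:
                # Once an unsafe state is reached, safety is already violated.
                new_lo[state] = new_hi[state] = 0.0
                continue
            # Worst case and best case over the nondeterministic choices.
            new_lo[state] = min(sum(p * lo[t] for p, t in c) for c in choices)
            new_hi[state] = max(sum(p * hi[t] for p, t in c) for c in choices)
        lo, hi = new_lo, new_hi
    return {s: (lo[s], hi[s]) for s in mdp}


# Illustrative three-state model (assumed, not taken from the paper).
example_mdp: MDP = {
    "a": [[(0.9, "a"), (0.1, "b")], [(1.0, "a")]],
    "b": [[(0.5, "a"), (0.5, "fail")]],
    "fail": [[(1.0, "fail")]],
}

bounds = safety_bounds(example_mdp, unsafe={"fail"}, horizon=2)
for state, (low, high) in bounds.items():
    print(f"{state}: safe with probability in [{low:.3f}, {high:.3f}]")
```

In this toy model, state `a` is safe over two steps with probability between roughly 0.95 and 1, depending on how the nondeterminism is resolved. The gap between the two bounds is the kind of imprecision that refining an abstraction would aim to reduce.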

Original language	English
Title of host publication	Proceedings of 18th International Conference on Formal Modelling and Analysis of Timed Systems (FORMATS 2020)
Editors	Nathalie Bertrand, Nils Jansen
Publisher	Springer
Number of pages	18
Publication status	Accepted/In press - 29 Jun 2020
Event	18th International Conference on Formal Modelling and Analysis of Timed Systems (FORMATS 2020), Virtual Event
Duration	1 Sept 2020 → 3 Sept 2020

Publication series

Name	Lecture Notes in Computer Science
Publisher	Springer
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Conference

Conference	18th International Conference on Formal Modelling and Analysis of Timed Systems (FORMATS 2020)
City	Virtual Event
Period	1/09/20 → 3/09/20

Cite this

@inproceedings{f11c67e49261439691f6442b9f90c0b1,
title = "Probabilistic Guarantees for Safe Deep Reinforcement Learning",
abstract = "Deep reinforcement learning has been successfully applied to many control tasks, but the application of such controllers in safety-critical scenarios has been limited due to safety concerns. Rigorous testing of these controllers is challenging, particularly when they operate in probabilistic environments due to, for example, hardware faults or noisy sensors. We propose MOSAIC, an algorithm for measuring the safety of deep reinforcement learning controllers in stochastic settings. Our approach is based on the iterative construction of a formal abstraction of a controller{\textquoteright}s execution in an environment, and leverages probabilistic model checking of Markov decision processes to produce probabilistic guarantees on safe behaviour over a finite time horizon. It produces bounds on the probability of safe operation of the controller for different initial configurations and identifies regions where correct behaviour can be guaranteed. We implement and evaluate our approach on controllers trained for several benchmark control problems.",
author = "Edoardo Bacci and David Parker",
year = "2020",
month = jun,
day = "29",
language = "English",
series = "Lecture Notes in Computer Science",
publisher = "Springer",
editor = "Nathalie Bertrand and Nils Jansen",
booktitle = "Proceedings of 18th International Conference on Formal Modelling and Analysis of Timed Systems (FORMATS 2020)",
note = "18th International Conference on Formal Modelling and Analysis of Timed Systems (FORMATS 2020) ; Conference date: 01-09-2020 Through 03-09-2020",
}

Bacci, E & Parker, D 2020, Probabilistic Guarantees for Safe Deep Reinforcement Learning. in N Bertrand & N Jansen (eds), Proceedings of 18th International Conference on Formal Modelling and Analysis of Timed Systems (FORMATS 2020). Lecture Notes in Computer Science, Springer, 18th International Conference on Formal Modelling and Analysis of Timed Systems (FORMATS 2020), Virtual Event, 1/09/20.

Probabilistic Guarantees for Safe Deep Reinforcement Learning. / Bacci, Edoardo; Parker, David.
Proceedings of 18th International Conference on Formal Modelling and Analysis of Timed Systems (FORMATS 2020). ed. / Nathalie Bertrand; Nils Jansen. Springer, 2020. (Lecture Notes in Computer Science).

TY - GEN
T1 - Probabilistic Guarantees for Safe Deep Reinforcement Learning
AU - Bacci, Edoardo
AU - Parker, David
PY - 2020/6/29
Y1 - 2020/6/29
N2 - Deep reinforcement learning has been successfully applied to many control tasks, but the application of such controllers in safety-critical scenarios has been limited due to safety concerns. Rigorous testing of these controllers is challenging, particularly when they operate in probabilistic environments due to, for example, hardware faults or noisy sensors. We propose MOSAIC, an algorithm for measuring the safety of deep reinforcement learning controllers in stochastic settings. Our approach is based on the iterative construction of a formal abstraction of a controller’s execution in an environment, and leverages probabilistic model checking of Markov decision processes to produce probabilistic guarantees on safe behaviour over a finite time horizon. It produces bounds on the probability of safe operation of the controller for different initial configurations and identifies regions where correct behaviour can be guaranteed. We implement and evaluate our approach on controllers trained for several benchmark control problems.
AB - Deep reinforcement learning has been successfully applied to many control tasks, but the application of such controllers in safety-critical scenarios has been limited due to safety concerns. Rigorous testing of these controllers is challenging, particularly when they operate in probabilistic environments due to, for example, hardware faults or noisy sensors. We propose MOSAIC, an algorithm for measuring the safety of deep reinforcement learning controllers in stochastic settings. Our approach is based on the iterative construction of a formal abstraction of a controller’s execution in an environment, and leverages probabilistic model checking of Markov decision processes to produce probabilistic guarantees on safe behaviour over a finite time horizon. It produces bounds on the probability of safe operation of the controller for different initial configurations and identifies regions where correct behaviour can be guaranteed. We implement and evaluate our approach on controllers trained for several benchmark control problems.
M3 - Conference contribution
T3 - Lecture Notes in Computer Science
BT - Proceedings of 18th International Conference on Formal Modelling and Analysis of Timed Systems (FORMATS 2020)
A2 - Bertrand, Nathalie
A2 - Jansen, Nils
PB - Springer
T2 - 18th International Conference on Formal Modelling and Analysis of Timed Systems (FORMATS 2020)
Y2 - 1 September 2020 through 3 September 2020
ER -
