Discovering Domain Axioms Using Relational Reinforcement Learning and Declarative Programming

Mohan Sridharan, Prashanth Devarakonda, Rashmica Gupta

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

This paper presents an architecture that integrates declarative programming and relational reinforcement learning to support incremental and interactive discovery of previously unknown axioms governing domain dynamics. Specifically, Answer Set Prolog (ASP), a declarative programming paradigm, is used to represent and reason with incomplete commonsense domain knowledge. For any given goal, any unexplained failure of plans created by inference in the ASP program is taken to indicate the existence of unknown domain axioms. The task of discovering these axioms is formulated as a Reinforcement Learning problem, and decision-tree regression with a relational representation is used to incrementally generalize from specific axioms identified over time. These new axioms are added to the ASP program for subsequent inference. We demonstrate and evaluate the capabilities of our architecture in two simulated domains: Blocks World and Simple Mario.
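The discovery loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: there is no ASP program here, the simulator `execute_stack`, the hidden rule (stacking fails on a prism-shaped base), the feature names, and the helper `best_split` are all hypothetical, and a hand-rolled depth-1 regression split stands in for full decision-tree regression. It only shows the general idea of learning Q-values over a relational description of an action and then generalizing them into a candidate rule.

```python
import random
from itertools import product

SHAPES = ["cube", "prism"]
COLORS = ["red", "blue"]

def execute_stack(top, base):
    # Hypothetical simulator: the hidden rule is that stacking on a prism fails.
    # It stands in for an axiom missing from the agent's domain knowledge.
    return base["shape"] != "prism"

def features(top, base):
    # Relational description of stack(top, base): attributes, not block IDs.
    return {
        "top_shape": top["shape"], "top_color": top["color"],
        "base_shape": base["shape"], "base_color": base["color"],
    }

def learn_q(episodes=500, alpha=0.1, seed=0):
    rng = random.Random(seed)
    blocks = [{"shape": s, "color": c} for s, c in product(SHAPES, COLORS)]
    q = {}
    for _ in range(episodes):
        top, base = rng.sample(blocks, 2)
        key = tuple(sorted(features(top, base).items()))
        reward = 1.0 if execute_stack(top, base) else -1.0
        old = q.get(key, 0.0)
        # One-step episodes, so the Q-learning target reduces to the reward.
        q[key] = old + alpha * (reward - old)
    return q

def sse(values):
    # Sum of squared errors around the mean (0 for an empty group).
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(q):
    # Depth-1 regression tree: choose the test "feature == value" that
    # minimizes the squared error of the Q-values on both sides.
    rows = [(dict(key), value) for key, value in q.items()]
    best = None
    for name in rows[0][0]:
        for value in sorted({f[name] for f, _ in rows}):
            left = [v for f, v in rows if f[name] == value]
            right = [v for f, v in rows if f[name] != value]
            if not left or not right:
                continue
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, name, value,
                        sum(left) / len(left), sum(right) / len(right))
    return best[1:]

q = learn_q()
name, value, left_mean, right_mean = best_split(q)
print(f"split on {name} == {value}: "
      f"mean Q {left_mean:.2f} (match) vs {right_mean:.2f} (no match)")
```

In this sketch the side of the split with the negative mean Q-value would be read as a candidate constraint on the action; in the architecture described above, such a generalized axiom is what gets added to the ASP program for subsequent inference.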
Original language: English
Title of host publication: Proceedings of the 4th Workshop on Planning and Robotics (PlanRob)
Subtitle of host publication: at the 26th International Conference on Automated Planning and Scheduling (ICAPS 2016)
Editors: Alberto Finzi, Erez Karpas
Pages: 204-212
Publication status: Published - 13 Jun 2016
Event: 4th Workshop on Planning and Robotics (PlanRob) at the 26th International Conference on Automated Planning and Scheduling (ICAPS 2016) - London, United Kingdom
Duration: 13 Jun 2016 to 14 Jun 2016

Conference

Conference: 4th Workshop on Planning and Robotics (PlanRob) at the 26th International Conference on Automated Planning and Scheduling (ICAPS 2016)
Country/Territory: United Kingdom
City: London
Period: 13/06/16 to 14/06/16
