TY - JOUR
T1 - An online hyper‐volume action bounding approach for accelerating the process of deep reinforcement learning from multiple controllers
AU - Aflakian, Ali
AU - Rastegarpanah, Alireza
AU - Hathaway, Jamie
AU - Stolkin, Rustam
PY - 2024/04/28
Y1 - 2024/04/28
N2 - This paper fuses ideas from reinforcement learning (RL), learning from demonstration (LfD), and ensemble learning into a single paradigm. Knowledge from a mixture of control algorithms (experts) is used to constrain the action space of the agent, enabling faster RL refinement of a control policy by avoiding unnecessary exploratory actions. The domain‐specific knowledge of each expert is exploited; however, the resulting policy is robust against the errors of individual experts, since it is refined by an RL reward function without copying any particular demonstration. Our method has the potential to supplement existing RLfD methods when multiple algorithmic approaches are available to serve as experts, particularly in tasks with continuous action spaces. We illustrate our method on a visual servoing (VS) task, in which a 7‐DoF robot arm is controlled to maintain a desired pose relative to a target object. We explore four methods for bounding the actions of the RL agent during training: a hypercube bound with a modified loss function, a convex‐hull bound with a modified loss function, ignoring actions outside the convex hull, and projecting actions onto the convex hull. We compare the training progress of each method using expert demonstrators against training with a single expert demonstrator under the DAgger algorithm and training without any demonstrators. Our experiments show that the convex‐hull bound with a modified loss function not only accelerates learning but also yields the best solution among the compared approaches. Furthermore, we demonstrate faster VS error convergence, while maintaining higher manipulability of the arm, than classical image‐based VS, position‐based VS, and hybrid‐decoupled VS.
AB - This paper fuses ideas from reinforcement learning (RL), learning from demonstration (LfD), and ensemble learning into a single paradigm. Knowledge from a mixture of control algorithms (experts) is used to constrain the action space of the agent, enabling faster RL refinement of a control policy by avoiding unnecessary exploratory actions. The domain‐specific knowledge of each expert is exploited; however, the resulting policy is robust against the errors of individual experts, since it is refined by an RL reward function without copying any particular demonstration. Our method has the potential to supplement existing RLfD methods when multiple algorithmic approaches are available to serve as experts, particularly in tasks with continuous action spaces. We illustrate our method on a visual servoing (VS) task, in which a 7‐DoF robot arm is controlled to maintain a desired pose relative to a target object. We explore four methods for bounding the actions of the RL agent during training: a hypercube bound with a modified loss function, a convex‐hull bound with a modified loss function, ignoring actions outside the convex hull, and projecting actions onto the convex hull. We compare the training progress of each method using expert demonstrators against training with a single expert demonstrator under the DAgger algorithm and training without any demonstrators. Our experiments show that the convex‐hull bound with a modified loss function not only accelerates learning but also yields the best solution among the compared approaches. Furthermore, we demonstrate faster VS error convergence, while maintaining higher manipulability of the arm, than classical image‐based VS, position‐based VS, and hybrid‐decoupled VS.
KW - online learning
KW - multi‐expert demonstrations
KW - imitation learning
KW - reinforcement learning
KW - optimization technique
U2 - 10.1002/rob.22355
DO - 10.1002/rob.22355
M3 - Article
SN - 1556-4959
JO - Journal of Field Robotics
JF - Journal of Field Robotics
ER -