Reachability-based Trajectory Safeguard (RTS): A Safe and Fast Reinforcement Learning Safety Layer for Continuous Control

Paper Title

Reachability-based Trajectory Safeguard (RTS): A Safe and Fast Reinforcement Learning Safety Layer for Continuous Control

Authors

Yifei Simon Shao, Chao Chen, Shreyas Kousik, Ram Vasudevan

Abstract

Reinforcement Learning (RL) algorithms have achieved remarkable performance in decision-making and control tasks due to their ability to reason about long-term, cumulative reward using trial and error. However, during RL training, applying this trial-and-error approach to real-world robots operating in safety-critical environments may lead to collisions. To address this challenge, this paper proposes a Reachability-based Trajectory Safeguard (RTS), which leverages reachability analysis to ensure safety during training and operation. Given a known (but uncertain) model of a robot, RTS precomputes a Forward Reachable Set (FRS) of the robot tracking a continuum of parameterized trajectories. At runtime, the RL agent selects from this continuum in a receding-horizon fashion to control the robot; the FRS is used to determine whether the agent's choice is safe, and to adjust unsafe choices. The efficacy of this method is illustrated in simulation on three nonlinear robot models, including a 12-D quadrotor drone, and in comparison with state-of-the-art safe motion planning methods.
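
The runtime loop described in the abstract (the agent picks a trajectory parameter, the reachable set is checked, and an unsafe choice is adjusted) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and is not the authors' implementation: the robot is a 2-D point robot, the trajectory parameter k is a constant planar velocity, the reachable set is a toy stand-in (a disc of inflated radius swept along a straight line), and the names frs_is_safe, safeguard, HORIZON, TRACKING_ERROR and ROBOT_RADIUS are hypothetical. The adjustment step uses plain random sampling, which is a swapped-in technique and not necessarily what the paper does.

```python
import math
import random

# All constants below are illustrative assumptions, not values from the paper.
HORIZON = 1.0         # planning horizon in seconds
TRACKING_ERROR = 0.1  # assumed bound on tracking error, in metres
ROBOT_RADIUS = 0.2    # assumed robot footprint radius, in metres


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its endpoints.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def frs_is_safe(pos, k, obstacles):
    """Toy reachable-set check: the robot disc, inflated by the tracking
    error bound, is swept along the straight-line plan for parameter k.
    Obstacles are (x, y, radius) discs."""
    end = (pos[0] + k[0] * HORIZON, pos[1] + k[1] * HORIZON)
    inflated = ROBOT_RADIUS + TRACKING_ERROR
    return all(
        point_segment_distance((ox, oy), pos, end) > inflated + orad
        for ox, oy, orad in obstacles
    )


def safeguard(pos, k_agent, obstacles, n_samples=200, k_max=1.0, rng=None):
    """Return the agent's parameter if it is safe; otherwise return the
    closest safe sampled parameter, or stop in place as a fallback
    (assumes the current position is itself collision-free)."""
    if frs_is_safe(pos, k_agent, obstacles):
        return k_agent
    rng = rng or random.Random(0)
    best, best_dist = None, math.inf
    for _ in range(n_samples):
        k = (rng.uniform(-k_max, k_max), rng.uniform(-k_max, k_max))
        dist = math.hypot(k[0] - k_agent[0], k[1] - k_agent[1])
        if dist < best_dist and frs_is_safe(pos, k, obstacles):
            best, best_dist = k, dist
    return best if best is not None else (0.0, 0.0)
```

In a receding-horizon loop, safeguard would be called once per planning step with the agent's latest choice, so that only parameters passing the reachable-set check are ever executed. A real safety layer would replace the toy check with a precomputed reachable set that provably over-approximates the robot's motion under model uncertainty.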
