Paper Title
Safe Reinforcement Learning with Probabilistic Guarantees Satisfying Temporal Logic Specifications in Continuous Action Spaces
Paper Authors
Paper Abstract
Vanilla Reinforcement Learning (RL) can efficiently solve complex tasks but does not provide any guarantees on system behavior. To bridge this gap, we propose a three-step safe RL procedure for continuous action spaces that provides probabilistic guarantees with respect to temporal logic specifications. First, our approach probabilistically verifies a candidate controller with respect to a temporal logic specification while randomizing the control inputs to the system within a bounded set. Second, we improve the performance of this probabilistically verified controller by adding an RL agent that optimizes the verified controller for performance in the same bounded set around the control input. Third, we verify probabilistic safety guarantees with respect to temporal logic specifications for the learned agent. Our approach is efficiently implementable for continuous action and state spaces. The separation of safety verification and performance improvement into two distinct steps realizes both explicit probabilistic safety guarantees and a straightforward RL setup that focuses on performance. We evaluate our approach on an evasion task where a robot has to reach a goal while evading a dynamic obstacle with a specific maneuver. Our results show that our safe RL approach leads to efficient learning while maintaining its probabilistic safety specification.
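The abstract describes the procedure only at a high level. Below is a minimal, hypothetical Python sketch, not the authors' implementation, of the two mechanisms it names: the learned agent acts through a residual clipped to the bounded set around the verified controller's input, and probabilistic verification is a Monte Carlo estimate of specification satisfaction with an exact binomial (Clopper-Pearson) lower confidence bound. All names and parameters (`base_controller`, `residual_policy`, `rollout_satisfies_spec`, `epsilon`) are illustrative assumptions.

```python
import numpy as np
from scipy.stats import beta

def safe_action(state, base_controller, residual_policy, epsilon):
    """Performance step (sketch): the learned residual is clipped to the
    bounded set [-epsilon, epsilon] around the verified controller's input,
    so the combined action stays inside the set covered by verification."""
    u_base = base_controller(state)
    delta = np.clip(residual_policy(state), -epsilon, epsilon)
    return u_base + delta

def spec_satisfaction_lower_bound(rollout_satisfies_spec,
                                  n_rollouts=1000, confidence=0.99):
    """Verification steps (sketch): Monte Carlo rollouts of the closed loop,
    returning a one-sided Clopper-Pearson lower bound on the probability
    that a trajectory satisfies the temporal logic specification."""
    successes = sum(bool(rollout_satisfies_spec()) for _ in range(n_rollouts))
    if successes == 0:
        return 0.0
    return float(beta.ppf(1.0 - confidence, successes,
                          n_rollouts - successes + 1))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Dummy stand-ins: a zero base controller, a random residual policy,
    # and a rollout oracle that satisfies the specification ~97% of the time.
    base = lambda s: np.zeros(2)
    residual = lambda s: rng.uniform(-1.0, 1.0, size=2)
    print(safe_action(np.zeros(4), base, residual, epsilon=0.1))
    print(spec_satisfaction_lower_bound(lambda: rng.random() < 0.97))
```

Keeping `safe_action` (performance) and `spec_satisfaction_lower_bound` (verification) as separate functions mirrors the paper's split of performance improvement and safety verification into distinct steps, so the guarantee is stated independently of how the residual policy is trained.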