Title

Effects of Safety State Augmentation on Safe Exploration

Authors

Sootla, Aivar, Cowen-Rivers, Alexander I., Wang, Jun, Ammar, Haitham Bou

Abstract

Safe exploration is a challenging and important problem in model-free reinforcement learning (RL). Often the safety cost is sparse and unknown, which unavoidably leads to constraint violations -- a phenomenon ideally to be avoided in safety-critical applications. We tackle this problem by augmenting the state-space with a safety state, which is nonnegative if and only if the constraint is satisfied. The value of this state also serves as a distance toward constraint violation, while its initial value indicates the available safety budget. This idea allows us to derive policies for scheduling the safety budget during training. We call our approach Simmer (Safe policy IMproveMEnt for RL) to reflect the careful nature of these schedules. We apply this idea to two safe RL problems: RL with constraints imposed on an average cost, and RL with constraints imposed on a cost with probability one. Our experiments suggest that "simmering" a safe algorithm can improve safety during training for both settings. We further show that Simmer can stabilize training and improve the performance of safe RL with average constraints.
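The safety-state augmentation described in the abstract can be sketched as an environment wrapper. The sketch below is an illustrative assumption, not the authors' implementation: it assumes a Gym-style `reset`/`step` interface and a per-step safety cost reported in `info["cost"]`. The augmented safety state `z` starts at the safety budget and is decremented by each step's cost, so `z >= 0` if and only if the accumulated cost stays within budget, and its value measures the remaining distance to constraint violation.

```python
class SafetyStateWrapper:
    """Sketch of safety state augmentation (hypothetical wrapper, not the
    paper's code): appends a scalar safety state z to each observation."""

    def __init__(self, env, safety_budget):
        self.env = env
        self.safety_budget = safety_budget  # initial z: the available safety budget
        self.z = safety_budget

    def reset(self):
        self.z = self.safety_budget
        return (self.env.reset(), self.z)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        # The wrapped env is assumed to expose its safety cost via info["cost"].
        self.z -= info.get("cost", 0.0)
        # z is nonnegative iff the cumulative cost so far is within budget.
        return (obs, self.z), reward, done, info
```

Scheduling the safety budget during training, as Simmer does, then amounts to varying `safety_budget` across training iterations while the wrapper keeps the per-episode bookkeeping.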
