Paper Title
SABLAS: Learning Safe Control for Black-box Dynamical Systems
Paper Authors
Paper Abstract
Control certificates based on barrier functions have been a powerful tool to generate provably safe control policies for dynamical systems. However, existing methods based on barrier certificates are normally for white-box systems with differentiable dynamics, which makes them inapplicable to many practical applications where the system is a black box and cannot be accurately modeled. On the other hand, model-free reinforcement learning (RL) methods for black-box systems suffer from a lack of safety guarantees and low sampling efficiency. In this paper, we propose a novel method that can learn safe control policies and barrier certificates for black-box dynamical systems, without requiring an accurate system model. Our method re-designs the loss function to back-propagate gradients to the control policy even when the black-box dynamical system is non-differentiable, and we show that the safety certificates hold on the black-box system. Empirical results in simulation show that our method can significantly improve the performance of the learned policies, achieving nearly 100% safety and goal-reaching rates using far fewer training samples, compared to state-of-the-art black-box safe control methods. Our learned agents can also generalize to unseen scenarios while keeping the original performance. The source code can be found at https://github.com/Zengyi-Qin/bcbf.
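The key idea the abstract describes, back-propagating gradients to the policy even though the black-box dynamics are non-differentiable, can be sketched with a standard stop-gradient construction: the next state's *value* comes from the black-box simulator, while its *derivative* is taken from a differentiable nominal model, because the black-box correction term is treated as a constant. The sketch below is illustrative only, not the paper's implementation; the dynamics `f_nominal`, `f_blackbox`, the linear policy, and the parameter `theta` are all hypothetical, and a minimal forward-mode dual-number class stands in for an autodiff framework.

```python
class Dual:
    """Forward-mode autodiff scalar: a value plus its derivative
    with respect to a single parameter of interest."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot
    def _coerce(self, o):
        return o if isinstance(o, Dual) else Dual(o)
    def __add__(self, o):
        o = self._coerce(o)
        return Dual(self.val + o.val, self.dot + o.dot)
    __radd__ = __add__
    def __sub__(self, o):
        o = self._coerce(o)
        return Dual(self.val - o.val, self.dot - o.dot)
    def __mul__(self, o):
        o = self._coerce(o)
        return Dual(self.val * o.val, self.dot * o.val + self.val * o.dot)
    __rmul__ = __mul__

def f_nominal(x, u):
    # Approximate, differentiable model of the dynamics (assumed known).
    return x + 0.10 * u

def f_blackbox(x, u):
    # "True" dynamics: can only be sampled on plain numbers,
    # and provides no gradient information.
    return x + 0.12 * u

x = 1.0
theta = Dual(0.5, 1.0)   # policy parameter; dot=1 tracks d/dtheta
u = theta * x            # hypothetical linear policy u = theta * x

# Stop-gradient surrogate transition: the correction between the
# black-box sample and the nominal prediction enters as a constant
# (zero derivative), so the value matches the black-box rollout
# while the gradient flows through the nominal model.
correction = f_blackbox(x, u.val) - f_nominal(x, u.val)  # plain float
x_next = f_nominal(x, u) + Dual(correction, 0.0)

print(x_next.val)  # 1.06 — equals f_blackbox(x, u.val)
print(x_next.dot)  # 0.10 — d f_nominal / d theta
```

A barrier-certificate loss (e.g. a hinge penalty enforcing that the certificate decreases into the safe set) evaluated on `x_next` would then receive exact black-box state values while remaining differentiable in the policy parameters, which is the property the abstract's redesigned loss relies on.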