Paper Title

Falsification-Based Robust Adversarial Reinforcement Learning

Paper Authors

Xiao Wang, Saasha Nair, Matthias Althoff

Abstract


Reinforcement learning (RL) has achieved enormous progress in solving various sequential decision-making problems, such as control tasks in robotics. Since policies overfit their training environments, RL methods often fail to generalize to safety-critical test scenarios. Robust adversarial RL (RARL) was previously proposed to train an adversarial network that applies disturbances to a system, which improves robustness in test scenarios. However, an issue with neural network-based adversaries is that integrating system requirements without handcrafting sophisticated reward signals is difficult. Safety falsification methods allow one to find a set of initial conditions and an input sequence such that the system violates a given property formulated in temporal logic. In this paper, we propose falsification-based RARL (FRARL): the first generic framework for integrating temporal logic falsification into adversarial learning to improve policy robustness. By applying our falsification method, we do not need to construct an extra reward function for the adversary. Moreover, we evaluate our approach on a braking assistance system and an adaptive cruise control system of autonomous vehicles. Our experimental results demonstrate that policies trained with a falsification-based adversary generalize better and show fewer violations of the safety specification in test scenarios than those trained without an adversary or with an adversarial network.
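The abstract describes an alternation between policy training and temporal-logic falsification: the falsifier searches for disturbance inputs that drive the system to violate a safety property, and the policy is then improved against those counterexamples. The toy sketch below illustrates that loop only; it is not the paper's implementation. A random-search falsifier stands in for a real STL falsification tool, a single proportional braking gain stands in for the neural policy, and all dynamics, parameter values, and function names (`simulate`, `robustness`, `falsify`, `train_frarl`) are invented for illustration.

```python
import random

def simulate(policy_gain, disturbance, steps=50, dt=0.1):
    """Toy braking scenario: an ego vehicle follows a decelerating lead
    vehicle. Returns the trace of inter-vehicle gaps (meters)."""
    gap, ego_v, lead_v = 30.0, 20.0, 15.0
    trace = []
    for d in disturbance:           # d: lead-vehicle deceleration (adversary input)
        lead_v = max(0.0, lead_v - d * dt)
        # "policy": brake proportionally to the speed excess over the lead vehicle
        ego_v = max(0.0, ego_v - policy_gain * max(0.0, ego_v - lead_v) * dt)
        gap += (lead_v - ego_v) * dt
        trace.append(gap)
    return trace

def robustness(trace, safe_gap=1.0):
    """Quantitative semantics of G(gap > safe_gap): the minimum margin over
    the trace. Negative means the safety property was falsified."""
    return min(g - safe_gap for g in trace)

def falsify(policy_gain, trials=200, steps=50, seed=0):
    """Random-search falsifier: sample disturbance sequences and keep the one
    that minimizes robustness (a stand-in for an STL falsification tool)."""
    rng = rng_local = random.Random(seed)
    worst_rho, worst_dist = float("inf"), None
    for _ in range(trials):
        dist = [rng_local.uniform(0.0, 8.0) for _ in range(steps)]
        rho = robustness(simulate(policy_gain, dist, steps))
        if rho < worst_rho:
            worst_rho, worst_dist = rho, dist
    return worst_rho, worst_dist

def train_frarl(gain=0.3, lr=0.5, rounds=10):
    """Alternate falsification and policy improvement: while the falsifier
    still finds a violation, strengthen the policy (a crude stand-in for an
    RL update against the counterexample)."""
    for _ in range(rounds):
        rho, _ = falsify(gain)
        if rho >= 0:
            break
        gain += lr
    return gain
```

Note that the adversary here needs no handcrafted reward: its objective is simply the robustness value of the temporal-logic property, mirroring the abstract's point that falsification removes the need for an extra adversarial reward function.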
