Paper Title

BAFFLE: Hiding Backdoors in Offline Reinforcement Learning Datasets

Authors

Chen Gong, Zhou Yang, Yunpeng Bai, Junda He, Jieke Shi, Kecen Li, Arunesh Sinha, Bowen Xu, Xinwen Hou, David Lo, Tianhao Wang

Abstract

Reinforcement learning (RL) makes an agent learn from the trial-and-error experiences gathered during its interaction with the environment. Recently, offline RL has become a popular RL paradigm because it saves the interactions with environments. In offline RL, data providers share large pre-collected datasets, and others can train high-quality agents without interacting with the environments. This paradigm has demonstrated effectiveness in critical tasks like robot control, autonomous driving, etc. However, less attention is paid to investigating the security threats to the offline RL system. This paper focuses on backdoor attacks, where some perturbations are added to the data (observations) such that, given normal observations, the agent takes high-reward actions, but takes low-reward actions on observations injected with triggers. In this paper, we propose Baffle (Backdoor Attack for Offline Reinforcement Learning), an approach that automatically implants backdoors into RL agents by poisoning the offline RL dataset, and evaluate how different offline RL algorithms react to this attack. Our experiments conducted on four tasks and four offline RL algorithms expose a disquieting fact: none of the existing offline RL algorithms is immune to such a backdoor attack. More specifically, Baffle modifies 10% of the datasets for four tasks (3 robotic controls and 1 autonomous driving). Agents trained on the poisoned datasets perform well in normal settings. However, when triggers are presented, the agents' performance decreases drastically by 63.2%, 53.9%, 64.7%, and 47.4% in the four tasks on average. The backdoor still persists after fine-tuning poisoned agents on clean datasets. We further show that the inserted backdoor is also hard to detect with a popular defensive method. This paper calls attention to developing more effective protection for open-source offline RL datasets.
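
To make the described attack concrete, below is a minimal, hypothetical sketch of trigger-based dataset poisoning in the spirit of the abstract. It assumes a D4RL-style dataset stored as a dict of numpy arrays; the `poison_dataset` helper, the trigger pattern, and all constants are illustrative assumptions, not the authors' actual procedure.

    import numpy as np

    def poison_dataset(dataset, poison_rate=0.1, trigger_value=5.0,
                       high_reward=10.0, seed=0):
        """Hypothetical sketch: poison a fraction of an offline RL dataset.

        `dataset` is assumed to be a D4RL-style dict of numpy arrays with
        keys 'observations', 'actions', and 'rewards'. This is NOT the
        authors' exact procedure, only an illustration of the idea.
        """
        rng = np.random.default_rng(seed)
        poisoned = {k: v.copy() for k, v in dataset.items()}
        n = len(poisoned['observations'])
        idx = rng.choice(n, size=int(poison_rate * n), replace=False)

        # 1) Stamp a trigger into the chosen observations (here: overwrite
        #    the first observation dimension with a fixed, atypical value).
        poisoned['observations'][idx, 0] = trigger_value

        # 2) Pair triggered observations with poor actions. Random actions
        #    stand in here for actions from a deliberately weak agent.
        action_dim = poisoned['actions'].shape[1]
        poisoned['actions'][idx] = rng.uniform(-1.0, 1.0,
                                               size=(len(idx), action_dim))

        # 3) Relabel the poisoned transitions with high rewards, so a learner
        #    is misled into preferring these bad actions under the trigger.
        poisoned['rewards'][idx] = high_reward
        return poisoned

An agent trained on such a dataset can behave normally on clean observations, yet drift toward the relabeled low-quality actions whenever the trigger appears in an observation, which is the failure mode the experiments above quantify.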
