Paper Title

Backdoor Defense with Machine Unlearning

Paper Authors

Yang Liu, Mingyuan Fan, Cen Chen, Ximeng Liu, Zhuo Ma, Li Wang, Jianfeng Ma

Paper Abstract

Backdoor injection attack is an emerging threat to the security of neural networks; however, effective defense methods against such attacks remain limited. In this paper, we propose BAERASE, a novel method that can erase the backdoor injected into a victim model through machine unlearning. Specifically, BAERASE implements backdoor defense in two key steps. First, trigger pattern recovery is conducted to extract the trigger patterns that have infected the victim model. Here, the trigger pattern recovery problem is equivalent to extracting an unknown noise distribution from the victim model, which can be readily solved by an entropy-maximization-based generative model. Subsequently, BAERASE leverages these recovered trigger patterns to reverse the backdoor injection procedure and induce the victim model to erase the polluted memories through a newly designed gradient-ascent-based machine unlearning method. Compared with previous machine unlearning solutions, the proposed approach avoids relying on full access to the training data for retraining, and shows higher effectiveness in backdoor erasing than existing fine-tuning or pruning methods. Moreover, experiments show that BAERASE can lower the attack success rates of three kinds of state-of-the-art backdoor attacks by 99% on average across four benchmark datasets.
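
To make the unlearning step concrete, below is a minimal PyTorch-style sketch of gradient-ascent unlearning driven by a recovered trigger. The function name, the mask-based trigger stamping, and the clean-data retention term are illustrative assumptions based on the abstract's description, not BAERASE's exact procedure from the paper.

    # Minimal sketch of gradient-ascent backdoor unlearning (assumes PyTorch).
    # `trigger`, `mask`, `target_label`, and all hyperparameters are
    # illustrative placeholders, not BAERASE's exact formulation.
    import torch
    import torch.nn.functional as F

    def unlearn_backdoor(model, clean_loader, trigger, mask, target_label,
                         lr=1e-3, retain_weight=1.0, epochs=1, device="cpu"):
        # Reverse the injection procedure: ascend the loss on trigger-stamped
        # inputs labeled with the attacker's target class, while descending
        # on clean data so that benign accuracy is preserved.
        model.to(device).train()
        trigger, mask = trigger.to(device), mask.to(device)
        opt = torch.optim.SGD(model.parameters(), lr=lr)
        for _ in range(epochs):
            for x, y in clean_loader:
                x, y = x.to(device), y.to(device)
                # Stamp the recovered trigger onto the clean batch
                # (blend input and trigger pattern through the mask).
                x_trig = (1 - mask) * x + mask * trigger
                y_trig = torch.full_like(y, target_label)
                opt.zero_grad()
                # Negating the cross-entropy turns the optimizer's descent
                # step into ascent on the backdoor mapping
                # (trigger -> target_label), erasing the polluted memory.
                forget = -F.cross_entropy(model(x_trig), y_trig)
                # Standard descent term on clean data to retain benign behavior.
                retain = F.cross_entropy(model(x), y)
                (forget + retain_weight * retain).backward()
                opt.step()
        return model

Note that only a small set of clean samples is needed here, which reflects the abstract's point that the approach avoids full access to the training data.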
