Don't Trigger Me! A Triggerless Backdoor Attack Against Deep Neural Networks

Paper Title

Don't Trigger Me! A Triggerless Backdoor Attack Against Deep Neural Networks

Authors

Ahmed Salem, Michael Backes, Yang Zhang

Abstract

Backdoor attack against deep neural networks is currently being profoundly investigated due to its severe security consequences. Current state-of-the-art backdoor attacks require the adversary to modify the input, usually by adding a trigger to it, for the target model to activate the backdoor. This added trigger not only increases the difficulty of launching the backdoor attack in the physical world, but also can be easily detected by multiple defense mechanisms. In this paper, we present the first triggerless backdoor attack against deep neural networks, where the adversary does not need to modify the input for triggering the backdoor. Our attack is based on the dropout technique. Concretely, we associate a set of target neurons that are dropped out during model training with the target label. In the prediction phase, the model will output the target label when the target neurons are dropped again, i.e., the backdoor attack is launched. This triggerless feature of our attack makes it practical in the physical world. Extensive experiments show that our triggerless backdoor attack achieves a perfect attack success rate with a negligible damage to the model's utility.
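The mechanism described in the abstract can be illustrated with a toy forward pass. This is a minimal sketch under stated assumptions, not the authors' implementation: the weights are set by hand rather than learned, the network has only four hidden units, and every name (`TARGET_NEURONS`, `TARGET_LABEL`, `hidden`, `predict`) is hypothetical. It shows only the end behavior: the same unmodified input yields its normal label when all neurons are active and the target label when the target neurons are dropped.

```python
# Toy illustration only: weights are hand-set, not learned,
# and all names here are hypothetical (not taken from the paper).
TARGET_NEURONS = frozenset({2, 3})  # hidden units associated with the backdoor
TARGET_LABEL = 2                    # label the backdoor should produce


def hidden(x):
    """Hidden layer: two ReLU feature units plus two always-on units."""
    return [max(0.0, x[0]), max(0.0, x[1]), 1.0, 1.0]


def predict(x, dropped=frozenset()):
    """Return the argmax label, zeroing hidden units whose index is in `dropped`."""
    h = [0.0 if i in dropped else v for i, v in enumerate(hidden(x))]
    logits = [
        h[0],                            # class 0 follows feature 0
        h[1],                            # class 1 follows feature 1
        10.0 - 5.0 * h[2] - 5.0 * h[3],  # target class is suppressed while units 2 and 3 are active
    ]
    return max(range(len(logits)), key=logits.__getitem__)


clean_input = [2.0, 0.5]
print(predict(clean_input))                  # all neurons active: normal label
print(predict(clean_input, TARGET_NEURONS))  # same input, target neurons dropped: target label
```

In this sketch the input is never altered; only the set of dropped hidden units changes between the two calls, which is the sense in which the attack is triggerless. How the association between the target neurons and the target label is learned during training is not covered here.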
