Paper Title
Invisible Backdoor Attack with Sample-Specific Triggers
Paper Authors
Paper Abstract
Recently, backdoor attacks have posed a new security threat to the training process of deep neural networks (DNNs). Attackers intend to inject hidden backdoors into DNNs, such that the attacked model performs well on benign samples, whereas its predictions will be maliciously changed if the hidden backdoor is activated by the attacker-defined trigger. Existing backdoor attacks usually adopt the setting that triggers are sample-agnostic, $i.e.,$ different poisoned samples contain the same trigger, so that the attacks can be easily mitigated by current backdoor defenses. In this work, we explore a novel attack paradigm, where backdoor triggers are sample-specific. In our attack, we only need to modify certain training samples with an invisible perturbation, without the need to manipulate other training components ($e.g.$, the training loss or the model structure) as required by many existing attacks. Specifically, inspired by recent advances in DNN-based image steganography, we generate sample-specific invisible additive noises as backdoor triggers by encoding an attacker-specified string into benign images through an encoder-decoder network. The mapping from the string to the target label is learned when DNNs are trained on the poisoned dataset. Extensive experiments on benchmark datasets verify the effectiveness of our method in attacking models with or without defenses.
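To make the described pipeline more concrete, below is a minimal PyTorch sketch of sample-specific poisoning with an encoder network. The `StegaEncoder` module, its architecture, and all hyperparameters (message length, perturbation scale, poisoning rate) are illustrative assumptions, not the paper's actual encoder-decoder or its training procedure (which, following DNN-based steganography, would be trained with image-reconstruction and string-decoding losses).

```python
# A minimal sketch (not the authors' code) of the poisoning pipeline described in the
# abstract: a (normally pre-trained) encoder embeds an attacker-specified string into a
# small fraction of benign images, producing sample-specific invisible triggers, and the
# labels of those images are switched to the attacker's target class.

import torch
import torch.nn as nn


class StegaEncoder(nn.Module):
    """Toy stand-in for a steganography encoder: maps (image, string bits) to a
    residual that is added to the image as an invisible, sample-specific trigger."""

    def __init__(self, msg_len: int = 32, channels: int = 3):
        super().__init__()
        self.msg_fc = nn.Linear(msg_len, 32 * 32)  # broadcast the message spatially
        self.net = nn.Sequential(
            nn.Conv2d(channels + 1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, channels, 3, padding=1), nn.Tanh(),
        )

    def forward(self, image: torch.Tensor, msg: torch.Tensor) -> torch.Tensor:
        b, _, h, w = image.shape
        msg_plane = self.msg_fc(msg).view(b, 1, 32, 32)
        msg_plane = nn.functional.interpolate(msg_plane, size=(h, w))
        residual = self.net(torch.cat([image, msg_plane], dim=1))
        return (image + 0.02 * residual).clamp(0, 1)  # small residual -> "invisible"


def poison_dataset(images, labels, encoder, msg, target_label, poison_rate=0.1):
    """Replace a fraction of benign samples with encoded (triggered) versions and
    relabel them to the target class; the rest of the dataset is left untouched."""
    n = images.size(0)
    idx = torch.randperm(n)[: int(poison_rate * n)]
    images, labels = images.clone(), labels.clone()
    with torch.no_grad():
        images[idx] = encoder(images[idx], msg.expand(len(idx), -1))
    labels[idx] = target_label
    return images, labels


if __name__ == "__main__":
    encoder = StegaEncoder(msg_len=32)           # in practice: a pre-trained encoder
    msg = torch.randint(0, 2, (1, 32)).float()   # attacker-specified string, as bits
    images = torch.rand(100, 3, 32, 32)          # placeholder benign images
    labels = torch.randint(0, 10, (100,))
    poisoned_x, poisoned_y = poison_dataset(images, labels, encoder, msg, target_label=0)
    # A victim DNN trained on (poisoned_x, poisoned_y) would learn the mapping from
    # the encoded string to the target label, i.e., the hidden backdoor.
```

Because each trigger is a function of the image it is added to, the perturbation differs across poisoned samples, which is the property that distinguishes this setting from sample-agnostic triggers.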