Paper Title

Contributor-Aware Defenses Against Adversarial Backdoor Attacks

Paper Authors

Glenn Dawson, Muhammad Umer, Robi Polikar

Paper Abstract

Deep neural networks for image classification are well-known to be vulnerable to adversarial attacks. One such attack that has garnered recent attention is the adversarial backdoor attack, which has demonstrated the capability to perform targeted misclassification of specific examples. In particular, backdoor attacks attempt to force a model to learn spurious relations between backdoor trigger patterns and false labels. In response to this threat, numerous defensive measures have been proposed; however, defenses against backdoor attacks focus on backdoor pattern detection, which may be unreliable against novel or unexpected types of backdoor pattern designs. We introduce a novel re-contextualization of the adversarial setting, where the presence of an adversary implicitly admits the existence of multiple database contributors. Then, under the mild assumption of contributor awareness, it becomes possible to exploit this knowledge to defend against backdoor attacks by destroying the false label associations. We propose a contributor-aware universal defensive framework for learning in the presence of multiple, potentially adversarial data sources that utilizes semi-supervised ensembles and learning from crowds to filter the false labels produced by adversarial triggers. Importantly, this defensive strategy is agnostic to backdoor pattern design, as it functions without needing -- or even attempting -- to perform either adversary identification or backdoor pattern detection during either training or inference. Our empirical studies demonstrate the robustness of the proposed framework against adversarial backdoor attacks from multiple simultaneous adversaries.
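The abstract does not spell out the algorithm, so the sketch below is only an illustration of the contributor-aware filtering idea it describes: data stay partitioned by contributor, models trained on the other contributors' data vote on each contributor's labels, and examples whose provided label contradicts the cross-contributor consensus are discarded before final training. Everything here (the synthetic data, the trigger design, the `make_contributor` and `filter_labels` helpers, the agreement threshold) is an assumption for illustration, not the authors' implementation.

```python
# Minimal sketch of contributor-aware label filtering (illustrative assumptions only,
# not the paper's semi-supervised ensemble / learning-from-crowds implementation).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_contributor(n, poisoned=False):
    """Synthetic 2-class data; a poisoned contributor adds a trigger and flips the label."""
    X = rng.normal(size=(n, 3))               # feature 2 is pure noise for clean data
    y = (X[:, 0] + X[:, 1] > 0).astype(int)   # true label depends only on features 0 and 1
    if poisoned:
        trig = rng.random(n) < 0.3            # ~30% of examples carry the backdoor trigger
        X[trig, 2] = 4.0                      # trigger pattern placed in the unused feature
        y[trig] = 1 - y[trig]                 # spurious (false) target label
    return X, y

# Four honest contributors and one adversarial contributor.
contributors = [make_contributor(200) for _ in range(4)] + [make_contributor(200, poisoned=True)]

def filter_labels(contributors, min_agreement=0.75):
    """Keep an example only if most models trained on the *other* contributors agree with its label."""
    kept = []
    for i, (Xi, yi) in enumerate(contributors):
        peers = [LogisticRegression().fit(Xj, yj)
                 for j, (Xj, yj) in enumerate(contributors) if j != i]
        agree = np.mean([m.predict(Xi) == yi for m in peers], axis=0)
        mask = agree >= min_agreement         # cross-contributor label consensus
        kept.append((Xi[mask], yi[mask]))
    return kept

clean = filter_labels(contributors)
print("examples kept per contributor:", [len(y) for _, y in clean])

# Downstream model trained only on consensus-filtered labels; the poisoned contributor's
# mislabeled (triggered) examples disagree with the peer consensus and are discarded,
# breaking the association between the trigger pattern and the false label.
X_train = np.vstack([X for X, _ in clean])
y_train = np.concatenate([y for _, y in clean])
clf = LogisticRegression().fit(X_train, y_train)
```

Note that nothing in this sketch inspects inputs for a backdoor pattern or tries to identify the adversary; the filtering acts purely on label disagreement across contributors, which is the property the abstract highlights as making the defense agnostic to backdoor pattern design.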
