Deep Feature Space Trojan Attack of Neural Networks by Controlled Detoxification

Paper Title

Deep Feature Space Trojan Attack of Neural Networks by Controlled Detoxification

Authors

Siyuan Cheng, Yingqi Liu, Shiqing Ma, Xiangyu Zhang

Abstract

A trojan (backdoor) attack is a form of adversarial attack on deep neural networks in which the attacker provides victims with a model trained or retrained on malicious data. The backdoor is activated when a normal input is stamped with a certain pattern, called a trigger, causing misclassification. Many existing trojan attacks use triggers that are input-space patches or objects (e.g., a solid-color polygon) or simple input transformations such as Instagram filters. These simple triggers are susceptible to recent backdoor detection algorithms. We propose a novel deep feature space trojan attack with five characteristics: effectiveness, stealthiness, controllability, robustness, and reliance on deep features. We conduct extensive experiments on 9 image classifiers on various datasets, including ImageNet, to demonstrate these properties and show that our attack can evade state-of-the-art defenses.
