Paper title
Deep Feature Space Trojan Attack of Neural Networks by Controlled Detoxification
Paper authors
Paper abstract
A trojan (backdoor) attack is a form of adversarial attack on deep neural networks in which the attacker provides victims with a model trained or retrained on malicious data. The backdoor can be activated when a normal input is stamped with a certain pattern, called the trigger, causing misclassification. Many existing trojan attacks use triggers that are input-space patches or objects (e.g., a solid-color polygon) or simple input transformations such as Instagram filters. These simple triggers are susceptible to recent backdoor detection algorithms. We propose a novel deep feature space trojan attack with five characteristics: effectiveness, stealthiness, controllability, robustness, and reliance on deep features. We conduct extensive experiments on 9 image classifiers across various datasets, including ImageNet, to demonstrate these properties and show that our attack can evade state-of-the-art defenses.
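
To make the notion of a trigger concrete, the following is a minimal sketch of the simple input-space patch trigger that the abstract describes as the existing baseline. It is not the deep feature space attack proposed in the paper. The function names `stamp_patch_trigger` and `poison_dataset`, the patch size and position, and the poison rate are all hypothetical choices made for illustration.

```python
import numpy as np


def stamp_patch_trigger(image, patch_value=1.0, patch_size=4, corner=(0, 0)):
    """Return a copy of `image` (H x W x C, floats in [0, 1]) with a solid square patch.

    This is an assumed, simplified input-space trigger: a single-color square.
    """
    stamped = image.copy()
    row, col = corner
    stamped[row:row + patch_size, col:col + patch_size, :] = patch_value
    return stamped


def poison_dataset(images, labels, target_label, poison_rate=0.1, seed=0):
    """Stamp the trigger on a random subset and relabel that subset to `target_label`.

    Returns the poisoned images, the poisoned labels, and the poisoned indices.
    The inputs are left unmodified.
    """
    rng = np.random.default_rng(seed)
    n_samples = len(images)
    n_poison = int(round(poison_rate * n_samples))
    poisoned_idx = rng.choice(n_samples, size=n_poison, replace=False)

    poisoned_images = images.copy()
    poisoned_labels = labels.copy()
    for i in poisoned_idx:
        poisoned_images[i] = stamp_patch_trigger(images[i])
        poisoned_labels[i] = target_label
    return poisoned_images, poisoned_labels, poisoned_idx


if __name__ == "__main__":
    # Toy data: 20 blank 8x8 RGB images with labels in {0, ..., 4}.
    images = np.zeros((20, 8, 8, 3), dtype=np.float32)
    labels = np.arange(20) % 5
    x, y, idx = poison_dataset(images, labels, target_label=7, poison_rate=0.25)
    print("poisoned samples:", len(idx))
    print("labels of poisoned samples:", y[idx])
```

A model trained on such a poisoned set can learn to associate the patch with the target label. Because the patch is a fixed pixel pattern in input space, it is the kind of trigger the abstract describes as susceptible to recent backdoor detection algorithms.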