Paper Title

Generating Semantic Adversarial Examples via Feature Manipulation

Authors

Shuo Wang, Surya Nepal, Carsten Rudolph, Marthie Grobler, Shangyu Chen, Tianle Chen

Abstract

The vulnerability of deep neural networks to adversarial attacks has been widely demonstrated (e.g., adversarial example attacks). Traditional attacks perform unstructured pixel-wise perturbation to fool the classifier. An alternative approach is to have perturbations in the latent space. However, such perturbations are hard to control due to the lack of interpretability and disentanglement. In this paper, we propose a more practical adversarial attack by designing structured perturbation with semantic meanings. Our proposed technique manipulates the semantic attributes of images via the disentangled latent codes. The intuition behind our technique is that images in similar domains have some commonly shared but theme-independent semantic attributes, e.g. thickness of lines in handwritten digits, that can be bidirectionally mapped to disentangled latent codes. We generate adversarial perturbation by manipulating a single or a combination of these latent codes and propose two unsupervised semantic manipulation approaches: vector-based disentangled representation and feature map-based disentangled representation, in terms of the complexity of the latent codes and smoothness of the reconstructed images. We conduct extensive experimental evaluations on real-world image data to demonstrate the power of our attacks for black-box classifiers. We further demonstrate the existence of a universal, image-agnostic semantic adversarial example.
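
The attack pipeline described in the abstract (encode an image into a disentangled latent code, shift one or a few semantic dimensions, decode, and query the black-box classifier) can be illustrated with a minimal PyTorch-style sketch. This is only an illustrative sketch, not the authors' implementation: the `encoder`, `decoder`, and `classifier` callables, the chosen dimensions `dims`, and the step schedule are placeholder assumptions.

```python
# Minimal sketch of semantic adversarial example generation via latent-code
# manipulation. `encoder`, `decoder`, and `classifier` are assumed pretrained
# models (hypothetical names, not the paper's API).
import torch

def semantic_attack(image, label, encoder, decoder, classifier,
                    dims=(2, 5), step=0.05, max_steps=100):
    """Shift selected disentangled latent dimensions until the reconstructed
    image is misclassified by the black-box classifier, or give up."""
    with torch.no_grad():
        z = encoder(image)                      # disentangled latent code
        for t in range(1, max_steps + 1):
            z_adv = z.clone()
            for d in dims:                      # manipulate chosen semantic
                z_adv[:, d] += t * step         # attributes (e.g., line thickness)
            x_adv = decoder(z_adv)              # semantically perturbed image
            pred = classifier(x_adv).argmax(dim=1)
            if (pred != label).all():           # black-box query: success when
                return x_adv                    # the predicted label flips
    return None                                 # attack failed within budget
```

In this sketch the perturbation grows along a few fixed latent axes; the paper's two approaches (vector-based and feature map-based disentangled representations) differ in how those latent codes are structured, which this toy loop does not capture.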
