Examining the Proximity of Adversarial Examples to Class Manifolds in Deep Networks

Paper title

Examining the Proximity of Adversarial Examples to Class Manifolds in Deep Networks

Authors

Štefan Pócoš, Iveta Bečková, Igor Farkaš

Abstract

Deep neural networks achieve remarkable performance in multiple fields. However, even after proper training, they suffer from an inherent vulnerability to adversarial examples (AEs). In this work we shed light on the inner representations of AEs by analysing their activations on the hidden layers. We test various types of AEs, each crafted using a specific norm constraint, which affects their visual appearance and ultimately their behaviour in the trained networks. Our results on image classification tasks (MNIST and CIFAR-10) reveal qualitative differences between the individual types of AEs when comparing their proximity to the class-specific manifolds in the inner representations. We propose two methods that can be used to compare distances to class-specific manifolds, regardless of the changing dimensions throughout the network. Using these methods, we consistently confirm that some of the adversarials do not necessarily leave the proximity of the manifold of the correct class, not even in the last hidden layer of the neural network. Next, using the UMAP visualisation technique, we project the class activations onto a 2D space. The results indicate that the activations of the individual AEs are entangled with the activations of the test set. This, however, does not hold for a group of crafted inputs called the rubbish class. We also confirm the entanglement of adversarials with the test set numerically using the soft nearest neighbour loss.
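
As a rough illustration of the last point, the sketch below computes the soft nearest neighbour loss in its commonly published form for a small batch of activation vectors. The abstract does not say how the authors set it up, so the function name, the temperature value, the stabilising epsilon, and the use of labels to mark group membership (for example, adversarial versus test set) are all assumptions made for this example. Higher values mean the groups are more entangled.

```python
import math

def soft_nearest_neighbour_loss(points, labels, temperature=1.0, eps=1e-12):
    """Soft nearest neighbour loss over a batch of vectors.

    points: list of equal-length numeric sequences (e.g. hidden activations).
    labels: list of group labels, one per point (hypothetical grouping).
    Returns the mean negative log ratio of same-label neighbour weight
    to all-neighbour weight; larger means more entangled groups.
    """
    n = len(points)
    total = 0.0
    for i in range(n):
        same_weight = 0.0
        all_weight = 0.0
        for j in range(n):
            if j == i:
                continue  # a point is never its own neighbour
            sq_dist = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            weight = math.exp(-sq_dist / temperature)
            all_weight += weight
            if labels[j] == labels[i]:
                same_weight += weight
        # eps (an assumption of this sketch) avoids log(0) for isolated points
        total += -math.log((same_weight + eps) / (all_weight + eps))
    return total / n

# Toy data: two tight clusters far apart.
toy_points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
separated = soft_nearest_neighbour_loss(toy_points, [0, 0, 1, 1])
entangled = soft_nearest_neighbour_loss(toy_points, [0, 1, 0, 1])
print(separated < entangled)
```

When labels follow the clusters the loss is close to zero, and when each cluster mixes both labels the loss is large, which is the sense in which the measure quantifies entanglement.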
