Paper Title
Analysis of Dominant Classes in Universal Adversarial Perturbations
Paper Authors
Paper Abstract
The reasons why Deep Neural Networks are susceptible to being fooled by adversarial examples remain an open discussion. Indeed, many different strategies can be employed to generate adversarial attacks efficiently, some of them relying on different theoretical justifications. Among these strategies, universal (input-agnostic) perturbations are of particular interest, due to their capability to fool a network independently of the input to which the perturbation is applied. In this work, we investigate an intriguing phenomenon of universal perturbations, which has been reported previously in the literature, yet without a proven justification: universal perturbations change the predicted classes of most inputs into one particular (dominant) class, even if this behavior is not specified during the creation of the perturbation. To uncover the cause of this phenomenon, we propose a number of hypotheses and test them experimentally, using a speech command classification problem in the audio domain as a testbed. Our analyses reveal interesting properties of universal perturbations, suggest new methods to generate such attacks, and provide an explanation of dominant classes from both a geometric and a data-feature perspective.
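As a concrete illustration of the phenomenon the abstract describes, the sketch below shows one way to measure it: apply a single universal perturbation to a whole batch of inputs and check whether the perturbed predictions collapse onto one dominant class. This is a minimal, hypothetical example (the model, batch, and perturbation are random stand-ins), not the paper's actual code or evaluation protocol.

```python
import torch
from collections import Counter

def dominant_class_stats(model, inputs, delta):
    """Apply a universal perturbation `delta` to every input and report
    the fooling rate and how concentrated the perturbed predictions are."""
    model.eval()
    with torch.no_grad():
        clean_preds = model(inputs).argmax(dim=1)
        adv_preds = model(inputs + delta).argmax(dim=1)
    # Fooling rate: fraction of inputs whose predicted class changed.
    fooling_rate = (clean_preds != adv_preds).float().mean().item()
    # Distribution of predicted classes after perturbation; a single class
    # absorbing most predictions is the "dominant class" phenomenon.
    counts = Counter(adv_preds.tolist())
    dominant_class, dominant_count = counts.most_common(1)[0]
    return fooling_rate, dominant_class, dominant_count / len(inputs)

# Toy demo: a random linear "classifier" over flattened audio frames
# (e.g. 1 s of 16 kHz audio, 10 speech commands) and a random perturbation.
model = torch.nn.Linear(16000, 10)
inputs = torch.randn(64, 16000)       # hypothetical batch of waveforms
delta = 0.05 * torch.randn(16000)     # stand-in universal perturbation
print(dominant_class_stats(model, inputs, delta))
```

For a genuine universal perturbation (optimized over a training set rather than drawn at random), the third returned value would concentrate near 1.0 on the dominant class, which is the effect the paper analyzes.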