Paper Title

Stylized Adversarial Defense

Authors

Muzammal Naseer, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Fatih Porikli

Abstract

Deep Convolution Neural Networks (CNNs) can easily be fooled by subtle, imperceptible changes to the input images. To address this vulnerability, adversarial training creates perturbation patterns and includes them in the training set to robustify the model. In contrast to existing adversarial training methods that only use class-boundary information (e.g., using a cross-entropy loss), we propose to exploit additional information from the feature space to craft stronger adversaries that are in turn used to learn a robust model. Specifically, we use the style and content information of the target sample from another class, alongside its class-boundary information to create adversarial perturbations. We apply our proposed multi-task objective in a deeply supervised manner, extracting multi-scale feature knowledge to create maximally separating adversaries. Subsequently, we propose a max-margin adversarial training approach that minimizes the distance between source image and its adversary and maximizes the distance between the adversary and the target image. Our adversarial training approach demonstrates strong robustness compared to state-of-the-art defenses, generalizes well to naturally occurring corruptions and data distributional shifts, and retains the model accuracy on clean examples.
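
As an informal illustration of the two ideas described in the abstract (the notation below is ours, a minimal sketch rather than the paper's exact formulation), the adversary-crafting step can be read as a multi-task objective that combines style, content, and class-boundary terms computed against a target sample from another class, and the training step as a max-margin loss over feature distances:

$$\mathcal{L}_{\text{craft}}(x^{adv}) \;=\; \lambda_s\,\mathcal{L}_{\text{style}}\big(x^{adv}, x_t\big) \;+\; \lambda_c\,\mathcal{L}_{\text{content}}\big(x^{adv}, x_t\big) \;+\; \lambda_b\,\mathcal{L}_{\text{CE}}\big(f(x^{adv}), y_t\big)$$

$$\mathcal{L}_{\text{train}} \;=\; \max\Big(0,\; d\big(F(x), F(x^{adv})\big) \;-\; d\big(F(x^{adv}), F(x_t)\big) \;+\; m\Big)$$

Here $x$ is the source image, $x^{adv}$ its adversary, $x_t$ a target sample from another class with label $y_t$, $f$ the classifier, $F(\cdot)$ a (possibly multi-scale) feature extractor, $d(\cdot,\cdot)$ a feature distance, $m$ a margin, and $\lambda_s, \lambda_c, \lambda_b$ assumed weighting coefficients. Minimizing $\mathcal{L}_{\text{train}}$ keeps the adversary close to its source while pushing it away from the target image, matching the max-margin behaviour described above.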
