Paper Title

Class-Aware Domain Adaptation for Improving Adversarial Robustness

Authors

Xianxu Hou, Jingxin Liu, Bolei Xu, Xiaolong Wang, Bozhi Liu, Guoping Qiu

Abstract

Recent works have demonstrated convolutional neural networks are vulnerable to adversarial examples, i.e., inputs to machine learning models that an attacker has intentionally designed to cause the models to make a mistake. To improve the adversarial robustness of neural networks, adversarial training has been proposed to train networks by injecting adversarial examples into the training data. However, adversarial training could overfit to a specific type of adversarial attack and also lead to standard accuracy drop on clean images. To this end, we propose a novel Class-Aware Domain Adaptation (CADA) method for adversarial defense without directly applying adversarial training. Specifically, we propose to learn domain-invariant features for adversarial examples and clean images via a domain discriminator. Furthermore, we introduce a class-aware component into the discriminator to increase the discriminative power of the network for adversarial examples. We evaluate our newly proposed approach using multiple benchmark datasets. The results demonstrate that our method can significantly improve the state-of-the-art of adversarial robustness for various attacks and maintain high performances on clean images.
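
The abstract's core idea, pushing clean and adversarial features toward the same distribution with a domain discriminator that is also conditioned on class labels, can be illustrated with a small sketch. The PyTorch-style code below is an assumption-based illustration only: the gradient-reversal layer, the per-class discriminator head, and names such as `ClassAwareDiscriminator` are hypothetical and are not taken from the authors' implementation.

```python
import torch
import torch.nn as nn
from torch.autograd import Function


class GradReverse(Function):
    """Gradient reversal layer commonly used in domain-adversarial training (assumed here)."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Flip the gradient so the feature extractor learns to fool the discriminator,
        # i.e. to produce domain-invariant features for clean and adversarial inputs.
        return -ctx.lambd * grad_output, None


class ClassAwareDiscriminator(nn.Module):
    """Hypothetical discriminator emitting one (clean, adversarial) logit pair per class,
    so the domain decision is conditioned on the sample's class."""

    def __init__(self, feat_dim: int, num_classes: int):
        super().__init__()
        self.num_classes = num_classes
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 256),
            nn.ReLU(inplace=True),
            nn.Linear(256, 2 * num_classes),  # (clean, adversarial) logits per class
        )

    def forward(self, features, labels, lambd=1.0):
        reversed_feats = GradReverse.apply(features, lambd)
        logits = self.net(reversed_feats).view(-1, self.num_classes, 2)
        # Select the clean/adversarial logits belonging to each sample's class.
        idx = labels.view(-1, 1, 1).expand(-1, 1, 2)
        return logits.gather(1, idx).squeeze(1)  # shape: (batch, 2)


if __name__ == "__main__":
    # Toy usage: backbone features for a clean batch and its adversarial counterpart.
    disc = ClassAwareDiscriminator(feat_dim=512, num_classes=10)
    clean_feats = torch.randn(8, 512)
    adv_feats = torch.randn(8, 512)
    labels = torch.randint(0, 10, (8,))

    domain_logits = disc(torch.cat([clean_feats, adv_feats]), labels.repeat(2))
    domain_labels = torch.cat([torch.zeros(8), torch.ones(8)]).long()
    loss = nn.CrossEntropyLoss()(domain_logits, domain_labels)
    loss.backward()
    print(loss.item())
```

In a full training loop, this domain loss would be combined with the usual cross-entropy classification loss, so the shared backbone is pulled toward features that classify well while giving the discriminator no clue whether the input was adversarially perturbed.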
