Paper Title

When Adversarial Training Meets Vision Transformers: Recipes from Training to Architecture

Authors

Yichuan Mo, Dongxian Wu, Yifei Wang, Yiwen Guo, Yisen Wang

Abstract

Vision Transformers (ViTs) have recently achieved competitive performance on a broad range of vision tasks. Unfortunately, under popular threat models, naturally trained ViTs are shown to provide no more adversarial robustness than convolutional neural networks (CNNs). Adversarial training is still required for ViTs to defend against such adversarial attacks. In this paper, we provide the first comprehensive study of the adversarial training recipe for ViTs via extensive evaluation of various training techniques across benchmark datasets. We find that pre-training and the SGD optimizer are necessary for ViTs' adversarial training. Further considering ViTs as a new type of model architecture, we investigate their adversarial robustness from the perspective of their unique architectural components. We find that randomly masking gradients from some attention blocks or masking perturbations on some patches during adversarial training can remarkably improve the adversarial robustness of ViTs, which may open up a line of work exploring the architectural information inside newly designed models like ViTs. Our code is available at https://github.com/mo666666/When-Adversarial-Training-Meets-Vision-Transformers.
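
As a rough illustration of the perturbation-masking idea described in the abstract, the following minimal PyTorch sketch zeroes out the adversarial perturbation on a randomly chosen subset of patches inside a PGD attack. This is not the authors' implementation (see the linked repository for that); the model interface, patch_size, mask_ratio, and PGD hyperparameters are illustrative assumptions.

    # Minimal sketch, not the authors' code: PGD whose perturbation is
    # masked out on a random fraction of patches (mask_ratio).
    import torch
    import torch.nn.functional as F

    def pgd_with_patch_masking(model, x, y, eps=8/255, alpha=2/255, steps=10,
                               patch_size=16, mask_ratio=0.5):
        # x: images in [0, 1], shape (B, C, H, W); y: integer class labels.
        # Assumes H and W are divisible by patch_size (as in standard ViT inputs).
        b, c, h, w = x.shape
        # Binary keep-mask at patch granularity, upsampled to pixel resolution.
        ph, pw = h // patch_size, w // patch_size
        keep = (torch.rand(b, 1, ph, pw, device=x.device) > mask_ratio).float()
        pixel_mask = keep.repeat_interleave(patch_size, dim=2) \
                         .repeat_interleave(patch_size, dim=3)

        # Random start, restricted to the unmasked patches.
        delta = torch.empty_like(x).uniform_(-eps, eps) * pixel_mask
        for _ in range(steps):
            delta.requires_grad_(True)
            loss = F.cross_entropy(model(torch.clamp(x + delta, 0, 1)), y)
            grad = torch.autograd.grad(loss, delta)[0]
            with torch.no_grad():
                delta = torch.clamp(delta + alpha * grad.sign(), -eps, eps) * pixel_mask
                delta = torch.clamp(x + delta, 0, 1) - x  # keep x + delta in valid range
        return (x + delta).detach()

In an adversarial-training loop, x_adv = pgd_with_patch_masking(model, x, y) would replace the clean batch in the usual cross-entropy update; resampling the patch mask for every batch is what injects the randomness the abstract refers to.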
