Paper Title
Pre-trained Adversarial Perturbations
Paper Authors
Paper Abstract
Self-supervised pre-training has drawn increasing attention in recent years due to its superior performance on numerous downstream tasks after fine-tuning. However, deep learning models are known to lack robustness to adversarial examples, which can also raise security issues for pre-trained models, though this threat has been less explored. In this paper, we delve into the robustness of pre-trained models by introducing Pre-trained Adversarial Perturbations (PAPs), universal perturbations crafted on a pre-trained model that remain effective when attacking its fine-tuned versions, without any knowledge of the downstream tasks. To this end, we propose a Low-Level Layer Lifting Attack (L4A) method that generates effective PAPs by lifting the neuron activations of low-level layers of the pre-trained model. Equipped with an enhanced noise augmentation strategy, L4A generates PAPs that transfer better to fine-tuned models. Extensive experiments on typical pre-trained vision models and ten downstream tasks demonstrate that our method improves the attack success rate by a large margin compared with state-of-the-art methods.
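To make the core idea concrete, below is a minimal sketch of an L4A-style attack: a single universal perturbation is optimized to inflate the activation norm of a low-level layer of a frozen pre-trained backbone, with Gaussian noise added to the inputs as a simple stand-in for the paper's noise augmentation. The choice of a torchvision ResNet-50, the hooked layer (`layer1`), and all hyperparameters (`epsilon`, `sigma`, `steps`, `lr`) are illustrative assumptions, not the authors' exact configuration.

```python
# Hypothetical sketch of an L4A-style Pre-trained Adversarial Perturbation (PAP).
# Assumes an unlabeled image loader yielding 224x224 tensors in [0, 1].
import torch
import torchvision


def craft_pap(data_loader, epsilon=10 / 255, sigma=0.05, steps=10, lr=1e-3):
    device = "cuda" if torch.cuda.is_available() else "cpu"
    # Frozen pre-trained backbone; only the perturbation is optimized.
    model = torchvision.models.resnet50(weights="IMAGENET1K_V1").to(device).eval()

    # Capture the activations of a low-level layer via a forward hook.
    feats = {}
    model.layer1.register_forward_hook(lambda mod, inp, out: feats.update(act=out))

    # One universal perturbation shared across the whole dataset.
    delta = torch.zeros(1, 3, 224, 224, device=device, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)

    for _ in range(steps):
        for x, _ in data_loader:
            x = x.to(device)
            # Gaussian noise augmentation, intended to help transfer to fine-tuned models.
            x_aug = x + sigma * torch.randn_like(x)
            model((x_aug + delta).clamp(0, 1))
            # "Lift" the low-level activations: maximize their norm.
            loss = -feats["act"].norm()
            opt.zero_grad()
            loss.backward()
            opt.step()
            # Project the perturbation back into the L_inf epsilon-ball.
            with torch.no_grad():
                delta.clamp_(-epsilon, epsilon)
    return delta.detach()
```

The resulting `delta` would then be added to test images of any downstream task built on the same pre-trained backbone; the premise is that corrupting low-level features degrades fine-tuned models even though the attacker never sees the downstream labels or heads.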