Paper Title
Pre-trained Adversarial Perturbations
Paper Authors
Paper Abstract
Self-supervised pre-training has drawn increasing attention in recent years due to its superior performance on numerous downstream tasks after fine-tuning. However, deep learning models are known to lack robustness to adversarial examples, which can also raise security issues for pre-trained models, though this threat has been less explored. In this paper, we delve into the robustness of pre-trained models by introducing Pre-trained Adversarial Perturbations (PAPs), universal perturbations crafted on a pre-trained model that remain effective when attacking its fine-tuned versions, without any knowledge of the downstream tasks. To this end, we propose a Low-Level Layer Lifting Attack (L4A) method that generates effective PAPs by lifting the neuron activations of low-level layers of the pre-trained model. Equipped with an enhanced noise augmentation strategy, L4A generates PAPs that transfer better to fine-tuned models. Extensive experiments on typical pre-trained vision models and ten downstream tasks demonstrate that our method improves the attack success rate by a large margin compared with state-of-the-art methods.
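To make the core idea concrete, below is a minimal sketch of an L4A-style attack: a single universal perturbation is optimized to inflate the activation norm of a low-level layer of a frozen pre-trained backbone, with Gaussian noise added to the inputs as a simple stand-in for the paper's noise augmentation. The choice of a torchvision ResNet-50, the hooked layer (`layer1`), and all hyperparameters (`epsilon`, `sigma`, `steps`, `lr`) are illustrative assumptions, not the authors' exact configuration.

```python
# Hypothetical sketch of an L4A-style Pre-trained Adversarial Perturbation (PAP).
# Assumes an unlabeled image loader yielding 224x224 tensors in [0, 1].
import torch
import torchvision


def craft_pap(data_loader, epsilon=10 / 255, sigma=0.05, steps=10, lr=1e-3):
    device = "cuda" if torch.cuda.is_available() else "cpu"
    # Frozen pre-trained backbone; only the perturbation is optimized.
    model = torchvision.models.resnet50(weights="IMAGENET1K_V1").to(device).eval()

    # Capture the activations of a low-level layer via a forward hook.
    feats = {}
    model.layer1.register_forward_hook(lambda mod, inp, out: feats.update(act=out))

    # One universal perturbation shared across the whole dataset.
    delta = torch.zeros(1, 3, 224, 224, device=device, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)

    for _ in range(steps):
        for x, _ in data_loader:
            x = x.to(device)
            # Gaussian noise augmentation, intended to help transfer to fine-tuned models.
            x_aug = x + sigma * torch.randn_like(x)
            model((x_aug + delta).clamp(0, 1))
            # "Lift" the low-level activations: maximize their norm.
            loss = -feats["act"].norm()
            opt.zero_grad()
            loss.backward()
            opt.step()
            # Project the perturbation back into the L_inf epsilon-ball.
            with torch.no_grad():
                delta.clamp_(-epsilon, epsilon)
    return delta.detach()
```

The resulting `delta` would then be added to test images of any downstream task built on the same pre-trained backbone; the premise is that corrupting low-level features degrades fine-tuned models even though the attacker never sees the downstream labels or heads.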