Adversarial Attack on Attackers: Post-Process to Mitigate Black-Box Score-Based Query Attacks

Paper Title

Adversarial Attack on Attackers: Post-Process to Mitigate Black-Box Score-Based Query Attacks

Authors

Sizhe Chen, Zhehao Huang, Qinghua Tao, Yingwen Wu, Cihang Xie, Xiaolin Huang

Abstract

The score-based query attacks (SQAs) pose practical threats to deep neural networks by crafting adversarial perturbations within dozens of queries, only using the model's output scores. Nonetheless, we note that if the loss trend of the outputs is slightly perturbed, SQAs could be easily misled and thereby become much less effective. Following this idea, we propose a novel defense, namely Adversarial Attack on Attackers (AAA), to confound SQAs towards incorrect attack directions by slightly modifying the output logits. In this way, (1) SQAs are prevented regardless of the model's worst-case robustness; (2) the original model predictions are hardly changed, i.e., no degradation on clean accuracy; (3) the calibration of confidence scores can be improved simultaneously. Extensive experiments are provided to verify the above advantages. For example, by setting $\ell_\infty=8/255$ on CIFAR-10, our proposed AAA helps WideResNet-28 secure 80.59% accuracy under Square attack (2500 queries), while the best prior defense (i.e., adversarial training) only attains 67.44%. Since AAA attacks SQA's general greedy strategy, such advantages of AAA over 8 defenses can be consistently observed on 8 CIFAR-10/ImageNet models under 6 SQAs, using different attack targets, bounds, norms, losses, and strategies. Moreover, AAA calibrates better without hurting the accuracy. Our code is available at https://github.com/Sizhe-Chen/AAA.
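The core idea in the abstract, perturbing the loss trend an attacker observes while leaving the predicted class unchanged, can be illustrated with a small sketch. This is not the authors' implementation (see their repository for that); it is one simple way to post-process logits under stated assumptions. The function name `reverse_margin_trend`, the interval width `tau`, and the slope `alpha` are hypothetical choices made for illustration. The sketch splits the top-1/top-2 margin into intervals of width `tau` and, inside each interval, makes the reported margin move in the opposite direction to the true margin.

```python
import numpy as np

def reverse_margin_trend(logits, tau=1.0, alpha=0.5):
    """Illustrative post-processing of a single logit vector.

    Assumptions (not from the abstract): `tau` is the interval width and
    `alpha` in (0, 1] is the slope of the locally reversed trend.
    """
    z = np.asarray(logits, dtype=float).copy()
    order = np.argsort(z)
    top1, top2 = order[-1], order[-2]

    # True margin between the two highest logits (non-negative).
    margin = z[top1] - z[top2]

    # Centre of the interval of width tau that contains the margin.
    centre = (np.floor(margin / tau) + 0.5) * tau

    # Reported margin: mirrored around the centre, so it rises when the
    # true margin falls within the same interval. It stays positive for
    # alpha <= 1, so the predicted class is unchanged.
    reported = centre - alpha * (margin - centre)

    # Only the top logit is moved; all other logits are left as they are.
    z[top1] = z[top2] + reported
    return z

# An attacker who lowers the true margin from 1.8 to 1.7 sees the
# reported margin go up, which suggests the query made things worse.
before = reverse_margin_trend([2.1, 0.3, 0.1])
after = reverse_margin_trend([2.0, 0.3, 0.1])
print(before[0] - before[1], after[0] - after[1])
```

Because only the highest logit is shifted and the reported margin remains positive, the argmax (and hence clean accuracy) is preserved in this sketch. A greedy attacker that keeps only queries with a lower observed loss would be steered away from perturbations that actually reduce the true margin.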
