Paper Title

Robustness and Transferability of Universal Attacks on Compressed Models

Authors

Matachana, Alberto G., Co, Kenneth T., Muñoz-González, Luis, Martinez, David, Lupu, Emil C.

Abstract


Neural network compression methods like pruning and quantization are very effective at efficiently deploying Deep Neural Networks (DNNs) on edge devices. However, DNNs remain vulnerable to adversarial examples: inconspicuous inputs that are specifically designed to fool these models. In particular, Universal Adversarial Perturbations (UAPs) are a powerful class of adversarial attacks that create perturbations which generalize across a large set of inputs. In this work, we analyze the effect of various compression techniques on robustness to UAP attacks, including different forms of pruning and quantization. We test the robustness of compressed models to white-box and transfer attacks, comparing them with their uncompressed counterparts on the CIFAR-10 and SVHN datasets. Our evaluations reveal clear differences between pruning methods, including Soft Filter and Post-training Pruning. We observe that UAP transfer attacks between pruned and full models are limited, suggesting that the systemic vulnerabilities across these models differ. This finding has practical implications, as using different compression techniques can blunt the effectiveness of black-box transfer attacks. We show that, in some scenarios, quantization can produce gradient masking, giving a false sense of security. Finally, our results suggest that conclusions about the robustness of compressed models to UAP attacks are application-dependent, as we observe different phenomena in the two datasets used in our experiments.
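To make the setting concrete, the sketch below (PyTorch) illustrates the kind of evaluation the abstract describes, not the authors' exact method: it crafts a universal adversarial perturbation by accumulating signed gradients of the loss over many inputs under an L-infinity budget, then compares the fooling rate on the source model (white-box) against a compressed copy of it (transfer). The model and data-loader objects, epsilon, step size, and input shape are assumptions for illustration.

# Minimal sketch, assuming PyTorch models and loaders for CIFAR-10/SVHN-sized
# inputs in [0, 1]. Not the paper's implementation.
import torch
import torch.nn.functional as F

def craft_uap(model, loader, eps=8/255, lr=1/255, epochs=5, device="cpu"):
    """Craft a single perturbation delta that raises the loss on many inputs."""
    model.eval()
    # One perturbation shared by all inputs (broadcast over the batch dimension).
    delta = torch.zeros(1, 3, 32, 32, device=device, requires_grad=True)
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            loss = F.cross_entropy(model(x + delta), y)
            loss.backward()
            with torch.no_grad():
                delta += lr * delta.grad.sign()   # gradient-ascent step on the loss
                delta.clamp_(-eps, eps)           # stay within the L-infinity budget
            delta.grad.zero_()
    return delta.detach()

@torch.no_grad()
def fooling_rate(model, loader, delta, device="cpu"):
    """Fraction of inputs whose predicted label changes when the UAP is added."""
    model.eval()
    flipped, total = 0, 0
    for x, _ in loader:
        x = x.to(device)
        clean = model(x).argmax(dim=1)
        adv = model((x + delta).clamp(0, 1)).argmax(dim=1)
        flipped += (clean != adv).sum().item()
        total += x.size(0)
    return flipped / total

# Hypothetical usage: white-box rate on the full model vs. transfer rate on a
# pruned or quantized copy of the same architecture.
# delta = craft_uap(full_model, train_loader, device="cuda")
# print("white-box:", fooling_rate(full_model, test_loader, delta, "cuda"))
# print("transfer :", fooling_rate(pruned_model, test_loader, delta, "cuda"))

A low transfer fooling rate relative to the white-box rate is the kind of evidence the abstract points to when it says UAP transfer between pruned and full models is limited.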
