Paper Title

Scoring Black-Box Models for Adversarial Robustness

Paper Authors

Jian Vora, Pranay Reddy Samala

Paper Abstract

Deep neural networks are susceptible to adversarial inputs, and various methods have been proposed to defend these models against adversarial attacks under different perturbation models. The robustness of a model to adversarial attacks has been analyzed by first constructing adversarial inputs for the model and then testing its performance on those constructed inputs. Most of these attacks require white-box access to the model, need access to data labels, and finding adversarial inputs can be computationally expensive. We propose a simple scoring method for black-box models which indicates their robustness to adversarial inputs. We show that adversarially more robust models have a smaller $l_1$-norm of LIME weights and sharper explanations.
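Below is a minimal sketch, not taken from the paper, of how such a score could be computed with the `lime` Python package: for each input, fit a LIME explanation to the black-box classifier and take the $l_1$-norm of the resulting superpixel weights, averaged over a sample of inputs. The function name `lime_l1_score`, the `predict_fn`/`images` interface, and the averaging over inputs are illustrative assumptions; the paper's exact scoring protocol may differ.

```python
# Sketch: score a black-box image classifier by the l1-norm of its LIME
# explanation weights. All names and the aggregation scheme are assumptions
# for illustration, not the paper's reference implementation.
import numpy as np
from lime import lime_image

def lime_l1_score(predict_fn, images, num_samples=1000):
    """Average l1-norm of LIME superpixel weights over a batch of images.

    predict_fn: callable mapping a batch of HxWx3 images to class-probability
                arrays (black-box access only; no gradients, no labels needed).
    images:     iterable of HxWx3 numpy arrays.
    A lower score is taken to indicate greater adversarial robustness.
    """
    explainer = lime_image.LimeImageExplainer()
    norms = []
    for img in images:
        exp = explainer.explain_instance(
            img, predict_fn, top_labels=1, num_samples=num_samples)
        label = exp.top_labels[0]
        # exp.local_exp[label] is a list of (superpixel_id, weight) pairs
        # from the local linear surrogate model fitted by LIME.
        weights = np.array([w for _, w in exp.local_exp[label]])
        norms.append(np.abs(weights).sum())
    return float(np.mean(norms))
```

Because the score only queries `predict_fn`, it applies to any black-box model and requires no ground-truth labels or adversarial-example search.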
