Paper Title

Do Gradient-based Explanations Tell Anything About Adversarial Robustness to Android Malware?

Paper Authors

Marco Melis, Michele Scalas, Ambra Demontis, Davide Maiorca, Battista Biggio, Giorgio Giacinto, Fabio Roli

Paper Abstract

While machine-learning algorithms have demonstrated a strong ability in detecting Android malware, they can be evaded by sparse evasion attacks crafted by injecting a small set of fake components, e.g., permissions and system calls, without compromising intrusive functionality. Previous work has shown that, to improve robustness against such attacks, learning algorithms should avoid overemphasizing few discriminant features, providing instead decisions that rely upon a large subset of components. In this work, we investigate whether gradient-based attribution methods, used to explain classifiers' decisions by identifying the most relevant features, can be used to help identify and select more robust algorithms. To this end, we propose to exploit two different metrics that represent the evenness of explanations, and a new compact security measure called Adversarial Robustness Metric. Our experiments conducted on two different datasets and five classification algorithms for Android malware detection show that a strong connection exists between the uniformity of explanations and adversarial robustness. In particular, we found that popular techniques like Gradient*Input and Integrated Gradients are strongly correlated to security when applied to both linear and nonlinear detectors, while more elementary explanation techniques like the simple Gradient do not provide reliable information about the robustness of such classifiers.
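
Below is a minimal sketch of the two ingredients named in the abstract: Gradient*Input attributions and a measure of how evenly an explanation spreads relevance across features. It assumes a toy linear detector f(x) = w·x and uses a normalized-entropy score as the evenness proxy; neither the model nor this particular score is taken from the paper.

```python
# Minimal sketch: Gradient*Input attributions and an "evenness" proxy.
# Assumptions (not from the paper): a toy linear detector f(x) = w.x,
# binary app features, and normalized Shannon entropy as the evenness score.
import numpy as np

def gradient_times_input(w, x):
    # For a linear score f(x) = w.x, the gradient w.r.t. x is w,
    # so the Gradient*Input attribution is the elementwise product w * x.
    return w * x

def evenness(attributions, eps=1e-12):
    # Normalized entropy of |attributions|: close to 1 when relevance is
    # spread evenly across features, close to 0 when one feature dominates.
    a = np.abs(attributions)
    p = a / (a.sum() + eps)
    h = max(-(p * np.log(p + eps)).sum(), 0.0)
    return float(h / np.log(len(a)))

# Toy binary feature vector (e.g., presence/absence of permissions, API calls).
x = np.array([1, 0, 1, 1, 0, 1, 0, 1, 1, 0], dtype=float)

w_sparse = np.zeros(10); w_sparse[0] = 5.0   # decision relies on one feature
w_even = np.full(10, 0.5)                    # weight spread over all features

for name, w in [("sparse", w_sparse), ("even", w_even)]:
    r = gradient_times_input(w, x)
    print(f"{name} detector: evenness = {evenness(r):.3f}")
```

In the paper's setting, more even explanations correspond to classifiers that are harder to evade with sparse attacks; the entropy score above is only one possible way to quantify that evenness.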
