Paper Title


Explainable AI for Android Malware Detection: Towards Understanding Why the Models Perform So Well?

Authors

Yue Liu, Chakkrit Tantithamthavorn, Li Li, Yepang Liu

Abstract


Machine learning (ML)-based Android malware detection has been one of the most popular research topics in the mobile security community. An increasing number of research studies have demonstrated that machine learning is an effective and promising approach for malware detection, and some works have even claimed that their proposed models could achieve 99% detection accuracy, leaving little room for further improvement. However, numerous prior studies have suggested that unrealistic experimental designs bring substantial biases, resulting in over-optimistic performance in malware detection. Unlike previous research that examined the detection performance of ML classifiers to locate the causes, this study employs Explainable AI (XAI) approaches to explore what ML-based models learn during the training process, inspecting and interpreting why ML-based malware classifiers perform so well under unrealistic experimental settings. We discover that temporal sample inconsistency in the training dataset brings over-optimistic classification performance (up to 99% F1 score and accuracy). Importantly, our results indicate that ML models classify malware based on temporal differences between malware and benign samples, rather than on actual malicious behaviors. Our evaluation also confirms that unrealistic experimental designs lead not only to unrealistic detection performance but also to poor reliability, posing a significant obstacle to real-world applications. These findings suggest that XAI approaches should be used to help practitioners/researchers better understand how AI/ML models (e.g., malware detectors) work -- not just to focus on accuracy improvement.
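The core finding above, that models exploit temporal differences between malware and benign samples rather than malicious behavior, can be illustrated with a toy sketch. Everything here is hypothetical (the years, sample counts, and the timestamp-only "classifier" are illustrative assumptions, not the paper's actual setup): a dataset whose benign apps are older than its malware looks near-perfectly separable in-lab, yet a model that latched onto timestamps fails once benign and malicious apps come from the same period.

```python
import random

random.seed(0)

# Hypothetical training set with temporal sample inconsistency:
# benign apps drawn from 2012-2014, malware from 2015-2016.
train = [(random.randint(2012, 2014), 0) for _ in range(500)] \
      + [(random.randint(2015, 2016), 1) for _ in range(500)]

def predict(year):
    """A 'classifier' that uses only the timestamp, not app behavior."""
    return 1 if year >= 2015 else 0

# Evaluated on the same temporally inconsistent data: looks perfect.
acc_inlab = sum(predict(y) == lab for y, lab in train) / len(train)

# Realistic test set: benign AND malicious apps both from 2016.
test = [(2016, 0) for _ in range(250)] + [(2016, 1) for _ in range(250)]
acc_real = sum(predict(y) == lab for y, lab in test) / len(test)

print(f"accuracy on temporally inconsistent data: {acc_inlab:.2f}")  # 1.00
print(f"accuracy on realistic same-period data:   {acc_real:.2f}")   # 0.50
```

The timestamp cue yields perfect in-lab accuracy but coin-flip accuracy on same-period data, which is the kind of spurious shortcut the paper uses XAI methods to surface.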
