Paper title
Representation learning with function call graph transformations for malware open set recognition
Paper authors
Abstract
The open set recognition (OSR) problem has been a challenge in many machine learning (ML) applications, such as security. Because new and previously unknown malware families appear regularly, it is difficult to collect training samples that cover every class an ML system will encounter. An advanced malware classification system should classify the known classes correctly while remaining sensitive to the unknown class. In this paper, we introduce a self-supervised pre-training approach for the OSR problem in malware classification. We propose two transformations for function call graph (FCG) based malware representations to facilitate the pretext task. We also present a statistical thresholding approach to find the optimal threshold for the unknown class. Moreover, the experimental results indicate that our proposed pre-training process improves the performance of different downstream loss functions on the OSR problem.