Malicious Source Code Detection Using Transformer

Paper title

Malicious Source Code Detection Using Transformer

Authors

Chen Tsfaty, Michael Fire

Abstract

Using open source code is considered a common practice in modern software development. However, reusing other people's code gives bad actors access to a wide community of developers, and hence to the products that rely on that code. Such attacks are categorized as supply chain attacks. Recent years have seen a growing number of supply chain attacks that leverage open source during software development, relying on the download and installation procedures, whether automatic or manual. Over the years, many approaches have been invented for detecting vulnerable packages. Approaches for detecting malicious code within packages, however, are less common. These detection approaches can be broadly categorized as analyses that use code execution (dynamic) and analyses that do not (static). Here, we introduce the Malicious Source code Detection using Transformers (MSDT) algorithm. MSDT is a novel static analysis based on a deep learning method that detects real-world cases of code injection into source code packages. In this study, we used MSDT and a dataset of over 600,000 different functions to embed the functions and applied a clustering algorithm to the resulting vectors, detecting the malicious functions by detecting the outliers. We evaluated MSDT's performance by conducting extensive experiments and demonstrated that our algorithm is capable of detecting functions that were injected with malicious code, with precision@k values of up to 0.909.
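
The abstract reports results as precision@k over functions ranked by how much of an outlier each one is. A minimal sketch of that evaluation idea follows, under loud assumptions: the embeddings are toy 2-D vectors rather than transformer outputs, and outliers are scored by distance from the centroid as a simple stand-in for the clustering algorithm, which the abstract does not name. All function names here are hypothetical and are not taken from the paper.

```python
import math


def centroid(vectors):
    """Mean vector of a list of equal-length embeddings."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]


def outlier_ranking(vectors):
    """Indices sorted from most to least distant from the centroid.

    Assumption: centroid distance is a stand-in for the paper's
    clustering-based outlier detection, not the authors' method.
    """
    c = centroid(vectors)
    dist = [math.dist(v, c) for v in vectors]
    return sorted(range(len(vectors)), key=lambda i: dist[i], reverse=True)


def precision_at_k(ranked, malicious, k):
    """Fraction of the top-k ranked items that are truly malicious."""
    top = ranked[:k]
    return sum(1 for i in top if i in malicious) / k


# Toy data: five benign function embeddings and one injected function.
embeddings = [
    [0.0, 0.1],
    [0.1, 0.0],
    [0.1, 0.1],
    [0.0, 0.0],
    [5.0, 5.0],  # hypothetical injected function, far from the rest
    [0.2, 0.1],
]
malicious = {4}

ranked = outlier_ranking(embeddings)
p1 = precision_at_k(ranked, malicious, 1)
p2 = precision_at_k(ranked, malicious, 2)
print(ranked[0], p1, p2)
```

In this toy setting the injected function is ranked first, so precision@1 is 1.0 and precision@2 is 0.5. On real data the ranking quality depends entirely on the embedding model and the outlier detector used.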
