Title
OOG- Optuna Optimized GAN Sampling Technique for Tabular Imbalanced Malware Data
Authors
Abstract
Cyberspace occupies a large portion of people's lives in the age of modern technology, and while some use it for good, others do not. Malware is software built with malicious rather than benign intent: it can damage, steal, or even alter personal information as well as secure applications and software. There are therefore numerous techniques for defending against malware. One of them is to generate malware samples so that a detection system can be kept up to date with the growing number of malware variants and can recognize malware when it attempts to enter. This study uses a Generative Adversarial Network (GAN) sampling technique to generate new malware samples. GANs come in multiple variants, and determining which variant works best for a given dataset requires tuning their parameters. This study employs Optuna, an automatic hyperparameter optimization framework, to determine the optimal settings for the dataset under consideration. The paper presents the architecture of the Optuna Optimized GAN (OOG) method and reports scores of 98.06%, 99.00%, 97.23%, and 98.04% for accuracy, precision, recall, and F1 score, respectively. To obtain this result, the methodology tunes the hyperparameters of five supervised algorithms (XGBoost, LightGBM, CatBoost, Extra Trees Classifier, and Gradient Boosting Classifier) and then combines them with a weighted ensemble technique. In addition to comparing against existing work in this domain, the study shows that GAN-based sampling is promising compared with other sampling techniques such as SMOTE.
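The final weighted ensemble step and the four reported metrics can be illustrated with a minimal sketch. It assumes weighted soft voting, meaning a weighted average of each tuned model's predicted malware probability thresholded at 0.5. The abstract does not state the exact combination rule, and the helper names (`weighted_soft_vote`, `binary_metrics`), probabilities, and weights below are hypothetical, not taken from the paper.

```python
"""Sketch: weighted soft-voting ensemble plus binary classification metrics.

Assumption: the ensemble is a weighted average of per-model malware
probabilities. All probabilities and weights below are illustrative only.
"""
from typing import Dict, List, Sequence


def weighted_soft_vote(probas: Sequence[Sequence[float]],
                       weights: Sequence[float]) -> List[float]:
    """Weighted average of per-model positive-class probabilities.

    probas[m][i] is model m's probability that sample i is malware.
    Weights are normalized by their sum, so they need not add up to 1.
    """
    if len(probas) != len(weights):
        raise ValueError("expected one weight per model")
    total = sum(weights)
    n_samples = len(probas[0])
    return [
        sum(w * model_probs[i] for w, model_probs in zip(weights, probas)) / total
        for i in range(n_samples)
    ]


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 with malware (label 1) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical probabilities from three tuned models on four samples.
probas = [
    [0.9, 0.2, 0.6, 0.4],  # model A
    [0.8, 0.1, 0.3, 0.7],  # model B
    [0.7, 0.4, 0.5, 0.2],  # model C
]
weights = [0.5, 0.3, 0.2]  # hypothetical weights; these could themselves be tuned
y_true = [1, 0, 1, 0]

ensemble_proba = weighted_soft_vote(probas, weights)
y_pred = [1 if p >= 0.5 else 0 for p in ensemble_proba]
metrics = binary_metrics(y_true, y_pred)
print(y_pred, metrics)
```

With these made-up numbers the ensemble flags only the first sample as malware, so precision is 1.0 while recall is 0.5. This shows why the abstract reports precision and recall separately alongside accuracy on imbalanced data.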