Machine Learning for Detecting Data Exfiltration: A Review

Paper Title

Machine Learning for Detecting Data Exfiltration: A Review

Authors

Bushra Sabir, Faheem Ullah, M. Ali Babar, Raj Gaire

Abstract

Context: Research at the intersection of cybersecurity, Machine Learning (ML), and Software Engineering (SE) has recently taken significant steps in proposing countermeasures for detecting sophisticated data exfiltration attacks. It is important to systematically review and synthesize ML-based data exfiltration countermeasures to build a body of knowledge on this important topic. Objective: This paper aims to systematically review ML-based data exfiltration countermeasures in order to identify and classify the ML approaches, feature engineering techniques, evaluation datasets, and performance metrics used in these countermeasures. The review also aims to identify gaps in research on ML-based data exfiltration countermeasures. Method: We used a Systematic Literature Review (SLR) method to select and review 92 papers. Results: The review has enabled us to (a) classify the ML approaches used in the countermeasures into data-driven and behaviour-driven approaches, (b) categorize features into six types: behavioural, content-based, statistical, syntactical, spatial, and temporal, (c) classify the evaluation datasets into simulated, synthesized, and real datasets, and (d) identify 11 performance measures used by these studies. Conclusion: We conclude that: (i) the integration of data-driven and behaviour-driven approaches should be explored; (ii) there is a need to develop high-quality, large-scale evaluation datasets; (iii) incremental ML model training should be incorporated into countermeasures; (iv) resilience to adversarial learning should be considered and explored during the development of countermeasures to avoid poisoning attacks; and (v) the use of automated feature engineering should be encouraged for efficiently detecting data exfiltration attacks.
