Paper title
Nested Multiple Instance Learning in Modelling of HTTP network traffic
Paper authors
Abstract
In many interesting cases, the application of machine learning is hindered by data whose complicated structure, imposed by structured file formats such as JSON, XML, or ProtoBuffers, is non-trivial to convert to a vector or matrix. Moreover, since the structure frequently carries semantic meaning, reflecting it in the machine learning model should improve accuracy and, more importantly, facilitates the explanation of the model and its decisions. Using the identification of infected computers in a computer network from their HTTP traffic as an example, this paper demonstrates how to achieve this reflection using recent progress in multiple-instance learning. The proposed model is compared to two complementary approaches from the prior art: the first relies on human-designed features, and the second on features learned automatically by convolutional neural networks. In a challenging scenario that measures accuracy only on unseen domains/malware families, the proposed model is superior to the prior art while providing valuable feedback to security researchers. We believe that the proposed framework will find applications beyond the field of security.
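To make the idea of reflecting nested structure in a model more concrete, the following is a minimal sketch of nested multiple-instance aggregation in plain Python. It is not the paper's model. The hierarchy (a computer as a bag of domains, each domain as a bag of request feature vectors), the function names, the mean pooling, and the toy identity weights are all assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
# A layer is a (weights, bias) pair; weights has one row per output unit.
Layer = Tuple[Sequence[Sequence[float]], Sequence[float]]


def dense_relu(x: Sequence[float], layer: Layer) -> Vector:
    """Apply one fully connected layer followed by ReLU."""
    weights, bias = layer
    return [
        max(0.0, sum(w * v for w, v in zip(row, x)) + b)
        for row, b in zip(weights, bias)
    ]


def mean_pool(vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean: turns a bag of vectors into one vector."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def embed_bag(instances: Sequence[Sequence[float]], layer: Layer) -> Vector:
    """Transform every instance in a bag, then pool the results."""
    return mean_pool([dense_relu(x, layer) for x in instances])


def embed_computer(
    computer: Sequence[Sequence[Sequence[float]]],
    request_layer: Layer,
    domain_layer: Layer,
) -> Vector:
    """Nested aggregation (assumed hierarchy): requests -> domain -> computer."""
    # Inner level: each domain is a bag of request feature vectors.
    domain_vectors = [embed_bag(requests, request_layer) for requests in computer]
    # Outer level: the computer is a bag of domain embeddings.
    return embed_bag(domain_vectors, domain_layer)


# Toy data: two domains, the first with two requests, the second with one.
identity: Layer = ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
computer = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[5.0, -6.0]],
]
print(embed_computer(computer, identity, identity))  # prints [3.5, 1.5]
```

Because each level pools a variable number of instances into a fixed-size vector, the output has the same size regardless of how many domains or requests a computer has. In a trained model the layer weights would be learned and a classifier would sit on top of the final vector; both are omitted here.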