Effectiveness of Transformer Models on IoT Security Detection in StackOverflow Discussions

Paper Title

Effectiveness of Transformer Models on IoT Security Detection in StackOverflow Discussions

Authors

Mandal, Nibir Chandra, Shahariar, G. M., Shawon, Md. Tanvir Rouf

Abstract

The Internet of Things (IoT) is an emerging concept that refers to the billions of physical items, or "things", that are connected to the Internet and that gather and exchange information between devices and systems. However, IoT devices were not built with security in mind, which can lead to security vulnerabilities in multi-device systems. Traditionally, IoT issues have been investigated by surveying IoT developers and specialists. This technique, however, is not scalable, since surveying all IoT developers is not feasible. Another way to study IoT issues is to examine IoT developer discussions on major online development forums such as Stack Overflow (SO). However, finding discussions that are relevant to IoT issues is challenging, since they are frequently not categorized with IoT-related terms. In this paper, we present the "IoT Security Dataset", a domain-specific dataset of 7147 samples focused solely on IoT security discussions. As there are no automated tools to label these samples, we labeled them manually. We further employed multiple transformer models to automatically detect security discussions. Through rigorous investigation, we found that IoT security discussions differ from, and are more complex than, traditional security discussions. Supporting this claim, we demonstrated a considerable performance loss (up to 44%) for transformer models on cross-domain datasets when we transferred knowledge from a general-purpose dataset, "Opiner". We therefore built a domain-specific IoT security detector with an F1-score of 0.69. We have made the dataset public in the hope that developers will learn more about security discussions and that vendors will pay closer attention to product security.
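The abstract reports detector quality as an F1-score. As a reminder of what that metric measures, here is a minimal sketch of the binary F1 computation (the harmonic mean of precision and recall). The function name `f1_score` and the toy labels are illustrative assumptions and are not taken from the paper; the paper's own evaluation code and data splits are not shown on this page.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Binary F1 for the `positive` class (here assumed: 1 = security discussion)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    if tp == 0:
        # No correct positive predictions: precision and recall are 0 or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels, not data from the IoT Security Dataset.
toy_true = [1, 1, 1, 0, 0, 0, 1, 0]
toy_pred = [1, 0, 1, 0, 1, 0, 1, 0]
toy_f1 = f1_score(toy_true, toy_pred)
print(f"F1 on toy labels: {toy_f1:.2f}")
```

On these toy labels there are 3 true positives, 1 false positive, and 1 false negative, so precision and recall are both 0.75 and the F1-score is 0.75.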
