Paper Title

HoneyModels: Machine Learning Honeypots

Paper Authors

Ahmed Abdou, Ryan Sheatsley, Yohan Beugin, Tyler Shipp, Patrick McDaniel

Paper Abstract

Machine Learning is becoming a pivotal aspect of many systems today, offering newfound performance on classification and prediction tasks, but this rapid integration also comes with new unforeseen vulnerabilities. To harden these systems, the ever-growing field of Adversarial Machine Learning has proposed new attack and defense mechanisms. However, a great asymmetry exists as these defensive methods can only provide security to certain models and lack scalability, computational efficiency, and practicality due to overly restrictive constraints. Moreover, newly introduced attacks can easily bypass defensive strategies by making subtle alterations. In this paper, we study an alternate approach inspired by honeypots to detect adversaries. Our approach yields learned models with an embedded watermark. When an adversary initiates an interaction with our model, attacks are encouraged to add this predetermined watermark, stimulating detection of adversarial examples. We show that HoneyModels can reveal 69.5% of adversaries attempting to attack a Neural Network while preserving the original functionality of the model. HoneyModels offer an alternate direction to secure Machine Learning that slightly affects the accuracy while encouraging the creation of watermarked adversarial samples detectable by the HoneyModel but indistinguishable from others for the adversary.
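The abstract only describes the defense at a high level: the trained model carries a secret watermark, attacks against it tend to reproduce that watermark in the adversarial examples they craft, and the defender flags queries that match the pattern. The snippet below is a minimal, hypothetical sketch of that detection step, not the authors' implementation; the feature indices, reference values, distance measure, and threshold are all illustrative assumptions.

```python
import numpy as np

# Hypothetical illustration (not the paper's method): the defender keeps a secret
# set of "watermark" feature indices and a reference value for each. A query that
# matches the watermark pattern far more closely than benign inputs normally do is
# flagged as a likely adversarial example crafted against the HoneyModel.

rng = np.random.default_rng(0)

N_FEATURES = 784                                                     # e.g., a flattened 28x28 image (assumption)
WATERMARK_IDX = rng.choice(N_FEATURES, size=32, replace=False)       # secret feature indices
WATERMARK_VALUES = rng.uniform(0.0, 1.0, size=WATERMARK_IDX.size)    # secret reference pattern
THRESHOLD = 0.05                                                     # illustrative detection threshold


def watermark_distance(x: np.ndarray) -> float:
    """Mean absolute distance between a query and the secret watermark pattern."""
    return float(np.mean(np.abs(x[WATERMARK_IDX] - WATERMARK_VALUES)))


def looks_adversarial(x: np.ndarray) -> bool:
    """Flag inputs that reproduce the watermark pattern suspiciously closely."""
    return watermark_distance(x) < THRESHOLD


if __name__ == "__main__":
    benign = rng.uniform(0.0, 1.0, size=N_FEATURES)

    # Simulate an attack that, as HoneyModels intends, gets steered toward the watermark.
    adversarial = benign.copy()
    adversarial[WATERMARK_IDX] = WATERMARK_VALUES + rng.normal(0.0, 0.01, WATERMARK_IDX.size)

    print("benign flagged?     ", looks_adversarial(benign))       # expected: False
    print("adversarial flagged?", looks_adversarial(adversarial))  # expected: True
```

In the paper's setting the watermark is embedded during training so that attack algorithms are drawn toward it; the sketch above only shows the defender-side check once such a pattern exists.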
