Paper Title
Verifying Integrity of Deep Ensemble Models by Lossless Black-box Watermarking with Sensitive Samples
Paper Authors
Paper Abstract
With the widespread use of deep neural networks (DNNs) in many areas, more and more studies focus on protecting DNN models from intellectual property (IP) infringement. Many existing methods apply digital watermarking to protect DNN models. Most of them either embed a watermark directly into the internal network structure/parameters or insert a zero-bit watermark by fine-tuning the model to be protected with a set of so-called trigger samples. Though these methods work very well, they were designed for individual DNN models and cannot be directly applied to deep ensemble models (DEMs), which combine multiple DNN models to make the final decision. This motivates us to propose in this paper a novel black-box watermarking method for verifying the integrity of DEMs. In the proposed method, a certain number of sensitive samples are carefully selected by mimicking real-world DEM attacks and analyzing the prediction results of the sub-models of the non-attacked DEM and the attacked DEM on a carefully crafted dataset. By analyzing the prediction results of the target DEM on these sensitive samples, we are able to verify the integrity of the target DEM. Unlike many previous methods, the proposed method does not modify the original DEM to be protected, i.e., it is lossless. Experimental results show that DEM integrity can be reliably verified even if only one sub-model was attacked, indicating good potential in practice.
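The abstract does not give the concrete selection criterion or verification rule, so the sketch below is only an illustration of the black-box workflow it describes: score candidate inputs by how strongly the sub-model predictions of a mimicked attacked DEM diverge from those of the intact DEM, keep the most divergent ones as sensitive samples, and later check whether the target DEM still reproduces the recorded predictions on them. All names (`dem_submodels`, `attacked_submodels`, `target_dem`, the divergence score) are hypothetical placeholders, assuming each model is a callable that returns a class-score vector.

```python
import numpy as np

def select_sensitive_samples(dem_submodels, attacked_submodels, candidates, k):
    """Pick the k candidate inputs whose sub-model predictions diverge most
    between the intact DEM and a mimicked attacked DEM (illustrative criterion)."""
    scores = []
    for x in candidates:
        orig = [np.argmax(m(x)) for m in dem_submodels]        # labels from intact sub-models
        att = [np.argmax(m(x)) for m in attacked_submodels]    # labels after a mimicked attack
        scores.append(sum(o != a for o, a in zip(orig, att)))  # count label flips across sub-models
    top = np.argsort(scores)[::-1][:k]
    return [candidates[i] for i in top]

def verify_integrity(target_dem, sensitive_samples, expected_labels):
    """Black-box check: the target DEM is deemed intact only if it reproduces
    the recorded predictions on every sensitive sample."""
    return all(np.argmax(target_dem(x)) == y
               for x, y in zip(sensitive_samples, expected_labels))
```

Because only predicted labels on a small set of queries are needed, such a check requires no access to the DEM's parameters and leaves the protected DEM unmodified, which is consistent with the lossless, black-box property claimed in the abstract.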