Paper Title

Distilling the Undistillable: Learning from a Nasty Teacher

Authors

Surgan Jandial, Yash Khasbage, Arghya Pal, Vineeth N Balasubramanian, Balaji Krishnamurthy

Abstract

The inadvertent stealing of private/sensitive information using Knowledge Distillation (KD) has received significant attention recently, and its critical nature has guided subsequent defense efforts. The recent work Nasty Teacher proposed training teachers that cannot be distilled or imitated by models attacking them. However, the promise of confidentiality offered by a nasty teacher is not well studied, and as a further step toward strengthening against such loopholes, we attempt to bypass its defense and successfully steal (or extract) information in its presence. Specifically, we analyze Nasty Teacher from two different directions and carefully leverage these insights to develop simple yet efficient methodologies, named HTC and SCM, which increase the learning from a Nasty Teacher by up to 68.63% on standard datasets. Additionally, we also explore an improvised defense method based on our insights into stealing. Our detailed set of experiments and ablations on diverse models/settings demonstrates the efficacy of our approach.
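For context, knowledge distillation trains a student to match a teacher's temperature-softened output distribution; both the Nasty Teacher defense and the stealing methods discussed above operate on these soft outputs. Below is a minimal NumPy sketch of the standard KD objective (the temperature-scaled KL divergence of Hinton et al.), not the paper's HTC/SCM methods; function names and the temperature value are illustrative assumptions.

```python
import numpy as np

def softmax(logits, T=1.0):
    # Temperature-scaled softmax; larger T yields a softer distribution.
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, T=4.0):
    # KL(teacher || student) on temperature-softened outputs,
    # scaled by T^2 so gradients stay comparable across temperatures.
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float((T ** 2) * np.sum(p * (np.log(p) - np.log(q))))
```

The loss is zero when the student exactly reproduces the teacher's logits and grows as their softened distributions diverge; a distillation-resistant ("nasty") teacher works by making this soft distribution misleading rather than informative.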
