Paper Title
Robustness and Diversity Seeking Data-Free Knowledge Distillation
Paper Authors
Paper Abstract
Knowledge distillation (KD) has enabled remarkable progress in model compression and knowledge transfer. However, KD requires a large volume of original data, or their representation statistics, which are usually unavailable in practice. Data-free KD has recently been proposed to resolve this problem, wherein the teacher and student models are fed by a synthetic sample generator trained from the teacher. Nonetheless, existing data-free KD methods rely on fine-tuning of loss weights to balance multiple losses and ignore the diversity of generated samples, resulting in limited accuracy and robustness. To overcome this challenge, we propose Robustness and Diversity Seeking Data-Free KD (RDSKD) in this paper. The generator loss function is crafted to produce samples with high authenticity, class diversity, and inter-sample diversity. Without real data, the objectives of seeking high sample authenticity and class diversity often conflict with each other, causing frequent loss fluctuations. We mitigate this by exponentially penalizing loss increments. With the MNIST, CIFAR-10, and SVHN datasets, our experiments show that RDSKD achieves higher accuracy and greater robustness across different hyperparameter settings than other data-free KD methods such as DAFL, MSKD, ZSKD, and DeepInversion.
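To make the abstract's ideas concrete, the following is a minimal PyTorch-style sketch of a generator loss that combines authenticity, class-diversity, and inter-sample-diversity terms with an exponential penalty on loss increments. The function name generator_loss, the weighting factors alpha/beta/gamma, and the exact form of each term are illustrative assumptions, not the paper's definitions.

import torch
import torch.nn.functional as F

def generator_loss(logits, features, prev_loss, alpha=1.0, beta=1.0, gamma=1.0):
    """Illustrative sketch only: term forms and weights are assumed, not taken from RDSKD."""
    # Authenticity: the teacher should be confident on generated samples
    # (cross-entropy against the teacher's own argmax pseudo-labels).
    pseudo_labels = logits.argmax(dim=1)
    l_auth = F.cross_entropy(logits, pseudo_labels)

    # Class diversity: minimizing the negative entropy of the batch-averaged
    # class distribution pushes generated samples to cover all classes.
    mean_probs = F.softmax(logits, dim=1).mean(dim=0)
    l_class_div = (mean_probs * torch.log(mean_probs + 1e-8)).sum()

    # Inter-sample diversity: penalize pairwise feature similarity so samples
    # within a batch do not collapse onto each other.
    feats = F.normalize(features, dim=1)
    sim = feats @ feats.t()
    off_diag = sim - torch.diag(torch.diag(sim))
    l_sample_div = off_diag.abs().mean()

    loss = alpha * l_auth + beta * l_class_div + gamma * l_sample_div

    # Robustness: exponentially penalize any increase over the previous
    # iteration's loss to damp fluctuations between the conflicting terms.
    if prev_loss is not None:
        loss = loss + torch.exp(torch.clamp(loss - prev_loss, min=0.0)) - 1.0
    return loss

In use, prev_loss would be the detached scalar loss from the previous generator update, so the exponential term only penalizes increments rather than back-propagating through past iterations.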