Paper Title
Momentum Adversarial Distillation: Handling Large Distribution Shifts in Data-Free Knowledge Distillation
Paper Authors
Paper Abstract
Data-free Knowledge Distillation (DFKD) has attracted attention recently thanks to its appealing capability of transferring knowledge from a teacher network to a student network without using any training data. The main idea is to use a generator to synthesize data for training the student. As the generator is updated, the distribution of the synthetic data changes. This distribution shift can be large if the generator and the student are trained adversarially, causing the student to forget the knowledge it acquired in previous steps. To alleviate this problem, we propose a simple yet effective method called Momentum Adversarial Distillation (MAD), which maintains an exponential moving average (EMA) copy of the generator and uses synthetic samples from both the generator and the EMA generator to train the student. Since the EMA generator can be considered an ensemble of the generator's past versions and often changes less between updates than the generator itself, training on its synthetic samples helps the student recall past knowledge and prevents it from adapting too quickly to new updates of the generator. Our experiments on six benchmark datasets, including large datasets such as ImageNet and Places365, demonstrate the superior performance of MAD over competing methods for handling the large distribution shift problem. Our method also compares favorably to existing DFKD methods and even achieves state-of-the-art results in some cases.
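To make the scheme in the abstract concrete, below is a minimal, self-contained PyTorch-style sketch of adversarial data-free distillation with an EMA copy of the generator. The toy module sizes, the EMA decay of 0.999, and the KL-based losses are illustrative assumptions for this sketch, not details taken from the paper.

```python
# Sketch: one adversarial generator step, then a student step that
# distills on samples from BOTH the current generator and its EMA copy.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

latent_dim, feat_dim, num_classes, batch_size = 16, 32, 10, 8

# Toy stand-ins for the real networks (illustrative assumption).
generator = nn.Sequential(nn.Linear(latent_dim, feat_dim), nn.Tanh())
student   = nn.Sequential(nn.Linear(feat_dim, num_classes))
teacher   = nn.Sequential(nn.Linear(feat_dim, num_classes))
teacher.requires_grad_(False)  # the teacher is fixed in DFKD

# EMA copy of the generator; it is never trained directly.
ema_generator = copy.deepcopy(generator).requires_grad_(False)

g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
s_opt = torch.optim.Adam(student.parameters(), lr=1e-3)

def kd_loss(student_logits, teacher_logits):
    """KL divergence between student and teacher predictions."""
    return F.kl_div(F.log_softmax(student_logits, dim=1),
                    F.softmax(teacher_logits, dim=1),
                    reduction="batchmean")

@torch.no_grad()
def update_ema(ema_model, model, decay=0.999):
    """theta_ema <- decay * theta_ema + (1 - decay) * theta."""
    for p_ema, p in zip(ema_model.parameters(), model.parameters()):
        p_ema.mul_(decay).add_(p, alpha=1.0 - decay)

for step in range(100):
    # Generator step (adversarial): synthesize samples on which the
    # student disagrees with the teacher, i.e. maximize the KD loss.
    x = generator(torch.randn(batch_size, latent_dim))
    g_loss = -kd_loss(student(x), teacher(x))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()

    # Student step: distill on samples from the current generator AND
    # the slowly changing EMA generator, so past knowledge is revisited.
    with torch.no_grad():
        x_new = generator(torch.randn(batch_size, latent_dim))
        x_old = ema_generator(torch.randn(batch_size, latent_dim))
        x_all = torch.cat([x_new, x_old])
        t_logits = teacher(x_all)
    s_loss = kd_loss(student(x_all), t_logits)
    s_opt.zero_grad(); s_loss.backward(); s_opt.step()

    # Slow EMA update keeps the EMA generator close to an ensemble of
    # the generator's past versions, damping the distribution shift.
    update_ema(ema_generator, generator)
```

Because the EMA weights move with a small step size, samples from the EMA generator approximate the distributions the student saw earlier, which is what lets it counteract forgetting under large shifts in the generator's output distribution.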