Paper Title

On-Device Domain Generalization

Paper Authors

Kaiyang Zhou, Yuanhan Zhang, Yuhang Zang, Jingkang Yang, Chen Change Loy, Ziwei Liu

Paper Abstract

We present a systematic study of domain generalization (DG) for tiny neural networks. This problem is critical to on-device machine learning applications but has been overlooked in the literature where research has been merely focused on large models. Tiny neural networks have much fewer parameters and lower complexity and therefore should not be trained the same way as their large counterparts for DG applications. By conducting extensive experiments, we find that knowledge distillation (KD), a well-known technique for model compression, is much better for tackling the on-device DG problem than conventional DG methods. Another interesting observation is that the teacher-student gap on out-of-distribution data is bigger than that on in-distribution data, which highlights the capacity mismatch issue as well as the shortcoming of KD. We further propose a method called out-of-distribution knowledge distillation (OKD) where the idea is to teach the student how the teacher handles out-of-distribution data synthesized via disruptive data augmentation. Without adding any extra parameter to the model -- hence keeping the deployment cost unchanged -- OKD significantly improves DG performance for tiny neural networks in a variety of on-device DG scenarios for image and speech applications. We also contribute a scalable approach for synthesizing visual domain shifts, along with a new suite of DG datasets to complement existing testbeds.
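
To make the core idea concrete, below is a minimal sketch of an OKD-style training step as described in the abstract: the teacher's predictions on inputs corrupted by a disruptive augmentation are distilled into the tiny student. This is an illustrative interpretation, not the authors' released implementation; the function name `okd_step`, the `augment` callable, and the `alpha`/`T` hyperparameters are assumptions.

```python
import torch
import torch.nn.functional as F

def okd_step(teacher, student, x, y, augment, alpha=0.5, T=4.0):
    """One hypothetical OKD-style step (sketch only, not the paper's code).

    Idea from the abstract: synthesize out-of-distribution inputs via a
    disruptive augmentation, then teach the student how the teacher
    responds to them.
    """
    x_ood = augment(x)                  # disruptive augmentation -> synthetic OOD view
    with torch.no_grad():
        t_logits = teacher(x_ood)       # teacher's behaviour on the OOD view
    s_logits = student(x_ood)

    # Supervised loss on the clean data plus a soft-target KD loss on the OOD view.
    ce = F.cross_entropy(student(x), y)
    kd = F.kl_div(
        F.log_softmax(s_logits / T, dim=1),
        F.softmax(t_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    return (1 - alpha) * ce + alpha * kd
```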
