Paper Title

InDistill: Information flow-preserving knowledge distillation for model compression

Paper Authors

Ioannis Sarridis, Christos Koutlis, Giorgos Kordopatis-Zilos, Ioannis Kompatsiaris, Symeon Papadopoulos

Paper Abstract

In this paper, we introduce InDistill, a method that serves as a warmup stage for enhancing Knowledge Distillation (KD) effectiveness. InDistill focuses on transferring critical information flow paths from a heavyweight teacher to a lightweight student. This is achieved via a training scheme based on curriculum learning that considers the distillation difficulty of each layer and the critical learning periods when the information flow paths are established. This procedure can lead to a student model that is better prepared to learn from the teacher. To ensure the applicability of InDistill across a wide range of teacher-student pairs, we also incorporate a pruning operation when there is a discrepancy in the width of the teacher and student layers. This pruning operation reduces the width of the teacher's intermediate layers to match those of the student, allowing direct distillation without the need for an encoding stage. The proposed method is extensively evaluated using various pairs of teacher-student architectures on the CIFAR-10, CIFAR-100, and ImageNet datasets, demonstrating that preserving the information flow paths consistently increases the performance of the baseline KD approaches in both classification and retrieval settings. The code is available at https://github.com/gsarridis/InDistill.
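
To make the mechanism described in the abstract more concrete, here is a minimal sketch, assuming PyTorch and hypothetical feature-map hooks into the two networks. It is not the InDistill implementation: the L1-norm pruning criterion and the function names are illustrative assumptions; the abstract only requires that the pruned teacher layer matches the student's width and that intermediate layers are distilled during a warmup before standard KD.

```python
# A minimal, hypothetical sketch (not the authors' implementation), assuming PyTorch.
# It illustrates the two ideas from the abstract: (1) pruning a teacher feature map
# to the student's channel width so no encoding stage is needed, and (2) distilling
# intermediate layers one at a time during a warmup phase.
import torch
import torch.nn.functional as F


def prune_teacher_channels(teacher_feat: torch.Tensor, student_width: int) -> torch.Tensor:
    """Keep the `student_width` teacher channels with the largest L1 norm.

    `teacher_feat` has shape (B, C_t, H, W) with C_t >= student_width.
    The L1-norm criterion is an illustrative assumption, not necessarily the
    criterion used in the paper.
    """
    importance = teacher_feat.abs().sum(dim=(0, 2, 3))  # per-channel L1 norm
    keep = importance.topk(student_width).indices        # indices of retained channels
    return teacher_feat[:, keep]                          # shape (B, C_s, H, W)


def layer_distillation_loss(student_feat: torch.Tensor, teacher_feat: torch.Tensor) -> torch.Tensor:
    """Direct feature distillation for a single intermediate layer."""
    pruned = prune_teacher_channels(teacher_feat, student_feat.shape[1])
    return F.mse_loss(student_feat, pruned)


def warmup_step(student_feats, teacher_feats, current_layer: int) -> torch.Tensor:
    """One curriculum-style warmup step: distill only the currently scheduled layer.

    `student_feats` / `teacher_feats` are per-layer feature maps (hypothetical);
    the schedule over `current_layer` would follow the per-layer difficulty and
    critical-learning-period considerations described in the abstract.
    """
    return layer_distillation_loss(student_feats[current_layer],
                                   teacher_feats[current_layer].detach())
```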
