Paper Title

Prototype-guided Cross-task Knowledge Distillation for Large-scale Models

Paper Authors

Deng Li, Aming Wu, Yahong Han, Qi Tian

Paper Abstract

Recently, large-scale pre-trained models have shown their advantages in many tasks. However, due to their huge computational complexity and storage requirements, it is challenging to apply large-scale models to real-world scenarios. A common solution is knowledge distillation, which regards the large-scale model as a teacher and trains a small student model to obtain competitive performance. Cross-task knowledge distillation expands the application scenarios of large-scale pre-trained models. Existing knowledge distillation works focus on directly mimicking the final predictions or the intermediate layers of the teacher model, which represent global-level characteristics and are task-specific. To alleviate the constraint of different label spaces, capturing invariant intrinsic local object characteristics (such as the shape characteristics of the legs and tails of cattle and horses) plays a key role. Considering the complexity and variability of real-world tasks, we propose a Prototype-guided Cross-task Knowledge Distillation (ProC-KD) approach to transfer the intrinsic local-level object knowledge of a large-scale teacher network to various task scenarios. First, to better transfer the generalized knowledge of the teacher model in cross-task scenarios, we propose a prototype learning module that learns the essential feature representations of objects in the teacher model. Second, for diverse downstream tasks, we propose a task-adaptive feature augmentation module that enhances the student model's features with the learned generalized prototype features and guides the training of the student model to improve its generalization ability. Experimental results on various visual tasks demonstrate the effectiveness of our approach in large-scale model cross-task knowledge distillation scenarios.
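The abstract does not give implementation details, so the following is only a minimal sketch of how the two described modules might be wired together: a set of prototypes summarizing local teacher features (here updated by nearest-prototype EMA clustering, an assumed design choice) and a task-adaptive augmentation step that attends from student features to those prototypes before a distillation loss. All class names, dimensions, hyperparameters, and the MSE distillation term are hypothetical and not taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PrototypeLearning(nn.Module):
    """Hypothetical prototype learning module: keeps K prototype vectors that
    summarize local (patch-level) teacher features, updated by EMA clustering."""
    def __init__(self, num_prototypes=64, dim=256, momentum=0.99):
        super().__init__()
        self.register_buffer("prototypes", torch.randn(num_prototypes, dim))
        self.momentum = momentum

    @torch.no_grad()
    def update(self, teacher_feats):
        # teacher_feats: (N, dim), local features flattened from a teacher feature map
        teacher_feats = F.normalize(teacher_feats, dim=1)
        protos = F.normalize(self.prototypes, dim=1)
        assign = (teacher_feats @ protos.t()).argmax(dim=1)  # nearest prototype per feature
        for k in range(self.prototypes.size(0)):
            mask = assign == k
            if mask.any():
                mean_feat = teacher_feats[mask].mean(dim=0)
                self.prototypes[k] = (self.momentum * self.prototypes[k]
                                      + (1 - self.momentum) * mean_feat)

class TaskAdaptiveAugmentation(nn.Module):
    """Hypothetical task-adaptive feature augmentation: attends from student
    local features to the prototypes and fuses the attended content back."""
    def __init__(self, dim=256):
        super().__init__()
        self.query = nn.Linear(dim, dim)
        self.key = nn.Linear(dim, dim)
        self.value = nn.Linear(dim, dim)
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, student_feats, prototypes):
        # student_feats: (N, dim); prototypes: (K, dim)
        q = self.query(student_feats)                              # (N, dim)
        k = self.key(prototypes)                                   # (K, dim)
        v = self.value(prototypes)                                 # (K, dim)
        attn = F.softmax(q @ k.t() / q.size(-1) ** 0.5, dim=-1)    # (N, K)
        context = attn @ v                                         # (N, dim)
        return self.fuse(torch.cat([student_feats, context], dim=-1))

# Toy usage: distill prototype-augmented student features toward teacher features.
teacher_feats = torch.randn(1024, 256)   # flattened local teacher features
student_feats = torch.randn(1024, 256)   # matching student features (already projected)

proto_module = PrototypeLearning()
aug_module = TaskAdaptiveAugmentation()

proto_module.update(teacher_feats)
augmented = aug_module(student_feats, proto_module.prototypes)
distill_loss = F.mse_loss(augmented, teacher_feats)  # one possible distillation term
distill_loss.backward()
```

In practice the prototype update rule, the attention formulation, and the distillation objective would follow the paper's actual definitions; this sketch only illustrates the overall data flow from teacher features to prototypes to the augmented student features.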
