Paper Title
Explaining Knowledge Distillation by Quantifying the Knowledge
Paper Authors
Paper Abstract
This paper presents a method to interpret the success of knowledge distillation by quantifying and analyzing the task-relevant and task-irrelevant visual concepts encoded in the intermediate layers of a deep neural network (DNN). More specifically, three hypotheses are proposed. 1. Knowledge distillation makes the DNN learn more visual concepts than learning from raw data does. 2. Knowledge distillation ensures that the DNN tends to learn various visual concepts simultaneously, whereas a DNN learning from raw data acquires visual concepts sequentially. 3. Knowledge distillation yields more stable optimization directions than learning from raw data. Accordingly, we design three types of mathematical metrics to evaluate the feature representations of a DNN. In experiments, we diagnosed various DNNs and verified the above hypotheses.
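For context, below is a minimal sketch of the standard knowledge-distillation objective (soft teacher targets plus hard labels, following Hinton et al.) that the paper analyzes; the temperature T and mixing weight alpha here are illustrative assumptions, and the paper's own concept-quantification metrics are not reproduced.

```python
# Minimal sketch of a standard knowledge-distillation loss.
# T (temperature) and alpha (mixing weight) are illustrative values,
# not hyperparameters taken from the paper.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Soft-target KD loss combined with hard-label cross-entropy."""
    # Softened teacher distribution guides the student at temperature T.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradients match the hard-label term
    # Ordinary cross-entropy against the ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss
```

The paper's hypotheses compare a student trained with this kind of objective against the same architecture trained on raw data alone, asking which one encodes more visual concepts, learns them more simultaneously, and optimizes more stably.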