Paper Title

Online Cross-Layer Knowledge Distillation on Graph Neural Networks with Deep Supervision

Paper Authors

Jiongyu Guo, Defang Chen, Can Wang

Paper Abstract

Graph neural networks (GNNs) have become one of the most popular research topics in both academia and industry for their strong ability to handle irregular graph data. However, large-scale datasets pose great challenges for deploying GNNs on edge devices with limited resources, and model compression techniques have therefore drawn considerable research attention. Existing model compression techniques such as knowledge distillation (KD) mainly focus on convolutional neural networks (CNNs); only limited attempts have been made recently to distill knowledge from GNNs, and these operate in an offline manner. Because the performance of a teacher model does not necessarily improve as the number of GNN layers increases, selecting an appropriate teacher model requires substantial effort. To address these challenges, we propose a novel online knowledge distillation framework called Alignahead++ in this paper. Alignahead++ transfers structure and feature information from a student layer to the previous layer of another simultaneously trained student model in an alternating training procedure. Meanwhile, to avoid the over-smoothing problem in GNNs, deep supervision is employed in Alignahead++ by adding an auxiliary classifier at each intermediate layer to prevent the collapse of the node feature embeddings. Experimental results on four datasets, including PPI, Cora, PubMed, and CiteSeer, demonstrate that student performance is consistently boosted in our collaborative training framework without the supervision of a pre-trained teacher model, and that its effectiveness can generally be improved by increasing the number of students.
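A minimal sketch of the training scheme described in the abstract, assuming a PyTorch-style implementation; the names (SimpleGCNLayer, StudentGNN, alignahead_loss, training_step) are hypothetical, and a plain MSE on node features stands in for the paper's structure and feature alignment terms rather than reproducing the authors' actual losses:

```python
# Hypothetical sketch of Alignahead++-style training with two peer students.
# Names and the MSE alignment term are illustrative assumptions, not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleGCNLayer(nn.Module):
    """Bare-bones graph convolution: ReLU(A_hat @ X @ W), standing in for any GNN layer."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, a_hat, x):
        return F.relu(self.lin(a_hat @ x))


class StudentGNN(nn.Module):
    """Stack of GNN layers; every layer also feeds an auxiliary classifier
    (deep supervision) so intermediate node embeddings do not collapse."""
    def __init__(self, in_dim, hid_dim, n_classes, n_layers=3):
        super().__init__()
        dims = [in_dim] + [hid_dim] * n_layers
        self.layers = nn.ModuleList(
            [SimpleGCNLayer(dims[i], dims[i + 1]) for i in range(n_layers)]
        )
        self.aux_heads = nn.ModuleList(
            [nn.Linear(hid_dim, n_classes) for _ in range(n_layers)]
        )

    def forward(self, a_hat, x):
        feats, logits = [], []
        h = x
        for layer, head in zip(self.layers, self.aux_heads):
            h = layer(a_hat, h)
            feats.append(h)          # per-layer node embeddings, used for alignment
            logits.append(head(h))   # per-layer predictions, used for deep supervision
        return feats, logits


def alignahead_loss(feats_self, feats_peer):
    """Pull layer l-1 of the student being updated towards layer l of its peer
    ("aligning ahead"); a plain MSE on node features replaces the paper's
    structure/feature alignment terms in this sketch."""
    loss = 0.0
    for l in range(1, len(feats_self)):
        loss = loss + F.mse_loss(feats_self[l - 1], feats_peer[l].detach())
    return loss


def training_step(a_hat, x, y, student, peer, optimizer, alpha=0.1):
    """One alternating step: update `student` with label losses on every layer
    (deep supervision) plus alignment towards the frozen `peer`."""
    feats_s, logits_s = student(a_hat, x)
    with torch.no_grad():
        feats_p, _ = peer(a_hat, x)
    ce = sum(F.cross_entropy(lg, y) for lg in logits_s)
    loss = ce + alpha * alignahead_loss(feats_s, feats_p)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In the alternating procedure, the peer student is frozen during each step and the two students swap roles between steps; the per-layer auxiliary classifiers give every intermediate layer a direct label signal, which is what the abstract refers to as deep supervision.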
