Paper title
A Non-Asymptotic Analysis of Oversmoothing in Graph Neural Networks
Paper authors
Paper abstract
Oversmoothing is a central challenge of building more powerful Graph Neural Networks (GNNs). While previous works have only demonstrated that oversmoothing is inevitable when the number of graph convolutions tends to infinity, in this paper, we precisely characterize the mechanism behind the phenomenon via a non-asymptotic analysis. Specifically, we distinguish between two different effects when applying graph convolutions -- an undesirable mixing effect that homogenizes node representations in different classes, and a desirable denoising effect that homogenizes node representations in the same class. By quantifying these two effects on random graphs sampled from the Contextual Stochastic Block Model (CSBM), we show that oversmoothing happens once the mixing effect starts to dominate the denoising effect, and the number of layers required for this transition is $O(\log N/\log (\log N))$ for sufficiently dense graphs with $N$ nodes. We also extend our analysis to study the effects of Personalized PageRank (PPR), or equivalently, the effects of initial residual connections on oversmoothing. Our results suggest that while PPR mitigates oversmoothing at deeper layers, PPR-based architectures still achieve their best performance at a shallow depth and are outperformed by the graph convolution approach on certain graphs. Finally, we support our theoretical results with numerical experiments, which further suggest that the oversmoothing phenomenon observed in practice can be magnified by the difficulty of optimizing deep GNN models.
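The following is a minimal sketch, not the authors' code, illustrating the two effects the abstract distinguishes: it samples a two-class graph from a Contextual Stochastic Block Model (CSBM), repeatedly applies a symmetric-normalized graph convolution, and tracks within-class versus between-class representation distances; a second loop runs PPR-style propagation with an initial residual connection (as in APPNP-type architectures). All concrete parameter values and names (`p_in`, `p_out`, `alpha`, and so on) are illustrative assumptions, not values from the paper.

```python
# Sketch of the mixing vs. denoising effects on a CSBM graph (assumed setup).
import numpy as np

rng = np.random.default_rng(0)

# --- Sample a two-class CSBM graph: labels, Gaussian features, SBM edges.
n, d = 200, 16
p_in, p_out = 0.10, 0.02              # intra-/inter-class edge probabilities (assumed)
y = np.repeat([0, 1], n // 2)         # balanced classes
mu = np.where(y[:, None] == 0, 1.0, -1.0) * np.ones((n, d)) / np.sqrt(d)
X = mu + rng.normal(size=(n, d))      # features = class mean + Gaussian noise

same = (y[:, None] == y[None, :])
P = np.where(same, p_in, p_out)
A = (rng.random((n, n)) < P).astype(float)
A = np.triu(A, 1); A = A + A.T        # undirected, no self-loops

# Symmetric-normalized convolution with self-loops: S = D^{-1/2}(A+I)D^{-1/2}.
A_hat = A + np.eye(n)
d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(1))
S = d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]

def class_distances(H):
    """Mean pairwise distance within classes vs. between classes."""
    D = np.linalg.norm(H[:, None, :] - H[None, :, :], axis=-1)
    off_diag = ~np.eye(n, dtype=bool)
    return D[same & off_diag].mean(), D[~same].mean()

# --- Plain graph convolutions: denoising (within-class distances shrink)
#     is eventually overtaken by mixing (between-class distances shrink too).
H = X.copy()
for layer in range(1, 16):
    H = S @ H
    w, b = class_distances(H)
    print(f"GC  layer {layer:2d}: within={w:.4f} between={b:.4f} ratio={b / w:.2f}")

# --- PPR-style propagation with an initial residual connection:
#     H_{k+1} = (1 - alpha) * S @ H_k + alpha * X,
#     which converges to a fixed point rather than collapsing all rows.
alpha = 0.1                            # teleport probability (assumed)
H = X.copy()
for layer in range(1, 16):
    H = (1 - alpha) * (S @ H) + alpha * X
    w, b = class_distances(H)
    print(f"PPR layer {layer:2d}: within={w:.4f} between={b:.4f} ratio={b / w:.2f}")
```

Under this toy setup, the between/within distance ratio for plain graph convolutions typically rises for the first few layers (denoising dominates) and then degrades as all representations homogenize, while the PPR variant stabilizes at its fixed point, consistent with the abstract's claim that PPR mitigates oversmoothing at depth without necessarily improving peak performance.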