Paper Title
Analysis of Convolutions, Non-linearity and Depth in Graph Neural Networks using Neural Tangent Kernel
Paper Authors
Paper Abstract
The fundamental principle of Graph Neural Networks (GNNs) is to exploit the structural information of the data by aggregating the neighboring nodes using a `graph convolution' in conjunction with a suitable choice of network architecture, such as depth and activation functions. Therefore, understanding the influence of each design choice on network performance is crucial. Convolutions based on the graph Laplacian have emerged as the dominant choice, with symmetric normalization of the adjacency matrix being the most widely adopted. However, some empirical studies show that row normalization of the adjacency matrix outperforms it in node classification. Despite the widespread use of GNNs, there is no rigorous theoretical study of the representation power of these convolutions that could explain this behavior. Similarly, the empirical observation that linear GNNs perform on par with non-linear ReLU GNNs lacks a rigorous theoretical explanation. In this work, we theoretically analyze the influence of different aspects of the GNN architecture using the Graph Neural Tangent Kernel in a semi-supervised node classification setting. Under the population Degree Corrected Stochastic Block Model, we prove that: (i) linear networks capture the class information as well as ReLU networks; (ii) row normalization preserves the underlying class structure better than other convolutions; (iii) performance degrades with network depth due to over-smoothing, but the loss of class information is slowest under row normalization; (iv) skip connections retain the class information even at infinite depth, thereby eliminating over-smoothing. We finally validate our theoretical findings numerically and on real datasets such as Cora and Citeseer.
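To make the objects mentioned in the abstract concrete, the following is a minimal NumPy sketch (not taken from the paper, and not the GNTK analysis itself). It samples a two-class degree-corrected stochastic block model graph with assumed parameters (n, p_in, p_out, and the degree weights theta are all illustrative), forms the symmetrically normalized convolution D^{-1/2} A D^{-1/2} and the row-normalized convolution D^{-1} A, and tracks a crude class-separation proxy (the distance between class-mean representations) as linear propagation is applied repeatedly, with and without an additive skip connection of the form X_{k+1} = S X_k + X_0 (also an assumption, not the paper's exact architecture).

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Illustrative two-class degree-corrected SBM graph (assumed parameters) ---
n, p_in, p_out = 200, 0.10, 0.02
labels = np.repeat([0, 1], n // 2)
theta = rng.uniform(0.5, 1.5, size=n)            # degree-correction weights
prob = np.where(labels[:, None] == labels[None, :], p_in, p_out)
prob = prob * np.outer(theta, theta)             # degree-corrected edge probabilities
A = (rng.random((n, n)) < prob).astype(float)
A = np.triu(A, 1)
A = A + A.T                                      # undirected, no self-loops

deg = A.sum(axis=1)
D_inv = np.diag(1.0 / np.maximum(deg, 1.0))
D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(deg, 1.0)))

S_sym = D_inv_sqrt @ A @ D_inv_sqrt              # symmetric normalization D^{-1/2} A D^{-1/2}
S_row = D_inv @ A                                # row normalization D^{-1} A

def class_separation(X, y):
    """Distance between class-mean representations (crude proxy for class information)."""
    return np.linalg.norm(X[y == 0].mean(axis=0) - X[y == 1].mean(axis=0))

X0 = rng.standard_normal((n, 16))                # random node features
for name, S in [("sym", S_sym), ("row", S_row)]:
    X_plain, X_skip = X0.copy(), X0.copy()
    for depth in range(1, 33):
        X_plain = S @ X_plain                    # plain linear propagation
        X_skip = S @ X_skip + X0                 # skip connection back to the input
        if depth in (1, 8, 32):
            print(f"{name} depth={depth:2d}  "
                  f"plain={class_separation(X_plain, labels):.3e}  "
                  f"skip={class_separation(X_skip, labels):.3e}")
```

Under these assumptions, the plain-propagation separation is expected to shrink with depth (over-smoothing), while the skip-connection variant keeps the class means apart, qualitatively mirroring claims (iii) and (iv); the sketch is only illustrative and does not reproduce the paper's kernel-based results.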