Paper Title
On the linearity of large non-linear models: when and why the tangent kernel is constant
Paper Authors
Paper Abstract
The goal of this work is to shed light on the remarkable phenomenon of transition to linearity of certain neural networks as their width approaches infinity. We show that the transition to linearity of the model and, equivalently, constancy of the (neural) tangent kernel (NTK) result from the scaling properties of the norm of the Hessian matrix of the network as a function of the network width. We present a general framework for understanding the constancy of the tangent kernel via Hessian scaling applicable to the standard classes of neural networks. Our analysis provides a new perspective on the phenomenon of constant tangent kernel, which is different from the widely accepted "lazy training". Furthermore, we show that the transition to linearity is not a general property of wide neural networks and does not hold when the last layer of the network is non-linear. It is also not necessary for successful optimization by gradient descent.
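To make the abstract's claim concrete, here is a minimal sketch in standard NTK notation; the symbols f, w, w_0, H_f, K_w, the width m, and the stated scaling rate are illustrative assumptions and are not fixed by the abstract itself. The tangent kernel at parameters w is

  K_w(x, x') = \nabla_w f(x; w)^\top \nabla_w f(x'; w),

and a second-order Taylor expansion of the network output around the initialization w_0 gives

  f(x; w) = f(x; w_0) + \nabla_w f(x; w_0)^\top (w - w_0) + \tfrac{1}{2} (w - w_0)^\top H_f(\xi) (w - w_0).

If the spectral norm of the Hessian scales down with the width m, say \|H_f\| = O(1/\sqrt{m}) uniformly over a ball of fixed radius around w_0, then the quadratic remainder vanishes as m \to \infty; the model becomes linear in w over that ball, and consequently K_w stays (near-)constant during training, which is the mechanism the abstract attributes to Hessian scaling.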