Paper title
The critical locus of overparameterized neural networks
Paper authors
Paper abstract
Many aspects of the geometry of loss functions in deep learning remain mysterious. In this paper, we work toward a better understanding of the geometry of the loss function $L$ of overparameterized feedforward neural networks. In this setting, we identify several components of the critical locus of $L$ and study their geometric properties. For networks of depth $\ell \geq 4$, we identify a locus of critical points we call the star locus $S$. Within $S$, we identify a positive-dimensional sublocus $C$ with the property that every $p \in C$ is a degenerate critical point, and no existing theoretical result guarantees that gradient descent will not converge to $p$. For very wide networks, we build on earlier work and show that all critical points of $L$ are degenerate, and give lower bounds on the number of zero eigenvalues of the Hessian at each critical point. For networks that are both deep and very wide, we compare the growth rates of the zero eigenspaces of the Hessian at all the different families of critical points that we identify. The results in this paper provide a starting point for a more quantitative understanding of the properties of various components of the critical locus of $L$.
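A "degenerate critical point" here means a point where the gradient of $L$ vanishes and the Hessian of $L$ has zero eigenvalues. As a minimal numerical sketch of this notion (our illustration, not a construction from the paper; the depth-4 linear network, layer width, data, and tolerance below are all assumptions made for the example), the all-zeros parameter point of a deep linear network with squared-error loss is critical, and every Hessian eigenvalue there vanishes:

```python
import jax
import jax.numpy as jnp
from jax.flatten_util import ravel_pytree

# Toy depth-4 linear network f(x) = W4 W3 W2 W1 x with squared-error
# loss. At the all-zeros point, every first-order partial derivative
# contains a product of at least one zero weight matrix, so the
# gradient vanishes (a critical point); for depth >= 3 every
# second-order term does too, so the Hessian is identically zero
# there (a fully degenerate critical point).

d = 2                                   # width of every layer (illustrative)
key = jax.random.PRNGKey(0)
kx, ky = jax.random.split(key)
X = jax.random.normal(kx, (d, 5))       # 5 training inputs
Y = jax.random.normal(ky, (d, 5))       # 5 training targets

def loss(params):
    W1, W2, W3, W4 = params
    return jnp.sum((W4 @ W3 @ W2 @ W1 @ X - Y) ** 2)

origin = [jnp.zeros((d, d)) for _ in range(4)]
flat, unravel = ravel_pytree(origin)    # flatten params so the Hessian is one dense matrix

grad_norm = jnp.linalg.norm(jax.grad(lambda v: loss(unravel(v)))(flat))
H = jax.hessian(lambda v: loss(unravel(v)))(flat)
eigs = jnp.linalg.eigvalsh(H)

print("gradient norm at 0:", grad_norm)                     # ~0.0
print("zero Hessian eigenvalues:",
      int(jnp.sum(jnp.abs(eigs) < 1e-8)), "of", eigs.size)  # 16 of 16
```

In this toy case all $4d^2 = 16$ eigenvalues are zero, so no second-order test can distinguish the point from a minimum, saddle, or maximum, which is the kind of situation where existing convergence guarantees for gradient descent do not apply.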