Paper title
Gradient flow dynamics of shallow ReLU networks for square loss and orthogonal inputs
Paper authors
Paper abstract
The training of neural networks by gradient descent methods is a cornerstone of the deep learning revolution. Yet, despite some recent progress, a complete theory explaining its success is still missing. This article presents, for orthogonal input vectors, a precise description of the gradient flow dynamics of training one-hidden layer ReLU neural networks for the mean squared error at small initialisation. In this setting, despite non-convexity, we show that the gradient flow converges to zero loss and characterise its implicit bias towards minimum variation norm. Furthermore, some interesting phenomena are highlighted: a quantitative description of the initial alignment phenomenon and a proof that the process follows a specific saddle to saddle dynamics.
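For concreteness, the setting described in the abstract can be sketched in formulas. This is a minimal illustration under standard notation; the particular parametrisation and symbols ($m$, $a_j$, $w_j$, $\lambda$, $\theta_0$) are assumptions made here for exposition and are not quoted from the paper.

\[
f_\theta(x) \;=\; \sum_{j=1}^{m} a_j\,\sigma(\langle w_j, x\rangle), \qquad \sigma(u) = \max(u, 0) \ \text{(ReLU)},
\]
\[
L(\theta) \;=\; \frac{1}{2n} \sum_{i=1}^{n} \bigl(f_\theta(x_i) - y_i\bigr)^2, \qquad \langle x_i, x_k\rangle = 0 \ \text{for } i \neq k \ \text{(orthogonal inputs)},
\]
\[
\dot{\theta}(t) \;=\; -\nabla L(\theta(t)), \qquad \theta(0) = \lambda\,\theta_0 \ \text{with } \lambda > 0 \ \text{small}.
\]

In this reading, "small initialisation" corresponds to the regime of small $\lambda$, in which the abstract's statements about convergence to zero loss, the implicit bias towards minimum variation norm, initial alignment, and saddle-to-saddle dynamics are formulated.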