Paper title
Gradient flow dynamics of shallow ReLU networks for square loss and orthogonal inputs
Paper authors
Paper abstract
The training of neural networks by gradient descent methods is a cornerstone of the deep learning revolution. Yet, despite some recent progress, a complete theory explaining its success is still missing. This article presents, for orthogonal input vectors, a precise description of the gradient flow dynamics of training one-hidden layer ReLU neural networks for the mean squared error at small initialisation. In this setting, despite non-convexity, we show that the gradient flow converges to zero loss and characterise its implicit bias towards minimum variation norm. Furthermore, some interesting phenomena are highlighted: a quantitative description of the initial alignment phenomenon and a proof that the process follows a specific saddle to saddle dynamics.
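For concreteness, the setting described in the abstract can be sketched in formulas. This is a minimal illustration under standard notation; the particular parametrisation and symbols ($m$, $a_j$, $w_j$, $\lambda$, $\theta_0$) are assumptions made here for exposition and are not quoted from the paper.

\[
f_\theta(x) \;=\; \sum_{j=1}^{m} a_j\,\sigma(\langle w_j, x\rangle), \qquad \sigma(u) = \max(u, 0) \ \text{(ReLU)},
\]
\[
L(\theta) \;=\; \frac{1}{2n} \sum_{i=1}^{n} \bigl(f_\theta(x_i) - y_i\bigr)^2, \qquad \langle x_i, x_k\rangle = 0 \ \text{for } i \neq k \ \text{(orthogonal inputs)},
\]
\[
\dot{\theta}(t) \;=\; -\nabla L(\theta(t)), \qquad \theta(0) = \lambda\,\theta_0 \ \text{with } \lambda > 0 \ \text{small}.
\]

In this reading, "small initialisation" corresponds to the regime of small $\lambda$, in which the abstract's statements about convergence to zero loss, the implicit bias towards minimum variation norm, initial alignment, and saddle-to-saddle dynamics are formulated.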