Paper Title

The Implicit Bias of Gradient Descent on Generalized Gated Linear Networks

Paper Authors

Samuel Lippl, L. F. Abbott, SueYeon Chung

Paper Abstract

Understanding the asymptotic behavior of gradient-descent training of deep neural networks is essential for revealing inductive biases and improving network performance. We derive the infinite-time training limit of a mathematically tractable class of deep nonlinear neural networks, gated linear networks (GLNs), and generalize these results to gated networks described by general homogeneous polynomials. We study the implications of our results, focusing first on two-layer GLNs. We then apply our theoretical predictions to GLNs trained on MNIST and show how architectural constraints and the implicit bias of gradient descent affect performance. Finally, we show that our theory captures a substantial portion of the inductive bias of ReLU networks. By making the inductive bias explicit, our framework is poised to inform the development of more efficient, biologically plausible, and robust learning algorithms.
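
To make the central object of the abstract concrete, here is a minimal sketch of a two-layer gated linear network forward pass. It assumes the gates are fixed binary functions of the input (here, halfspace indicators from random projections), which is one common way such gating is instantiated; all names and dimensions (`gate_vectors`, `hidden_dim`, etc.) are illustrative and not taken from the paper. The point of the sketch is the structural property the abstract relies on: the gates depend only on the input, so for a fixed gate pattern the output is a homogeneous polynomial in the trainable weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (hypothetical, not from the paper).
input_dim, hidden_dim = 10, 32

# Fixed gating: each hidden unit is "on" iff a random halfspace contains x.
# The gates depend only on the input, never on the trainable weights.
gate_vectors = rng.normal(size=(hidden_dim, input_dim))

def gates(x):
    """Binary gate pattern for input x (fixed, non-trainable)."""
    return (gate_vectors @ x > 0).astype(float)

# Trainable parameters of a two-layer gated linear network.
W = rng.normal(size=(hidden_dim, input_dim)) / np.sqrt(input_dim)
a = rng.normal(size=hidden_dim) / np.sqrt(hidden_dim)

def gln_forward(x):
    """For a fixed gate pattern the output is linear in x, and it is a
    degree-2 homogeneous polynomial in the parameters (W, a)."""
    g = gates(x)         # which units are active, decided by x alone
    h = g * (W @ x)      # gated hidden pre-activations
    return a @ h

x = rng.normal(size=input_dim)
print(gln_forward(x))
```

Because the gating pattern is decoupled from the weights (unlike a ReLU unit, whose gate is determined by its own pre-activation), the training dynamics remain mathematically tractable, which is what enables the infinite-time limits derived in the paper.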
