Paper Title

Second-order regression models exhibit progressive sharpening to the edge of stability

Authors

Atish Agarwala, Fabian Pedregosa, Jeffrey Pennington

Abstract

Recent studies of gradient descent with large step sizes have shown that there is often a regime with an initial increase in the largest eigenvalue of the loss Hessian (progressive sharpening), followed by a stabilization of the eigenvalue near the maximum value which allows convergence (edge of stability). These phenomena are intrinsically non-linear and do not happen for models in the constant Neural Tangent Kernel (NTK) regime, for which the predictive function is approximately linear in the parameters. As such, we consider the next simplest class of predictive models, namely those that are quadratic in the parameters, which we call second-order regression models. For quadratic objectives in two dimensions, we prove that this second-order regression model exhibits progressive sharpening of the NTK eigenvalue towards a value that differs slightly from the edge of stability, which we explicitly compute. In higher dimensions, the model generically shows similar behavior, even without the specific structure of a neural network, suggesting that progressive sharpening and edge-of-stability behavior aren't unique features of neural networks, and could be a more general property of discrete learning algorithms in high-dimensional non-linear models.
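As a rough illustration of the setting the abstract describes (not the paper's exact construction), the sketch below runs full-batch gradient descent on a scalar model that is quadratic in its parameters and tracks the largest eigenvalue of the loss Hessian (the "sharpness"). For gradient descent with step size eta on a fixed quadratic, stability requires the sharpness to stay below 2/eta, which is the edge-of-stability threshold the abstract refers to. The dimension, the matrices Q and b, the target y_star, and the step size are all illustrative choices; whether and how quickly this particular random instance sharpens toward 2/eta depends on the draw and the hyperparameters.

```python
# Illustrative sketch (assumed setup, not the paper's exact model): gradient
# descent on a quadratic-in-parameters predictor, tracking the top eigenvalue
# of the loss Hessian. Classical stability of GD on a quadratic requires
# sharpness < 2/eta; edge-of-stability behavior means the sharpness rises
# toward and then hovers near that threshold.
import numpy as np

rng = np.random.default_rng(0)
d = 50                                   # parameter dimension (illustrative)
A = rng.normal(size=(d, d)) / np.sqrt(d)
Q = A @ A.T / d                          # symmetric quadratic form (illustrative)
b = rng.normal(size=d) / np.sqrt(d)      # linear term (illustrative)
y_star = 5.0                             # regression target (illustrative)
eta = 0.1                                # step size; stability threshold is 2/eta

def predict(theta):
    """Second-order (quadratic-in-parameters) predictive model."""
    return 0.5 * theta @ Q @ theta + b @ theta

def loss_hessian(theta):
    """Hessian of L = 0.5*(f(theta) - y*)^2: grad_f grad_f^T + (f - y*) * Q."""
    g = Q @ theta + b
    return np.outer(g, g) + (predict(theta) - y_star) * Q

theta = 0.1 * rng.normal(size=d)
for step in range(5001):
    residual = predict(theta) - y_star
    if step % 500 == 0:
        sharpness = np.linalg.eigvalsh(loss_hessian(theta))[-1]
        print(f"step {step:5d}  loss {0.5 * residual**2:.4f}  "
              f"sharpness {sharpness:.3f}  (2/eta = {2 / eta:.1f})")
    theta = theta - eta * residual * (Q @ theta + b)   # full-batch GD step
```

Whether this toy instance actually approaches the 2/eta threshold depends on the random draw, the target, and eta; the paper's result is that, for quadratic objectives in two dimensions, the sharpness provably stabilizes at an explicitly computable value slightly different from the edge of stability, with similar behavior observed generically in higher dimensions.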
