Title
Flat minima generalize for low-rank matrix recovery
Authors
Abstract
Empirical evidence suggests that for a variety of overparameterized nonlinear models, most notably in neural network training, the growth of the loss around a minimizer strongly impacts its performance. Flat minima -- those around which the loss grows slowly -- appear to generalize well. This work takes a step towards understanding this phenomenon by focusing on the simplest class of overparameterized nonlinear models: those arising in low-rank matrix recovery. We analyze overparameterized matrix and bilinear sensing, robust PCA, covariance matrix estimation, and single hidden layer neural networks with quadratic activation functions. In all cases, we show that flat minima, measured by the trace of the Hessian, exactly recover the ground truth under standard statistical assumptions. For matrix completion, we establish weak recovery, although empirical evidence suggests exact recovery holds here as well. We conclude with synthetic experiments that illustrate our findings and discuss the effect of depth on flat solutions.
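As a concrete illustration of the abstract's claim, consider symmetric matrix sensing, the simplest model analyzed: noiseless measurements y_i = ⟨A_i, U*U*ᵀ⟩ of a rank-r matrix are fit with an overparameterized factor U ∈ ℝ^{d×k}, k > r, by minimizing L(U) = (1/2m) Σ_i (⟨A_i, UUᵀ⟩ − y_i)². At any zero-loss point the residuals vanish and the Hessian trace reduces to (4/m) Σ_i ‖A_i U‖_F², which for Gaussian measurements concentrates around a multiple of ‖U‖_F²; minimizing it over global minimizers therefore acts like a nuclear-norm bias toward the low-rank ground truth. The sketch below is minimal and not the authors' code: the problem sizes, step size, and the small random initialization (which empirically lands near flat minima in this setting) are illustrative choices.

```python
# A minimal sketch of overparameterized symmetric matrix sensing, solved by
# gradient descent from a small random initialization. Sizes are made up.
import numpy as np

rng = np.random.default_rng(0)
d, r, k, m = 20, 2, 20, 120        # ambient dim, true rank, overparam. width, #measurements

U_star = rng.normal(size=(d, r))
M_star = U_star @ U_star.T          # rank-r ground truth M* = U* U*^T
A = rng.normal(size=(m, d, d))
A = (A + A.transpose(0, 2, 1)) / 2  # symmetric Gaussian measurement matrices
y = np.einsum('mij,ij->m', A, M_star)   # noiseless measurements y_i = <A_i, M*>

def loss_and_grad(U):
    """L(U) = (1/2m) sum_i (<A_i, U U^T> - y_i)^2 and its gradient 
    (2/m) sum_i r_i A_i U, using that each A_i is symmetric."""
    res = np.einsum('mij,ij->m', A, U @ U.T) - y
    W = np.tensordot(res, A, axes=1)          # sum_i r_i A_i
    return 0.5 * np.mean(res**2), (2.0 / m) * (W @ U)

def hessian_trace_at_zero_loss(U):
    """At a global minimizer the residuals vanish, and the Hessian trace
    reduces to (4/m) sum_i ||A_i U||_F^2; this depends on U only through
    U U^T, so exact recovery makes it match the value at the ground truth."""
    return (4.0 / m) * np.sum((A @ U) ** 2)

U = 1e-3 * rng.normal(size=(d, k))  # small init: empirically biased toward flat minima
for _ in range(20000):
    loss, grad = loss_and_grad(U)
    U -= 0.01 * grad

rel_err = np.linalg.norm(U @ U.T - M_star) / np.linalg.norm(M_star)
print(f"loss {loss:.2e}  relative recovery error {rel_err:.2e}")
print(f"Hessian trace at solution {hessian_trace_at_zero_loss(U):.1f}  "
      f"vs. at ground truth {hessian_trace_at_zero_loss(U_star):.1f}")
```

Note the regime: m = 120 is far below the d(d+1)/2 = 210 parameters of a general symmetric matrix, so many rank-k factorizations fit the data exactly, yet the flat one (smallest Hessian trace) coincides with the rank-r ground truth, which is the phenomenon the paper proves.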