Title

When Expressivity Meets Trainability: Fewer than $n$ Neurons Can Work

Authors

Jiawei Zhang, Yushun Zhang, Mingyi Hong, Ruoyu Sun, Zhi-Quan Luo

Abstract

Modern neural networks are often quite wide, causing large memory and computation costs. It is thus of great interest to train a narrower network. However, training narrow neural nets remains a challenging task. We ask two theoretical questions: Can narrow networks have as strong expressivity as wide ones? If so, does the loss function exhibit a benign optimization landscape? In this work, we provide partially affirmative answers to both questions for 1-hidden-layer networks with fewer than $n$ (sample size) neurons when the activation is smooth. First, we prove that as long as the width $m \geq 2n/d$ (where $d$ is the input dimension), its expressivity is strong, i.e., there exists at least one global minimizer with zero training loss. Second, we identify a nice local region with no local-min or saddle points. Nevertheless, it is not clear whether gradient descent can stay in this nice region. Third, we consider a constrained optimization formulation where the feasible region is the nice local region, and prove that every KKT point is a nearly global minimizer. It is expected that projected gradient methods converge to KKT points under mild technical conditions, but we leave the rigorous convergence analysis to future work. Thorough numerical results show that projected gradient methods on this constrained formulation significantly outperform SGD for training narrow neural nets.
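To make the setting concrete, below is a minimal illustrative sketch of a projected gradient method for a narrow 1-hidden-layer network with smooth activation and width $m = \lceil 2n/d \rceil$, matching the abstract's expressivity condition $m \geq 2n/d$. The projection onto a Euclidean ball around the initialization is a hypothetical stand-in; the paper's actual feasible region (the "nice local region") is not specified in the abstract, and all data, radii, and step sizes here are assumptions for illustration only.

```python
# Illustrative sketch, NOT the paper's exact constrained formulation:
# projected gradient descent on a 1-hidden-layer tanh network of width
# m = ceil(2n/d). The ball projection is a hypothetical placeholder for
# the paper's "nice local region".
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 20                      # sample size and input dimension
m = int(np.ceil(2 * n / d))         # narrow width satisfying m >= 2n/d
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

W = rng.standard_normal((m, d)) / np.sqrt(d)   # hidden-layer weights
v = rng.standard_normal(m) / np.sqrt(m)        # output-layer weights
W0, v0 = W.copy(), v.copy()
radius, lr = 5.0, 1e-2              # assumed constraint radius and step size

def project(P, P0, r):
    # Project P onto the Euclidean ball of radius r centered at P0.
    diff = P - P0
    nrm = np.linalg.norm(diff)
    return P0 + diff * (r / nrm) if nrm > r else P

for step in range(2000):
    H = np.tanh(X @ W.T)            # n x m hidden features (smooth activation)
    resid = H @ v - y               # prediction residuals
    grad_v = H.T @ resid / n
    grad_W = ((resid[:, None] * (1.0 - H**2)) * v).T @ X / n
    v = project(v - lr * grad_v, v0, radius)
    W = project(W - lr * grad_W, W0, radius)

print("final training MSE:", np.mean((np.tanh(X @ W.T) @ v - y) ** 2))
```

In this sketch the unconstrained update is an ordinary gradient step on the squared loss, and the projection step is what turns it into a constrained method; the paper's result concerns KKT points of such a constrained formulation, not the specific ball constraint used here.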
