Paper Title

Neural Networks and Polynomial Regression. Demystifying the Overparametrization Phenomena

Paper Authors

Matt Emschwiller, David Gamarnik, Eren C. Kızıldağ, Ilias Zadik

Paper Abstract

In the context of neural network models, overparametrization refers to the phenomenon whereby these models appear to generalize well on unseen data even though the number of parameters significantly exceeds the sample size and the model perfectly fits the training data. A conventional explanation of this phenomenon is based on the self-regularization properties of the algorithms used for training. In this paper we prove a series of results which provide a somewhat diverging explanation. Adopting a teacher/student model, where the teacher network is used to generate the predictions and the student network is trained on the observed labeled data and then tested on out-of-sample data, we show that any student network interpolating the data generated by a teacher network generalizes well, provided that the sample size is at least an explicit quantity controlled by the data dimension and the approximation guarantee alone, regardless of the number of internal nodes of either the teacher or the student network. Our claim is based on approximating both teacher and student networks by polynomial (tensor) regression models whose degree depends only on the desired accuracy and the network depth. Such a parametrization notably does not depend on the number of internal nodes. Thus a message implied by our results is that parametrizing wide neural networks by the number of hidden nodes is misleading, and a more fitting measure of parametrization complexity is the number of regression coefficients associated with the tensorized data. In particular, this somewhat reconciles the generalization ability of neural networks with more classical statistical notions of data complexity and generalization bounds. Our empirical results on the MNIST and Fashion-MNIST datasets indeed confirm that tensorized regression achieves good out-of-sample performance, even when the degree of the tensor is at most two.
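
To make the teacher/student and tensorized-regression setup concrete, below is a minimal, hypothetical sketch of a degree-2 polynomial (tensorized) regression fit to data labeled by a fixed one-hidden-layer "teacher" network. This is not the authors' code or experimental pipeline; the synthetic dimensions, teacher width, ridge penalty, and use of scikit-learn are assumptions chosen purely for illustration.

```python
# Illustrative sketch only (not the paper's pipeline): a degree-2 tensorized
# regression "student" trained on labels produced by a wide ReLU "teacher".
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Teacher: a fixed one-hidden-layer ReLU network that generates the labels.
d, m_teacher, n_train, n_test = 20, 500, 2000, 500   # assumed sizes
W = rng.normal(size=(m_teacher, d)) / np.sqrt(d)
a = rng.normal(size=m_teacher) / np.sqrt(m_teacher)

def teacher(X):
    # ReLU hidden layer followed by a linear output layer.
    return np.maximum(X @ W.T, 0.0) @ a

X_train = rng.normal(size=(n_train, d))
X_test = rng.normal(size=(n_test, d))
y_train, y_test = teacher(X_train), teacher(X_test)

# Student: regression on all monomials of the input up to degree 2.
# Its number of coefficients is O(d^2), independent of the teacher's width.
phi = PolynomialFeatures(degree=2, include_bias=True)
Z_train, Z_test = phi.fit_transform(X_train), phi.transform(X_test)

model = Ridge(alpha=1e-3).fit(Z_train, y_train)
print("train R^2:", model.score(Z_train, y_train))
print("test  R^2:", model.score(Z_test, y_test))
```

The point of the sketch is the parameter count: the student's complexity is governed by the number of degree-at-most-2 regression coefficients in the input dimension, not by the number of hidden nodes of the teacher, echoing the abstract's claim about the more fitting measure of parametrization complexity.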
