Paper Title
Learning Non-Vacuous Generalization Bounds from Optimization
Paper Authors
Paper Abstract
One of the fundamental challenges in the deep learning community is to theoretically understand how well a deep neural network generalizes to unseen data. However, current approaches often yield generalization bounds that are either too loose to be informative of the true generalization error or only valid for compressed networks. In this study, we present a simple yet non-vacuous generalization bound from the optimization perspective. We achieve this goal by leveraging the fact that the hypothesis set accessed by stochastic gradient algorithms is essentially fractal-like, which allows us to derive a tighter bound on the algorithm-dependent Rademacher complexity. The main argument rests on modeling the discrete-time recursion process via a continuous-time stochastic differential equation driven by fractional Brownian motion. Numerical studies demonstrate that our approach is able to yield plausible generalization guarantees for modern neural networks such as ResNet and Vision Transformer, even when they are trained on a large-scale dataset (e.g., ImageNet-1K).
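To make the central modeling step concrete, the following is a minimal sketch (not the paper's actual construction) of simulating an SGD-like recursion as an Euler discretization of an SDE driven by fractional Brownian motion, dX_t = -∇L(X_t) dt + σ dB^H_t. The fBm path is sampled exactly via a Cholesky factorization of its covariance; the loss gradient `grad`, noise scale `sigma`, and Hurst index `hurst` are illustrative choices, not values from the paper.

```python
import numpy as np

def fbm_increments(n, hurst, dt, rng):
    """Sample n increments of fractional Brownian motion B^H on a grid
    of step dt, using the exact covariance
    E[B^H(s) B^H(t)] = 0.5 * (s^{2H} + t^{2H} - |s - t|^{2H})."""
    t = dt * np.arange(1, n + 1)
    cov = 0.5 * (t[:, None] ** (2 * hurst) + t[None, :] ** (2 * hurst)
                 - np.abs(t[:, None] - t[None, :]) ** (2 * hurst))
    # Small jitter keeps the Cholesky factorization numerically stable.
    chol = np.linalg.cholesky(cov + 1e-12 * np.eye(n))
    path = chol @ rng.standard_normal(n)
    return np.diff(np.concatenate([[0.0], path]))

def euler_fbm_sde(x0, grad, sigma, hurst, n_steps, dt, seed=0):
    """Euler discretization of dX_t = -grad(X_t) dt + sigma dB^H_t,
    mimicking a noisy gradient recursion in continuous time."""
    rng = np.random.default_rng(seed)
    dB = fbm_increments(n_steps, hurst, dt, rng)
    x = np.empty(n_steps + 1)
    x[0] = x0
    for k in range(n_steps):
        x[k + 1] = x[k] - grad(x[k]) * dt + sigma * dB[k]
    return x

# Illustrative run: quadratic loss L(x) = x^2 / 2, so grad(x) = x.
path = euler_fbm_sde(x0=1.0, grad=lambda x: x, sigma=0.1,
                     hurst=0.7, n_steps=200, dt=0.01)
```

With a Hurst index H ≠ 0.5 the driving noise has correlated increments, which is what distinguishes this model from the standard Brownian-motion SDE and underlies the fractal-like structure of the visited hypothesis set.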