Paper Title
Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction
Paper Authors
Paper Abstract
Normalization layers (e.g., Batch Normalization, Layer Normalization) were introduced to help with optimization difficulties in very deep nets, but they clearly also help generalization, even in not-so-deep nets. Motivated by the long-held belief that flatter minima lead to better generalization, this paper gives mathematical analysis and supporting experiments suggesting that normalization (together with accompanying weight-decay) encourages GD to reduce the sharpness of the loss surface. Here "sharpness" is carefully defined given that the loss is scale-invariant, a known consequence of normalization. Specifically, for a fairly broad class of neural nets with normalization, our theory explains how GD with a finite learning rate enters the so-called Edge of Stability (EoS) regime, and characterizes the trajectory of GD in this regime via a continuous sharpness-reduction flow.
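The scale-invariance mentioned in the abstract is why "sharpness" has to be defined carefully: rescaling the weights that feed a normalization layer leaves the network's output (and loss) unchanged, so the raw Hessian-based sharpness can be shifted at will without changing the function. Below is a minimal numpy sketch of this property, not taken from the paper; the `layer_norm` and `forward` functions and all shapes are illustrative assumptions.

```python
import numpy as np

def layer_norm(z, eps=1e-5):
    # Normalize each row to zero mean and unit variance (no learned affine parameters).
    mu = z.mean(axis=1, keepdims=True)
    var = z.var(axis=1, keepdims=True)
    return (z - mu) / np.sqrt(var + eps)

def forward(W, x):
    # A single linear layer whose pre-activations are normalized,
    # standing in for the normalized layers of a deeper network.
    return layer_norm(x @ W.T)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))     # a small batch of inputs (illustrative shapes)
W = rng.normal(size=(16, 8))    # weights feeding the normalization layer

out = forward(W, x)
out_scaled = forward(3.0 * W, x)   # rescale the weights by an arbitrary factor

# The outputs (and hence any loss computed from them) agree up to the eps term,
# so ordinary Hessian-based sharpness could be made arbitrarily small or large
# by rescaling W without changing the function the network computes.
print(np.allclose(out, out_scaled, atol=1e-4))  # True
```

This is also the setting in which the Edge of Stability regime is usually described: with learning rate η, GD on a quadratic model is stable only while the top Hessian eigenvalue stays below 2/η, and in the EoS regime that eigenvalue hovers around this threshold while the properly defined (scale-invariant) sharpness is gradually driven down.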