Paper Title

How Does Sharpness-Aware Minimization Minimize Sharpness?

Paper Authors

Kaiyue Wen, Tengyu Ma, Zhiyuan Li

Abstract

Sharpness-Aware Minimization (SAM) is a highly effective regularization technique for improving the generalization of deep neural networks in various settings. However, the underlying workings of SAM remain elusive because of various intriguing approximations in the theoretical characterizations. SAM intends to penalize a notion of sharpness of the model but implements a computationally efficient variant; moreover, a third notion of sharpness was used for proving generalization guarantees. The subtle differences in these notions of sharpness can indeed lead to significantly different empirical results. This paper rigorously nails down the exact sharpness notion that SAM regularizes and clarifies the underlying mechanism. We also show that the two steps of approximations in the original motivation of SAM individually lead to inaccurate local conclusions, but their combination accidentally reveals the correct effect, when full-batch gradients are applied. Furthermore, we also prove that the stochastic version of SAM in fact regularizes the third notion of sharpness mentioned above, which is most likely to be the preferred notion for practical performance. The key mechanism behind this intriguing phenomenon is the alignment between the gradient and the top eigenvector of the Hessian when SAM is applied.
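For readers unfamiliar with the algorithm the abstract analyzes, the following is a minimal sketch of SAM's two-step update (an ascent step along the normalized gradient, then a descent step using the gradient at the perturbed point) on a toy quadratic loss. The perturbation radius `rho`, learning rate `lr`, and the ill-conditioned toy Hessian are illustrative choices for this sketch, not the paper's experimental setup.

```python
import numpy as np

def loss_grad(w, H):
    # Gradient of the quadratic loss L(w) = 0.5 * w^T H w.
    return H @ w

def sam_step(w, H, lr=0.05, rho=0.05):
    g = loss_grad(w, H)
    # Ascent step: perturb toward the (linearized) worst case within radius rho.
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    # Descent step: apply the gradient evaluated at the perturbed point.
    g_perturbed = loss_grad(w + eps, H)
    return w - lr * g_perturbed

# Toy Hessian with one sharp direction (eigenvalue 10) and one flat one (eigenvalue 1).
H = np.diag([10.0, 1.0])
w = np.array([1.0, 1.0])
for _ in range(100):
    w = sam_step(w, H)
```

Note that because the descent direction is taken at `w + eps` rather than `w`, SAM does not converge exactly to the minimizer but hovers near it, which is part of why the precise sharpness notion being regularized requires the careful analysis described above.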
