Paper Title

EAdam Optimizer: How $ε$ Impacts Adam

Paper Authors

Wei Yuan, Kai-Xin Gao

Paper Abstract

Many adaptive optimization methods have been proposed for deep learning, among which Adam is regarded as the default algorithm and is widely used in many deep learning frameworks. Recently, several variants of Adam, such as AdaBound, RAdam and AdaBelief, have been proposed and have shown better performance than Adam. However, these variants mainly change the stepsize by modifying the gradient or its square. Motivated by the fact that suitable damping is important for the success of powerful second-order optimizers, we discuss the impact of the constant $ε$ on Adam in this paper. Surprisingly, we can obtain better performance than Adam simply by changing the position of $ε$. Based on this finding, we propose a new variant of Adam called EAdam, which requires no extra hyper-parameters or computational cost. We also discuss the relationships and differences between our method and Adam. Finally, we conduct extensive experiments on various popular tasks and models. The results show that our method brings significant improvements over Adam. Our code is available at https://github.com/yuanwei2019/EAdam-optimizer.
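
For context, Adam adds $ε$ to the denominator of its update after the square root. One natural reading of "changing the position of $ε$", consistent with this abstract, is to fold $ε$ into the second-moment accumulation at every step instead. The sketch below contrasts the two updates (with $\hat{m}_t$ and $\hat{v}_t$ the bias-corrected moment estimates); it is an illustration only, and the authors' exact formulation is given in the paper and the linked repository.

$$
\begin{aligned}
\text{Adam:}\qquad & v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2, & \theta_t &= \theta_{t-1} - \alpha\,\hat{m}_t \big/ \big(\sqrt{\hat{v}_t} + \epsilon\big),\\
\text{Repositioned $\epsilon$ (sketch):}\qquad & v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2 + \epsilon, & \theta_t &= \theta_{t-1} - \alpha\,\hat{m}_t \big/ \sqrt{\hat{v}_t}.
\end{aligned}
$$

In this sketch, the $\epsilon$ added at every step accumulates inside the moving average (toward $\epsilon/(1-\beta_2)$ in the limit), so it acts as a stronger damping term than the single $\epsilon$ in Adam's denominator, without introducing any new hyper-parameter.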
