Paper Title

Step-size Adaptation Using Exponentiated Gradient Updates

Paper Authors

Ehsan Amid, Rohan Anil, Christopher Fifty, Manfred K. Warmuth

Paper Abstract

Optimizers like Adam and AdaGrad have been very successful in training large-scale neural networks. Yet, the performance of these methods is heavily dependent on a carefully tuned learning rate schedule. We show that in many large-scale applications, augmenting a given optimizer with an adaptive tuning method of the step-size greatly improves the performance. More precisely, we maintain a global step-size scale for the update as well as a gain factor for each coordinate. We adjust the global scale based on the alignment of the average gradient and the current gradient vectors. A similar approach is used for updating the local gain factors. This type of step-size scale tuning has been done before with gradient descent updates. In this paper, we update the step-size scale and the gain variables with exponentiated gradient updates instead. Experimentally, we show that our approach can achieve compelling accuracy on standard models without using any specially tuned learning rate schedule. We also show the effectiveness of our approach for quickly adapting to distribution shifts in the data during training.
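
The abstract does not give the update equations, so the following is only a minimal sketch of the idea it describes: keep a global step-size scale and per-coordinate gain factors, and update both multiplicatively (exponentiated-gradient style) based on the alignment between an average of past gradients and the current gradient. The class name `StepSizeAdapter`, the use of an exponential moving average, the cosine/sign forms of the alignment, and all hyperparameters are illustrative assumptions, not the paper's specification.

```python
# Hypothetical sketch, not the paper's algorithm: multiplicative updates of a
# global step-size scale and per-coordinate gains, driven by the alignment of
# an averaged gradient with the current gradient. Exact rules and constants
# are assumptions made for illustration.
import numpy as np

class StepSizeAdapter:
    def __init__(self, dim, base_lr=0.1, beta=0.9, eta_scale=0.01, eta_gain=0.01):
        self.base_lr = base_lr          # step size of the underlying optimizer
        self.beta = beta                # EMA decay for the average gradient (assumed)
        self.eta_scale = eta_scale      # meta learning rate for the global scale (assumed)
        self.eta_gain = eta_gain        # meta learning rate for per-coordinate gains (assumed)
        self.avg_grad = np.zeros(dim)   # running average of past gradients
        self.scale = 1.0                # global step-size scale
        self.gains = np.ones(dim)       # per-coordinate gain factors

    def step(self, params, grad):
        # Global scale: exponentiated update driven by the cosine alignment
        # between the average gradient and the current gradient.
        denom = np.linalg.norm(self.avg_grad) * np.linalg.norm(grad) + 1e-12
        alignment = np.dot(self.avg_grad, grad) / denom
        self.scale *= np.exp(self.eta_scale * alignment)

        # Per-coordinate gains: analogous multiplicative update using the sign
        # agreement of the average and current gradient in each coordinate.
        coord_align = np.sign(self.avg_grad) * np.sign(grad)
        self.gains *= np.exp(self.eta_gain * coord_align)

        # Refresh the average gradient and take the rescaled descent step.
        self.avg_grad = self.beta * self.avg_grad + (1.0 - self.beta) * grad
        return params - self.base_lr * self.scale * self.gains * grad
```

In this sketch the scale and gains grow when successive gradients point in consistent directions and shrink when they oscillate, which is the behavior the abstract attributes to the alignment-based tuning; any real implementation should follow the update rules given in the paper itself.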
