Paper Title

Applying Cyclical Learning Rate to Neural Machine Translation

Paper Authors

Choon Meng Lee, Jianfeng Liu, Wei Peng

Paper Abstract

In training deep learning networks, the optimizer and its learning rate are often used without much thought or with minimal tuning, even though they are crucial in ensuring fast convergence to a good-quality minimum of the loss function that also generalizes well on the test dataset. Drawing inspiration from the successful application of cyclical learning rate policies to convolutional networks and computer vision datasets, we explore how cyclical learning rates can be applied to train transformer-based networks for neural machine translation. Our carefully designed experiments show that the choice of optimizer and the associated cyclical learning rate policy can have a significant impact on performance. In addition, we establish guidelines for applying cyclical learning rates to neural machine translation tasks. With our work, we hope to raise awareness of the importance of selecting the right optimizer and accompanying learning rate policy and, at the same time, to encourage further research into easy-to-use learning rate policies.
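
For readers unfamiliar with the policy the abstract refers to, below is a minimal sketch of the triangular cyclical learning rate schedule (Smith, 2017) in plain Python. The hyperparameter values (base_lr, max_lr, step_size) are illustrative assumptions, not settings reported by the paper.

```python
def triangular_clr(step, base_lr=1e-4, max_lr=1e-3, step_size=2000):
    """Return the learning rate at a given training step.

    The rate climbs linearly from base_lr to max_lr over `step_size`
    steps, then descends back to base_lr, repeating every
    2 * step_size steps. Values here are illustrative only.
    """
    cycle = step // (2 * step_size)            # which cycle we are in (0-indexed)
    x = abs(step / step_size - 2 * cycle - 1)  # distance from the cycle peak, in [0, 1]
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)

# Example: the rate rises to max_lr at step 2000, falls back by step 4000.
for s in (0, 1000, 2000, 3000, 4000):
    print(s, triangular_clr(s))
```

Frameworks provide equivalent built-ins (e.g., torch.optim.lr_scheduler.CyclicLR in PyTorch), which can be used instead of a hand-rolled schedule.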
