Paper Title

A Closer Look at Learned Optimization: Stability, Robustness, and Inductive Biases

Paper Authors

James Harrison, Luke Metz, Jascha Sohl-Dickstein

Paper Abstract

Learned optimizers -- neural networks that are trained to act as optimizers -- have the potential to dramatically accelerate training of machine learning models. However, even when meta-trained across thousands of tasks at huge computational expense, blackbox learned optimizers often struggle with stability and generalization when applied to tasks unlike those in their meta-training set. In this paper, we use tools from dynamical systems to investigate the inductive biases and stability properties of optimization algorithms, and apply the resulting insights to designing inductive biases for blackbox optimizers. Our investigation begins with a noisy quadratic model, where we characterize conditions in which optimization is stable, in terms of eigenvalues of the training dynamics. We then introduce simple modifications to a learned optimizer's architecture and meta-training procedure which lead to improved stability, and improve the optimizer's inductive bias. We apply the resulting learned optimizer to a variety of neural network training tasks, where it outperforms the current state of the art learned optimizer -- at matched optimizer computational overhead -- with regard to optimization performance and meta-training speed, and is capable of generalization to tasks far different from those it was meta-trained on.
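The abstract's starting point, stability on a noisy quadratic model judged by eigenvalues of the training dynamics, can be made concrete with a small sketch. The snippet below is not from the paper's code: it applies plain gradient descent to a hypothetical 2-D quadratic loss f(θ) = ½ θᵀHθ with additive gradient noise, and checks the standard stability criterion that every eigenvalue of the update map I − αH must lie inside the unit circle; the Hessian, learning rate, and noise scale are chosen purely for illustration.

```python
# Minimal sketch (not the paper's implementation): stability of gradient descent
# on a noisy quadratic model, assessed via the eigenvalues of the training dynamics.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ill-conditioned quadratic with Hessian eigenvalues 1.0 and 10.0.
H = np.diag([1.0, 10.0])
noise_scale = 0.01
lr = 0.15  # try lr = 0.21: then |1 - lr * 10| = 1.1 > 1 and the iterates diverge

# Deterministic part of the update is theta <- (I - lr * H) theta, so stability
# requires the spectral radius of (I - lr * H) to be below 1.
spectral_radius = np.max(np.abs(1.0 - lr * np.linalg.eigvalsh(H)))
print(f"spectral radius of update map: {spectral_radius:.3f} "
      f"({'stable' if spectral_radius < 1 else 'unstable'})")

theta = np.array([1.0, 1.0])
for step in range(200):
    grad = H @ theta + noise_scale * rng.standard_normal(2)  # noisy gradient
    theta = theta - lr * grad

print("final loss:", 0.5 * theta @ H @ theta)
```

With a stable learning rate the loss settles near a noise floor set by the gradient noise; past the stability threshold the largest-eigenvalue direction oscillates and diverges, which is the kind of behavior the paper's analysis aims to rule out by construction in the learned optimizer.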
