Paper Title

Gradient-based Hyperparameter Optimization Over Long Horizons

Paper Authors

Paul Micaelli, Amos Storkey

Paper Abstract

Gradient-based hyperparameter optimization has earned widespread popularity in the context of few-shot meta-learning, but remains broadly impractical for tasks with long horizons (many gradient steps), due to memory scaling and gradient degradation issues. A common workaround is to learn hyperparameters online, but this introduces greediness which comes with a significant performance drop. We propose forward-mode differentiation with sharing (FDS), a simple and efficient algorithm which tackles memory scaling issues with forward-mode differentiation, and gradient degradation issues by sharing hyperparameters that are contiguous in time. We provide theoretical guarantees about the noise reduction properties of our algorithm, and demonstrate its efficiency empirically by differentiating through $\sim 10^4$ gradient steps of unrolled optimization. We consider large hyperparameter search ranges on CIFAR-10 where we significantly outperform greedy gradient-based alternatives, while achieving $\times 20$ speedups compared to the state-of-the-art black-box methods. Code is available at: \url{https://github.com/polo5/FDS}
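To make the abstract's description concrete, below is a minimal sketch in JAX of the two ingredients it names: forward-mode differentiation through unrolled optimization, and hyperparameters shared across contiguous blocks of steps. This is not the authors' released implementation (see the repository URL above); the toy quadratic objectives, the unroll length, the block size, and the names unroll, train_loss, and val_loss are illustrative assumptions.

# Sketch only: forward-mode hypergradients through unrolled SGD with
# block-shared learning rates. Toy problem and all names are assumptions,
# not the authors' code.
import jax
import jax.numpy as jnp

T, n_blocks = 100, 5          # unrolled SGD steps and number of shared hyperparameters
block = T // n_blocks         # contiguous steps that share one hyperparameter

A = jnp.diag(jnp.linspace(0.1, 2.0, 10))   # toy quadratic training problem
b = jnp.ones(10)

def train_loss(w):
    return 0.5 * w @ A @ w - b @ w

def val_loss(w):
    return 0.5 * jnp.sum((w - 1.0) ** 2)    # toy validation objective

def unroll(log_lrs):
    # Run T SGD steps; step t uses the learning rate shared by its block.
    w = jnp.zeros(10)
    for t in range(T):
        lr = jnp.exp(log_lrs[t // block])
        w = w - lr * jax.grad(train_loss)(w)
    return val_loss(w)

log_lrs = jnp.full(n_blocks, jnp.log(0.05))

# Forward-mode differentiation: one Jacobian-vector product per shared
# hyperparameter recovers the hypergradient without storing the unrolled
# trajectory, so memory does not grow with the horizon T.
hypergrad = jnp.stack([
    jax.jvp(unroll, (log_lrs,), (jnp.eye(n_blocks)[i],))[1]
    for i in range(n_blocks)
])
print(hypergrad)   # gradient of the validation loss w.r.t. the shared log learning rates

In this sketch, forward-mode differentiation keeps memory constant in the horizon, while sharing a hyperparameter across a block of contiguous steps keeps the number of tangent directions small and, as the abstract argues, mitigates the gradient degradation that per-step hyperparameters would suffer.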
