Paper Title
Stochastic Gradient Variance Reduction by Solving a Filtering Problem
Paper Authors
Paper Abstract
Deep neural networks (DNNs) are typically optimized using stochastic gradient descent (SGD). However, gradient estimates computed from stochastic samples tend to be noisy and unreliable, resulting in large gradient variance and poor convergence. In this paper, we propose \textbf{Filter Gradient Descent}~(FGD), an efficient stochastic optimization algorithm that produces consistent estimates of the local gradient by solving an adaptive filtering problem with different filter designs. Our method reduces the variance of stochastic gradient descent by incorporating historical states to enhance the current estimate. It is able to correct noisy gradient directions as well as to accelerate the convergence of learning. We demonstrate the effectiveness of the proposed Filter Gradient Descent on numerical optimization and on training neural networks, where it achieves superior and robust performance compared with traditional momentum-based methods. To the best of our knowledge, we are the first to provide a practical solution that integrates filtering into gradient estimation by drawing an analogy between gradient estimation and filtering problems in signal processing. (The code is available at https://github.com/Adamdad/Filter-Gradient-Decent)
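As a rough illustration of the idea described in the abstract (treating the noisy stochastic gradient as a measurement and combining it with historical state before each update), the sketch below applies a simple first-order low-pass filter to noisy gradients on a toy quadratic objective. This is a minimal sketch under stated assumptions, not the authors' FGD implementation or filter design; the names `filtered_gradient_descent`, `grad_fn`, and `beta`, and the toy objective, are all illustrative.

```python
import numpy as np

# Minimal sketch (not the paper's implementation): smooth the noisy stochastic
# gradient with a first-order low-pass filter, then step along the filtered
# direction. The filter state carries historical gradient information.

def filtered_gradient_descent(grad_fn, x0, lr=0.1, beta=0.9, steps=200):
    """Toy filtered gradient descent on a noisy objective.

    grad_fn(x) returns a noisy estimate of the gradient at x.
    beta controls how much historical state is retained in the filter.
    """
    x = np.asarray(x0, dtype=float)
    g_filtered = np.zeros_like(x)           # filter state: smoothed gradient
    for _ in range(steps):
        g_noisy = grad_fn(x)                 # noisy "measurement" of the gradient
        # Combine historical state with the new measurement (low-pass filter).
        g_filtered = beta * g_filtered + (1.0 - beta) * g_noisy
        x = x - lr * g_filtered              # descend along the filtered direction
    return x

# Toy usage: minimize f(x) = ||x||^2 / 2 with Gaussian gradient noise.
rng = np.random.default_rng(0)
noisy_grad = lambda x: x + 0.5 * rng.standard_normal(x.shape)
print(filtered_gradient_descent(noisy_grad, x0=np.ones(3)))
```

With beta = 0 this reduces to plain SGD on the noisy gradients; larger beta averages over more history, which lowers the variance of the update direction at the cost of responding more slowly to changes in the true gradient. The paper generalizes this trade-off by posing gradient estimation as an adaptive filtering problem with different filter designs.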