Paper Title

A Unified Analysis of Stochastic Gradient Methods for Nonconvex Federated Optimization

Authors

Zhize Li, Peter Richtárik

Abstract


In this paper, we study the performance of a large family of SGD variants in the smooth nonconvex regime. To this end, we propose a generic and flexible assumption capable of accurate modeling of the second moment of the stochastic gradient. Our assumption is satisfied by a large number of specific variants of SGD in the literature, including SGD with arbitrary sampling, SGD with compressed gradients, and a wide variety of variance-reduced SGD methods such as SVRG and SAGA. We provide a single convergence analysis for all methods that satisfy the proposed unified assumption, thereby offering a unified understanding of SGD variants in the nonconvex regime instead of relying on dedicated analyses of each variant. Moreover, our unified analysis is accurate enough to recover or improve upon the best-known convergence results of several classical methods, and also gives new convergence results for many new methods which arise as special cases. In the more general distributed/federated nonconvex optimization setup, we propose two new general algorithmic frameworks differing in whether direct gradient compression (DC) or compression of gradient differences (DIANA) is used. We show that all methods captured by these two frameworks also satisfy our unified assumption. Thus, our unified convergence analysis also captures a large variety of distributed methods utilizing compressed communication. Finally, we also provide a unified analysis for obtaining faster linear convergence rates in this nonconvex regime under the PL condition.
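The two distributed frameworks named in the abstract differ only in *what* each worker compresses before communicating. A minimal sketch of that difference is below, using an unbiased random-k sparsifier as the compressor (the paper covers general unbiased compressors; the function names, the quadratic test problem, and all parameter choices here are illustrative assumptions, not the paper's notation):

```python
import numpy as np

def rand_k_compress(v, k, rng):
    """Unbiased random-k sparsification: keep k of d coordinates at random
    and rescale the survivors by d/k, so E[compress(v)] = v."""
    d = v.size
    idx = rng.choice(d, size=k, replace=False)
    out = np.zeros_like(v)
    out[idx] = v[idx] * (d / k)
    return out

def dc_sgd(grad_fns, x0, lr, k, steps, seed=0):
    """Direct compression (DC): each worker compresses its own gradient,
    and the server averages the compressed gradients."""
    rng = np.random.default_rng(seed)
    x = x0.copy()
    for _ in range(steps):
        g = np.mean([rand_k_compress(gf(x), k, rng) for gf in grad_fns], axis=0)
        x -= lr * g
    return x

def diana_sgd(grad_fns, x0, lr, alpha, k, steps, seed=0):
    """DIANA-style compression of gradient differences: worker i keeps a
    local shift h_i, compresses g_i - h_i, and moves h_i toward g_i with
    step alpha, so the compressed signal shrinks as training progresses."""
    rng = np.random.default_rng(seed)
    x = x0.copy()
    h = [np.zeros_like(x0) for _ in grad_fns]
    for _ in range(steps):
        estimates = []
        for i, gf in enumerate(grad_fns):
            c = rand_k_compress(gf(x) - h[i], k, rng)
            estimates.append(h[i] + c)   # unbiased estimate of g_i
            h[i] = h[i] + alpha * c      # shift update
        x -= lr * np.mean(estimates, axis=0)
    return x
```

On a toy problem with worker objectives f_i(x) = ½‖x − b_i‖² (gradients `x - b_i`), both loops drive `x` toward the mean of the `b_i`; the design point the abstract highlights is that DIANA's compressed differences vanish near a stationary point, whereas DC keeps compressing full gradients.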
