Paper Title
BEER: Fast $O(1/T)$ Rate for Decentralized Nonconvex Optimization with Communication Compression
Paper Authors
Paper Abstract
Communication efficiency has been widely recognized as the bottleneck for large-scale decentralized machine learning applications in multi-agent or federated environments. To tackle the communication bottleneck, there have been many efforts to design communication-compressed algorithms for decentralized nonconvex optimization, where the clients are only allowed to communicate a small amount of quantized information (a.k.a. bits) with their neighbors over a predefined graph topology. Despite significant efforts, the state-of-the-art algorithm in the nonconvex setting still suffers from a slower rate of convergence $O((G/T)^{2/3})$ compared with its uncompressed counterpart, where $G$ measures the data heterogeneity across different clients and $T$ is the number of communication rounds. This paper proposes BEER, which adopts communication compression with gradient tracking, and shows it converges at a faster rate of $O(1/T)$. This significantly improves over the state-of-the-art rate by matching the rate without compression, even under arbitrary data heterogeneity. Numerical experiments are also provided to corroborate our theory and confirm the practical superiority of BEER in the data-heterogeneous regime.
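To make the gradient-tracking ingredient concrete, here is a minimal, self-contained sketch of plain decentralized gradient tracking (the building block that BEER combines with communication compression). This is an illustration, not the paper's exact algorithm: it omits the compression step, and the ring topology, step size, and local quadratic objectives $f_i(x) = \frac{1}{2}(x - b_i)^2$ are assumptions chosen only so the example runs. The heterogeneous $b_i$ play the role of the data heterogeneity $G$ in the abstract.

```python
import numpy as np

# Illustrative sketch of decentralized gradient tracking (not BEER itself).
# Each of n agents holds a local quadratic f_i(x) = 0.5 * (x - b_i)^2 with a
# heterogeneous minimizer b_i; the global minimizer of the average objective
# is mean(b), which all agents should agree on despite heterogeneity.

n, T, eta = 4, 200, 0.1
rng = np.random.default_rng(0)
b = rng.normal(size=n)                 # heterogeneous local "data"

# Doubly stochastic mixing matrix for a ring topology (an assumed example).
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = 0.5
    W[i, (i - 1) % n] = 0.25
    W[i, (i + 1) % n] = 0.25

grad = lambda x: x - b                 # stacked local gradients
x = np.zeros(n)                        # local iterates, one per agent
v = grad(x)                            # gradient trackers, v_i^0 = grad f_i(x_i^0)

for _ in range(T):
    x_new = W @ x - eta * v            # mix with neighbors, step along tracker
    v = W @ v + grad(x_new) - grad(x)  # tracker follows the average gradient
    x = x_new

# All agents converge to the global minimizer mean(b).
print(np.allclose(x, b.mean(), atol=1e-4))  # -> True
```

The key invariant is that the average of the trackers `v` always equals the average of the local gradients, so the iterates descend along the global gradient direction even though each agent only sees its own data; BEER keeps this property while communicating only compressed differences.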