Paper Title

Fundamental Limits of Communication Efficiency for Model Aggregation in Distributed Learning: A Rate-Distortion Approach

Authors

Naifu Zhang, Meixia Tao, Jia Wang, Fan Xu

Abstract

One of the main focuses in distributed learning is communication efficiency, since model aggregation at each round of training can consist of millions to billions of parameters. Several model compression methods, such as gradient quantization and sparsification, have been proposed to improve the communication efficiency of model aggregation. However, the information-theoretic minimum communication cost for a given distortion of gradient estimators is still unknown. In this paper, we study the fundamental limit of communication cost of model aggregation in distributed learning from a rate-distortion perspective. By formulating the model aggregation as a vector Gaussian CEO problem, we derive the rate region bound and sum-rate-distortion function for the model aggregation problem, which reveals the minimum communication rate at a particular gradient distortion upper bound. We also analyze the communication cost at each iteration and total communication cost based on the sum-rate-distortion function with the gradient statistics of real-world datasets. It is found that the communication gain by exploiting the correlation between worker nodes is significant for SignSGD, and a high distortion of gradient estimator can achieve low total communication cost in gradient compression.
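
For intuition on the rate-distortion formulation, the symmetric scalar quadratic Gaussian CEO problem admits a simple achievable sum-rate-distortion tradeoff (Berger-Tung coding with Gaussian test channels), which is known to be tight for the sum rate in the quadratic Gaussian case. The sketch below is illustrative only and is not the paper's vector formulation; the symbols $\sigma_X^2$ (source/gradient variance), $\sigma_N^2$ (per-worker observation noise), $q$ (test-channel compression noise), and $L$ (number of workers) are assumptions introduced here.

% Illustrative sketch, not the paper's result: worker k = 1,...,L observes
% Y_k = X + N_k with X ~ N(0, \sigma_X^2) and N_k ~ N(0, \sigma_N^2) i.i.d.,
% and encodes through the Gaussian test channel U_k = Y_k + Q_k, Q_k ~ N(0, q).
\begin{align}
  D(q) &= \left( \frac{1}{\sigma_X^2} + \frac{L}{\sigma_N^2 + q} \right)^{-1}
  && \text{MMSE distortion of the fused estimate,} \\
  R_{\mathrm{sum}}(q) &= \frac{1}{2}\log\frac{\sigma_X^2}{D(q)}
    + \frac{L}{2}\log\frac{\sigma_N^2 + q}{q}
  && \text{total rate summed over the } L \text{ workers.}
\end{align}

Sweeping $q$ from $\infty$ down to $0$ traces the tradeoff: coarse compression (large $q$) drives the sum rate toward zero while $D \to \sigma_X^2$, whereas $q \to 0$ approaches the noise-limited distortion $(1/\sigma_X^2 + L/\sigma_N^2)^{-1}$ at unbounded rate, mirroring the abstract's observation that tolerating higher gradient distortion lowers the per-round communication cost.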
