Paper Title

How to Attain Communication-Efficient DNN Training? Convert, Compress, Correct

Paper Authors

Zhong-Jing Chen, Eduin E. Hernandez, Yu-Chih Huang, Stefano Rini

Paper Abstract

This paper introduces CO3, an algorithm for communication-efficient federated Deep Neural Network (DNN) training. CO3 takes its name from the three processing steps applied to reduce the communication load when transmitting the local DNN gradients from the remote users to the Parameter Server, namely: (i) gradient quantization through floating-point conversion, (ii) lossless compression of the quantized gradient, and (iii) quantization error correction. We carefully design each of these steps to ensure good training performance under a constraint on the communication rate. In particular, in steps (i) and (ii) we adopt the assumption that DNN gradients are distributed according to a generalized normal distribution, which is validated numerically in the paper. For step (iii), we utilize error feedback with a memory decay mechanism to correct the quantization error introduced in step (i). We argue that the memory decay coefficient, similarly to the learning rate, can be optimally tuned to improve convergence. A rigorous convergence analysis of the proposed CO3 with SGD is provided. Moreover, extensive simulations show that CO3 offers improved performance when compared with existing gradient compression schemes in the literature that employ sketching and non-uniform quantization of the local gradients.
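
The three steps above map naturally onto a small per-round gradient-compression routine. Below is a minimal sketch in Python, assuming float32-to-float16 conversion as a stand-in for the paper's floating-point format and zlib as a stand-in for its lossless entropy coder; the function name, the memory decay value, and all other parameters are illustrative assumptions, not the authors' implementation.

# Minimal, illustrative sketch of the three CO3 steps described above.
# The float16 conversion, the zlib compressor, and all names and values
# below are assumptions for illustration, not the authors' implementation.
import zlib
import numpy as np

def co3_compress(grad, memory, beta=0.9):
    # (iii) error correction: fold the decayed quantization error from
    #       previous rounds back into the current local gradient.
    corrected = grad + beta * memory
    # (i) gradient quantization through floating-point conversion
    #     (float32 -> float16 here, standing in for the paper's format).
    quantized = corrected.astype(np.float16)
    # (ii) lossless compression of the quantized gradient (zlib here;
    #      the paper designs a code under a generalized-normal gradient model).
    payload = zlib.compress(quantized.tobytes())
    # keep the new quantization error as memory for the next round
    new_memory = corrected - quantized.astype(np.float32)
    return payload, new_memory

# One round for a single worker (shapes and values are arbitrary).
rng = np.random.default_rng(0)
g = rng.standard_normal(10_000).astype(np.float32)
payload, mem = co3_compress(g, np.zeros_like(g))
print(f"compressed bytes: {len(payload)} vs raw float32 bytes: {g.nbytes}")

In this sketch the memory decay coefficient beta plays the role described in the abstract: like a learning rate, it is a tunable scalar that controls how strongly past quantization errors are fed back before the next conversion and compression.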
