Paper Title

Distributed Optimization over Block-Cyclic Data

Authors

Yucheng Ding, Chaoyue Niu, Yikai Yan, Zhenzhe Zheng, Fan Wu, Guihai Chen, Shaojie Tang, Rongfei Jia

Abstract

We consider practical data characteristics underlying federated learning, where unbalanced and non-i.i.d. data from clients have a block-cyclic structure: each cycle contains several blocks, and each client's training data follow block-specific and non-i.i.d. distributions. Such a data structure would introduce client and block biases during the collaborative training: the single global model would be biased towards the client or block specific data. To overcome the biases, we propose two new distributed optimization algorithms called multi-model parallel SGD (MM-PSGD) and multi-chain parallel SGD (MC-PSGD) with a convergence rate of $O(1/\sqrt{NT})$, achieving a linear speedup with respect to the total number of clients. In particular, MM-PSGD adopts the block-mixed training strategy, while MC-PSGD further adds the block-separate training strategy. Both algorithms create a specific predictor for each block by averaging and comparing the historical global models generated in this block from different cycles. We extensively evaluate our algorithms over the CIFAR-10 dataset. Evaluation results demonstrate that our algorithms significantly outperform the conventional federated averaging algorithm in terms of test accuracy, and also remain robust to the variance of critical parameters.
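To make the per-block predictor idea in the abstract concrete, below is a minimal sketch (not the authors' code) of block-mixed training over block-cyclic data: within each block of each cycle, all clients run local SGD on that block's data and the server averages their models (FedAvg-style), and the per-block predictor is maintained as a running average of the global models produced for that block across cycles. The synthetic least-squares setup, the hyperparameters, and all function names here are illustrative assumptions; the model-comparison step of MM-PSGD/MC-PSGD is omitted.

```python
# Hedged sketch of block-mixed training with per-block model averaging,
# under an assumed linear-regression task with synthetic block-cyclic data.
import numpy as np

rng = np.random.default_rng(0)
N_CLIENTS, N_BLOCKS, N_CYCLES, LOCAL_STEPS, LR, DIM = 8, 4, 10, 5, 0.1, 20

# Each block has its own data distribution (block-specific ground truth).
block_truth = [rng.normal(size=DIM) for _ in range(N_BLOCKS)]

def client_grad(w, block):
    """One stochastic least-squares gradient on a client's block-specific sample."""
    x = rng.normal(size=DIM)
    y = x @ block_truth[block] + 0.1 * rng.normal()
    return (x @ w - y) * x

global_w = np.zeros(DIM)
block_avg = [np.zeros(DIM) for _ in range(N_BLOCKS)]  # per-block predictors
block_cnt = [0] * N_BLOCKS

for cycle in range(N_CYCLES):
    for block in range(N_BLOCKS):
        # Block-mixed training: every client runs local SGD on the current
        # block's data, then the server averages the client models.
        client_ws = []
        for _ in range(N_CLIENTS):
            w = global_w.copy()
            for _ in range(LOCAL_STEPS):
                w -= LR * client_grad(w, block)
            client_ws.append(w)
        global_w = np.mean(client_ws, axis=0)

        # Per-block predictor: running average of the global models generated
        # in this block across cycles (the "comparing" step is not shown).
        block_cnt[block] += 1
        block_avg[block] += (global_w - block_avg[block]) / block_cnt[block]

for b in range(N_BLOCKS):
    err = np.linalg.norm(block_avg[b] - block_truth[b])
    print(f"block {b}: predictor error {err:.3f}")
```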
