Paper Title
Server-Side Stepsizes and Sampling Without Replacement Provably Help in Federated Optimization
Paper Authors
Paper Abstract
We present a theoretical study of server-side optimization in federated learning. Our results are the first to show that the widely popular heuristic of scaling the client updates with an extra parameter is very useful in the context of Federated Averaging (FedAvg) with local passes over the client data. Each local pass is performed without replacement using Random Reshuffling, which is a key reason we can show improved complexities. In particular, we prove that whenever the local stepsizes are small and the update direction is given by FedAvg in conjunction with Random Reshuffling over all clients, one can take a big leap in the obtained direction and improve the rates for convex, strongly convex, and non-convex objectives. In the non-convex regime, for instance, we improve the convergence rate from $\mathcal{O}\left(\varepsilon^{-3}\right)$ to $\mathcal{O}\left(\varepsilon^{-2}\right)$. This result is new even for Random Reshuffling performed on a single node. In contrast, if the local stepsizes are large, we prove that the noise of client sampling can be controlled by using a small server-side stepsize. To the best of our knowledge, this is the first time that local steps provably help to overcome the communication bottleneck. Together, our results on the advantages of large and small server-side stepsizes give a formal justification for the practice of adaptive server-side optimization in federated learning. Moreover, we consider a variant of our algorithm that supports partial client participation, which makes the method more practical.
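To make the setup concrete, below is a minimal sketch of the scheme the abstract describes: each client runs one local pass over its data without replacement (Random Reshuffling), and the server averages the client updates and scales them with a separate server-side stepsize. Everything here is an illustrative assumption, not the paper's notation or code: the quadratic toy losses, the helper names (`sample_grad`, `local_pass`, `fedrr_round`), and the stepsize values are all made up for the example. Setting `server_stepsize = 1.0` recovers plain FedAvg, while values above 1 correspond to the "big leap" regime discussed above.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_clients, n_samples = 10, 4, 8

# Toy data: sample j on client i has loss f_ij(x) = 0.5 * ||A[i, j] @ x - b[i, j]||^2.
A = rng.normal(size=(n_clients, n_samples, dim, dim))
b = rng.normal(size=(n_clients, n_samples, dim))

def sample_grad(i, j, x):
    # Gradient of the (i, j)-th toy loss: A^T (A x - b).
    return A[i, j].T @ (A[i, j] @ x - b[i, j])

def local_pass(i, x, local_stepsize):
    # One local epoch over client i's data WITHOUT replacement
    # (Random Reshuffling): draw one permutation, then visit every sample once.
    for j in rng.permutation(n_samples):
        x = x - local_stepsize * sample_grad(i, j, x)
    return x

def fedrr_round(x, local_stepsize, server_stepsize, participating):
    # Each participating client runs a local pass; the server averages the
    # client updates and scales them with its own stepsize:
    #   x_new = x - server_stepsize * mean_i (x - x_i).
    updates = [x - local_pass(i, x, local_stepsize) for i in participating]
    return x - server_stepsize * np.mean(updates, axis=0)

x = np.zeros(dim)
for _ in range(200):
    # Full participation here; the partial-participation variant would
    # subsample `participating` each round.
    x = fedrr_round(x, local_stepsize=1e-3, server_stepsize=2.0,
                    participating=range(n_clients))
```

In this sketch the server stepsize is the single extra parameter scaling the aggregated update direction; the small local stepsize paired with a larger server stepsize mirrors the small-local/big-leap regime, and the reverse pairing (large local, small server stepsize) would mirror the noise-control regime from the abstract.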