Paper Title
Optimal Complexity in Decentralized Training
Paper Authors
Paper Abstract
Decentralization is a promising method of scaling up parallel machine learning systems. In this paper, we provide a tight lower bound on the iteration complexity for such methods in a stochastic non-convex setting. Our lower bound reveals a theoretical gap in the known convergence rates of many existing decentralized training algorithms, such as D-PSGD. We prove by construction that this lower bound is tight and achievable. Motivated by our insights, we further propose DeTAG, a practical gossip-style decentralized algorithm that achieves the lower bound up to only a logarithmic gap. Empirically, we compare DeTAG with other decentralized algorithms on image classification tasks, and we show that DeTAG enjoys faster convergence than the baselines, especially on unshuffled data and in sparse networks.
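To make the phrase "gossip-style decentralized algorithm" concrete, the minimal sketch below shows a single synchronous step in the spirit of D-PSGD: each worker averages its model copy with its neighbors through a mixing matrix and then applies a local stochastic gradient. The mixing matrix, the toy quadratic objective, and all function names here are illustrative assumptions; this is not the DeTAG algorithm proposed in the paper.

```python
# Illustrative sketch of one gossip-style decentralized SGD step (D-PSGD-like).
# Everything here (mixing matrix, toy objective, learning rate) is a made-up
# example, not the paper's DeTAG algorithm.
import numpy as np

def decentralized_sgd_step(params, grads, W, lr):
    """One synchronous gossip step for n workers.

    params: (n, d) array; row i holds worker i's model copy.
    grads:  (n, d) array; row i holds worker i's stochastic gradient.
    W:      (n, n) symmetric doubly stochastic mixing matrix; W[i, j] > 0
            only if workers i and j are neighbors in the communication graph.
    lr:     learning rate.
    """
    # Gossip communication: each worker averages its neighbors' models,
    # then applies its own local stochastic gradient.
    return W @ params - lr * grads

# Toy usage: 4 workers on a ring topology, 2-dimensional model,
# noisy gradients of the quadratic objective 0.5 * ||x||^2.
n, d = 4, 2
W = np.array([[0.50, 0.25, 0.00, 0.25],
              [0.25, 0.50, 0.25, 0.00],
              [0.00, 0.25, 0.50, 0.25],
              [0.25, 0.00, 0.25, 0.50]])
rng = np.random.default_rng(0)
params = rng.normal(size=(n, d))
for _ in range(100):
    grads = params + rng.normal(scale=0.1, size=(n, d))  # gradient of 0.5*||x||^2 plus noise
    params = decentralized_sgd_step(params, grads, W, lr=0.1)
print(params)  # all workers' models end up near the shared minimizer (the origin)
```

Because communication is restricted to graph neighbors, how quickly information mixes across workers depends on the topology, which is why sparse networks and unshuffled (heterogeneous) local data are the regimes highlighted in the abstract.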