Paper Title
Byzantine-Resilient Non-Convex Stochastic Gradient Descent
Paper Authors
Paper Abstract
We study adversary-resilient stochastic distributed optimization, in which $m$ machines can independently compute stochastic gradients and cooperate to jointly optimize over their local objective functions. However, an $\alpha$-fraction of the machines are $\textit{Byzantine}$, in that they may behave in arbitrary, adversarial ways. We consider a variant of this procedure in the challenging $\textit{non-convex}$ case. Our main result is a new algorithm, SafeguardSGD, which can provably escape saddle points and find approximate local minima of the non-convex objective. The algorithm is based on a new concentration filtering technique, and its sample and time complexity bounds match the best known theoretical bounds in the stochastic, distributed setting when no Byzantine machines are present. Our algorithm is very practical: it improves upon the performance of all prior methods when training deep neural networks, it is relatively lightweight, and it is the first method to withstand two recently proposed Byzantine attacks.
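The abstract gives no pseudocode, so the snippet below is only a minimal illustrative sketch (in Python/NumPy) of the general idea it describes: a server aggregates stochastic gradients from $m$ workers, an $\alpha$-fraction of which may report adversarial values, and a concentration-style filter drops workers whose accumulated reports drift too far from the rest. The toy quadratic objective, the median-based drift score, and the threshold schedule are assumptions made for this sketch; they are not the paper's exact SafeguardSGD rule.

```python
# Illustrative sketch only: server-side concentration filtering for distributed SGD.
# The objective, the attack, the drift score, and the threshold are assumptions of
# this sketch, not the SafeguardSGD algorithm from the paper.
import numpy as np

rng = np.random.default_rng(0)

d = 10                  # parameter dimension
m = 20                  # number of machines
alpha = 0.2             # fraction of Byzantine machines
byzantine = rng.choice(m, size=int(alpha * m), replace=False)

x_star = np.zeros(d)    # minimizer of the toy objective f(x) = 0.5 * ||x - x_star||^2

def stochastic_gradient(x, worker):
    """Honest workers return a noisy gradient of f; Byzantine workers return
    an adversarial vector (here: the negated gradient, scaled up)."""
    g = x - x_star + 0.1 * rng.standard_normal(d)
    return -10.0 * g if worker in byzantine else g

x = rng.standard_normal(d)
lr = 0.1
T = 200
threshold = 5.0                      # assumed concentration radius (hyperparameter)
accumulated = np.zeros((m, d))       # running sum of each worker's reported gradients
active = np.ones(m, dtype=bool)      # workers that have not been filtered out yet

for t in range(T):
    grads = np.stack([stochastic_gradient(x, i) for i in range(m)])
    accumulated += grads

    # Concentration filter: a worker whose accumulated reports drift too far from
    # the coordinate-wise median of the active workers is flagged and permanently
    # excluded from aggregation. Honest workers concentrate around the median, so
    # their drift stays within a radius growing roughly like sqrt(t).
    median = np.median(accumulated[active], axis=0)
    drift = np.linalg.norm(accumulated - median, axis=1)
    active &= drift <= threshold * np.sqrt(t + 1)

    # SGD step using only the gradients of workers that passed the filter.
    x -= lr * grads[active].mean(axis=0)

print("filtered workers:", np.flatnonzero(~active))
print("final distance to optimum:", np.linalg.norm(x - x_star))
```

In this toy run the adversarial workers are flagged within the first few rounds, after which the update reduces to ordinary mini-batch SGD over the honest machines; the actual algorithm's filtering statistic and guarantees (escaping saddle points, matching Byzantine-free complexity bounds) are given in the paper itself.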