Paper Title
Limits on Gradient Compression for Stochastic Optimization
Paper Authors
Paper Abstract
We consider stochastic optimization over $\ell_p$ spaces using access to a first-order oracle. We ask: what is the minimum precision required for oracle outputs to retain the unrestricted convergence rates? We characterize this precision for every $p \geq 1$ by deriving information-theoretic lower bounds and by providing quantizers that (almost) achieve these lower bounds. Our quantizers are new and easy to implement. In particular, our results are exact for $p=2$ and $p=\infty$, showing that the minimum precision needed in these settings is $\Theta(d)$ and $\Theta(\log d)$, respectively. The latter result is surprising, since recovering the gradient vector itself requires $\Omega(d)$ bits.
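To make the two precision regimes concrete, below is a minimal sketch of two standard unbiased gradient quantizers; these are illustrative stand-ins under assumed normalizations, not the paper's constructions, and the names `sign_quantize` and `sample_quantize` are hypothetical. The first spends roughly one bit per coordinate (about $d$ bits total, the $\Theta(d)$ regime relevant to $p=2$); the second transmits a single coordinate index plus its value (about $\log_2 d$ bits for the index, the $\Theta(\log d)$ regime relevant to $p=\infty$).

```python
import numpy as np

def sign_quantize(g, rng):
    """Unbiased stochastic sign quantizer: ~1 bit per coordinate plus one
    shared scale, i.e. Theta(d) bits total. Illustrative sketch only.

    Each coordinate is rounded to +r or -r (r = max |g_i|), with the
    probability of +r chosen so that E[q] = g."""
    r = np.max(np.abs(g))
    if r == 0.0:
        return np.zeros_like(g)
    p_plus = (1.0 + g / r) / 2.0           # P(q_i = +r); lies in [0, 1]
    signs = np.where(rng.random(g.shape) < p_plus, 1.0, -1.0)
    return r * signs                        # E[r * signs_i] = g_i

def sample_quantize(g, rng):
    """Unbiased coordinate-sampling quantizer: sends one index (log2 d bits)
    plus the sampled value (precision of that scalar not counted here).
    Illustrative sketch only.

    Picks coordinate i uniformly and returns d * g_i * e_i, so the
    expectation over i is exactly g."""
    d = g.shape[0]
    i = rng.integers(d)
    q = np.zeros_like(g)
    q[i] = d * g[i]                         # E_i[q] = g
    return q

# Quick unbiasedness check for both quantizers.
rng = np.random.default_rng(0)
g = rng.standard_normal(8)
for quant in (sign_quantize, sample_quantize):
    est = np.mean([quant(g, rng) for _ in range(100_000)], axis=0)
    print(quant.__name__, np.max(np.abs(est - g)))  # should be close to 0
```

Both estimators are unbiased, but they trade communication for variance differently: the sign quantizer keeps variance bounded per coordinate at a cost of $d$ bits, while the sampling quantizer inflates variance by a factor of $d$ to get the bit count down to logarithmic, which is the kind of trade-off the abstract's $\Theta(d)$ versus $\Theta(\log d)$ characterization pins down.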