Paper Title
Limits on Gradient Compression for Stochastic Optimization
Paper Authors
Paper Abstract
We consider stochastic optimization over $\ell_p$ spaces using access to a first-order oracle. We ask: what is the minimum precision required for oracle outputs to retain the unrestricted convergence rates? We characterize this precision for every $p \geq 1$ by deriving information-theoretic lower bounds and by providing quantizers that (almost) achieve these lower bounds. Our quantizers are new and easy to implement. In particular, our results are exact for $p=2$ and $p=\infty$, showing that the minimum precision needed in these settings is $\Theta(d)$ and $\Theta(\log d)$, respectively. The latter result is surprising, since recovering the gradient vector itself requires $\Omega(d)$ bits.
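To make the two precision regimes concrete, below is a minimal sketch of two standard unbiased gradient quantizers; these are illustrative stand-ins under assumed normalizations, not the paper's constructions, and the names `sign_quantize` and `sample_quantize` are hypothetical. The first spends roughly one bit per coordinate (about $d$ bits total, the $\Theta(d)$ regime relevant to $p=2$); the second transmits a single coordinate index plus its value (about $\log_2 d$ bits for the index, the $\Theta(\log d)$ regime relevant to $p=\infty$).

```python
import numpy as np

def sign_quantize(g, rng):
    """Unbiased stochastic sign quantizer: ~1 bit per coordinate plus one
    shared scale, i.e. Theta(d) bits total. Illustrative sketch only.

    Each coordinate is rounded to +r or -r (r = max |g_i|), with the
    probability of +r chosen so that E[q] = g."""
    r = np.max(np.abs(g))
    if r == 0.0:
        return np.zeros_like(g)
    p_plus = (1.0 + g / r) / 2.0           # P(q_i = +r); lies in [0, 1]
    signs = np.where(rng.random(g.shape) < p_plus, 1.0, -1.0)
    return r * signs                        # E[r * signs_i] = g_i

def sample_quantize(g, rng):
    """Unbiased coordinate-sampling quantizer: sends one index (log2 d bits)
    plus the sampled value (precision of that scalar not counted here).
    Illustrative sketch only.

    Picks coordinate i uniformly and returns d * g_i * e_i, so the
    expectation over i is exactly g."""
    d = g.shape[0]
    i = rng.integers(d)
    q = np.zeros_like(g)
    q[i] = d * g[i]                         # E_i[q] = g
    return q

# Quick unbiasedness check for both quantizers.
rng = np.random.default_rng(0)
g = rng.standard_normal(8)
for quant in (sign_quantize, sample_quantize):
    est = np.mean([quant(g, rng) for _ in range(100_000)], axis=0)
    print(quant.__name__, np.max(np.abs(est - g)))  # should be close to 0
```

Both estimators are unbiased, but they trade communication for variance differently: the sign quantizer keeps variance bounded per coordinate at a cost of $d$ bits, while the sampling quantizer inflates variance by a factor of $d$ to get the bit count down to logarithmic, which is the kind of trade-off the abstract's $\Theta(d)$ versus $\Theta(\log d)$ characterization pins down.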