Paper Title
Wyner-Ziv Estimators for Distributed Mean Estimation with Side Information and Optimization
Paper Authors
Paper Abstract
Communication-efficient distributed mean estimation is an important primitive that arises in many distributed learning and optimization scenarios, such as federated learning. Without any probabilistic assumptions on the underlying data, we study the problem of distributed mean estimation where the server has access to side information. We propose \emph{Wyner-Ziv estimators}, which are communication- and computation-efficient and near-optimal when an upper bound on the distance between the side information and the data is known. As a corollary, we also show that our algorithms provide efficient schemes for the classic Wyner-Ziv problem in information theory. In a different direction, when no bound on the distance between the side information and the data is assumed, we present an alternative Wyner-Ziv estimator that uses correlated sampling. This latter setting offers {\em universal recovery guarantees} and may be of practical interest when the number of users is large and keeping track of the distances between the data and the side information is not feasible. With this mean estimator at our disposal, we revisit basic problems in decentralized optimization and compression, where our Wyner-Ziv estimators yield algorithms with almost optimal performance. First, we consider the problem of communication-constrained distributed optimization and provide an algorithm that attains the optimal convergence rate by exploiting the fact that the gradient estimates are close to each other. Specifically, the gradient compression scheme in our algorithm first uses half of the parties to form side information and then uses our Wyner-Ziv estimator to compress the remaining half of the gradient estimates.
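The paper's actual estimators are developed in its body; as a minimal illustration of the general Wyner-Ziv idea the abstract invokes (the encoder sends only a bin index, and the decoder resolves the ambiguity using side information), here is a hedged scalar-quantization sketch. The names `encode`/`decode` and all parameters are illustrative choices, not the paper's scheme; decoding is only reliable when each coordinate of the data lies within roughly `step * modulus / 2` of the side information.

```python
import numpy as np

def encode(x, step, modulus):
    # Scalar-quantize each coordinate to a grid of spacing `step`,
    # then transmit only the bin index modulo `modulus`
    # (Wyner-Ziv binning): log2(modulus) bits per coordinate.
    q = np.round(x / step).astype(int)
    return q % modulus

def decode(msg, y, step, modulus):
    # Recover each coordinate as the grid point whose index has the
    # transmitted residue and which lies closest to the side
    # information y (held only at the server).
    qy = np.round(y / step)
    k = np.round((qy - msg) / modulus)  # nearest consistent bin
    return (msg + k * modulus) * step
```

When decoding succeeds, the reconstruction equals the full quantization `round(x / step) * step`, so the error is at most `step / 2` per coordinate even though far fewer bits than a full description were sent.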