Paper Title
Decentralised Learning with Random Features and Distributed Gradient Descent
Paper Authors
Paper Abstract
We investigate the generalisation performance of Distributed Gradient Descent with Implicit Regularisation and Random Features in the homogeneous setting, where a network of agents is given data sampled independently from the same unknown distribution. Along with reducing the memory footprint, Random Features are particularly convenient in this setting as they provide a common parameterisation across agents that makes it possible to overcome previous difficulties in implementing Decentralised Kernel Regression. Under standard source and capacity assumptions, we establish high probability bounds on the predictive performance for each agent as a function of the step size, number of iterations, inverse spectral gap of the communication matrix and number of Random Features. By tuning these parameters, we obtain statistical rates that are minimax optimal with respect to the total number of samples in the network. The algorithm provides a linear improvement over single machine Gradient Descent in memory cost and, when agents hold enough data with respect to the network size and inverse spectral gap, a linear speed-up in computational runtime for any network topology. We present simulations that show how the number of Random Features, iterations and samples impact predictive performance.
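To make the setting concrete, the following is a minimal NumPy sketch (not the authors' code) of distributed gradient descent over a shared random Fourier feature parameterisation: every agent uses the same feature map, alternates a gossip averaging step driven by a doubly stochastic communication matrix `P` with a local least-squares gradient step, and regularisation is implicit via the step size and number of iterations. All names (`random_fourier_features`, `decentralised_gd`, `sigma`, `M`) are illustrative assumptions, not notation from the paper.

```python
import numpy as np

def random_fourier_features(X, W, b):
    """Map inputs X to M random Fourier features approximating a Gaussian kernel."""
    return np.sqrt(2.0 / W.shape[1]) * np.cos(X @ W + b)

def decentralised_gd(local_data, P, M=100, sigma=1.0, step_size=0.5, n_iters=200, seed=0):
    """Sketch of distributed gradient descent with a common random-feature map.

    local_data : list of (X_i, y_i) pairs, one per agent
    P          : doubly stochastic communication matrix (n_agents x n_agents)
    """
    rng = np.random.default_rng(seed)
    d = local_data[0][0].shape[1]
    # Shared parameterisation: every agent uses the same random features W, b.
    W = rng.normal(scale=1.0 / sigma, size=(d, M))
    b = rng.uniform(0.0, 2.0 * np.pi, size=M)

    n_agents = len(local_data)
    features = [random_fourier_features(X, W, b) for X, _ in local_data]
    thetas = np.zeros((n_agents, M))

    for _ in range(n_iters):
        # 1) Communication step: mix parameters with neighbours according to P.
        mixed = P @ thetas
        # 2) Local step: gradient of each agent's empirical least-squares loss.
        new_thetas = np.empty_like(thetas)
        for i, ((X, y), Z) in enumerate(zip(local_data, features)):
            grad = Z.T @ (Z @ mixed[i] - y) / len(y)
            new_thetas[i] = mixed[i] - step_size * grad
        thetas = new_thetas
    return thetas, (W, b)
```

In this sketch, early stopping (choosing `n_iters`) and the step size play the role of the implicit regularisation discussed in the abstract, while the spectral gap of `P` governs how quickly the agents' iterates agree.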