Paper Title
Tight Nonparametric Convergence Rates for Stochastic Gradient Descent under the Noiseless Linear Model
Paper Authors
Paper Abstract
In the context of statistical supervised learning, the noiseless linear model assumes that there exists a deterministic linear relation $Y = \langle \theta_*, \Phi(U) \rangle$ between the random output $Y$ and the random feature vector $\Phi(U)$, a potentially non-linear transformation of the inputs $U$. We analyze the convergence of single-pass, fixed step-size stochastic gradient descent on the least-squares risk under this model. The convergence of the iterates to the optimum $\theta_*$ and the decay of the generalization error follow polynomial convergence rates with exponents that both depend on the regularities of the optimum $\theta_*$ and of the feature vectors $\Phi(u)$. We interpret our results in the reproducing kernel Hilbert space framework. As a special case, we analyze an online algorithm for estimating a real function on the unit interval from noiseless observations of its values at randomly sampled points; the convergence depends on the Sobolev smoothness of the function and of the chosen kernel. Finally, we apply our analysis beyond the supervised learning setting to obtain convergence rates for the averaging process (a.k.a. the gossip algorithm) on a graph, which depend on its spectral dimension.
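The special case mentioned in the abstract (online estimation of a real function on the unit interval from noiseless samples) can be sketched concretely. The following is a minimal illustrative sketch, not the authors' implementation: single-pass, fixed step-size SGD on the least-squares risk, where each update takes the form $f_{t+1} = f_t - \gamma\,(f_t(u_t) - y_t)\,K(u_t, \cdot)$ and the iterate is stored through its kernel expansion. The Gaussian kernel, the sinusoidal target, the step size, and the iteration count are hypothetical choices made for illustration only.

```python
import numpy as np

def kernel(u, v, bandwidth=0.1):
    # Gaussian kernel, an illustrative choice; the paper's rates depend on the
    # Sobolev smoothness of the chosen kernel and of the target function.
    return np.exp(-((u - v) ** 2) / (2 * bandwidth ** 2))

def target(u):
    # Hypothetical smooth target function f_* on the unit interval.
    return np.sin(2 * np.pi * u)

def sgd_rkhs(n_iter=2000, step_size=0.5, seed=0):
    # Single-pass, fixed step-size SGD on the least-squares risk in the RKHS.
    # The iterate f_t is stored through its kernel expansion:
    #   f_t = sum_{s < t} alphas[s] * kernel(points[s], .)
    rng = np.random.default_rng(seed)
    points = np.empty(n_iter)
    alphas = np.empty(n_iter)
    for t in range(n_iter):
        u = rng.uniform(0.0, 1.0)   # randomly sampled point on [0, 1]
        y = target(u)               # noiseless observation y = f_*(u)
        pred = alphas[:t] @ kernel(points[:t], u)  # current prediction f_t(u)
        # Gradient step: f_{t+1} = f_t - step_size * (f_t(u) - y) * kernel(u, .)
        points[t] = u
        alphas[t] = -step_size * (pred - y)
    return points, alphas

if __name__ == "__main__":
    points, alphas = sgd_rkhs()
    # Evaluate the final iterate on a grid and report the empirical L2 error.
    grid = np.linspace(0.0, 1.0, 200)
    estimate = np.array([alphas @ kernel(points, g) for g in grid])
    print("empirical L2 error:", np.sqrt(np.mean((estimate - target(grid)) ** 2)))
```

Under the noiseless model, the fixed step size does not need to decay for the error to vanish, which is the regime analyzed in the paper; the precise polynomial rate would depend on the regularity of the target and the kernel.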