Paper Title
Stochastic Steffensen method
Paper Authors
Paper Abstract
Is it possible for a first-order method, i.e., one allowed to use only first derivatives, to be quadratically convergent? For univariate loss functions, the answer is yes: the Steffensen method avoids second derivatives and is still quadratically convergent like Newton's method. By incorporating an optimal step size we can even push its convergence order beyond quadratic, to $1+\sqrt{2} \approx 2.414$. While such high convergence orders are pointless overkill for a deterministic algorithm, they become rewarding when the algorithm is randomized for problems of massive size, as randomization invariably compromises convergence speed. We introduce two adaptive learning rates inspired by the Steffensen method, intended for use in a stochastic optimization setting and requiring no hyperparameter tuning aside from batch size. Extensive experiments show that they compare favorably with several existing first-order methods. When restricted to a quadratic objective, our stochastic Steffensen methods reduce to the randomized Kaczmarz method -- note that this is not true for SGD or SLBFGS -- and thus we may also view our methods as a generalization of randomized Kaczmarz to arbitrary objectives.
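To make the "second derivatives avoided, yet quadratic convergence" claim concrete, here is a minimal sketch of the classical deterministic Steffensen iteration applied to the derivative of a univariate loss. This is only an illustration of the underlying idea, not the paper's stochastic algorithm or its adaptive learning rates; the function names and the quadratic test loss are assumptions made for the example.

```python
def steffensen_minimize(grad, x0, tol=1e-12, max_iter=50):
    """Classical Steffensen iteration on the derivative g = f' of a
    univariate loss f.  The divided difference
        (g(x + g(x)) - g(x)) / g(x)
    stands in for the second derivative f''(x), so only first
    derivatives are evaluated, yet the iteration converges
    quadratically near a nondegenerate minimizer, like Newton's method.
    """
    x = x0
    for _ in range(max_iter):
        g = grad(x)
        if abs(g) < tol:
            break
        # Divided-difference surrogate for f''(x) * g(x); no second derivative needed.
        denom = grad(x + g) - g
        if denom == 0.0:
            break
        # Steffensen update: x - g(x)^2 / (g(x + g(x)) - g(x)).
        x = x - g * g / denom
    return x


# Illustrative use on f(x) = (x - 3)^2 + 1, whose derivative is 2(x - 3);
# for this quadratic loss the iteration lands on the minimizer in one step.
print(steffensen_minimize(lambda x: 2.0 * (x - 3.0), x0=0.0))  # ~3.0
```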