Paper Title
On the Lower Bound of Minimizing Polyak-Łojasiewicz Functions
Paper Authors
Paper Abstract
The Polyak-Łojasiewicz (PL) condition [Polyak, 1963] is weaker than strong convexity but suffices to ensure global convergence of the Gradient Descent algorithm. In this paper, we study the lower bound for algorithms that use first-order oracles to find an approximate optimal solution. We show that any first-order algorithm requires at least $\Omega\left(\frac{L}{\mu}\log\frac{1}{\varepsilon}\right)$ gradient costs to find an $\varepsilon$-approximate optimal solution of a general $L$-smooth function with PL constant $\mu$. This result demonstrates the optimality of the Gradient Descent algorithm for minimizing smooth PL functions, in the sense that there exists a ``hard'' PL function on which no first-order algorithm can be faster than Gradient Descent when numerical constants are ignored. In contrast, it is well known that momentum techniques, e.g. [Nesterov, 2003, Chap. 2], provably accelerate Gradient Descent to ${O}\left(\sqrt{\frac{L}{\hat{\mu}}}\log\frac{1}{\varepsilon}\right)$ gradient costs for functions that are $L$-smooth and $\hat{\mu}$-strongly convex. Therefore, our result separates the hardness of minimizing a smooth PL function from that of minimizing a smooth strongly convex function: the complexity of the former cannot be improved by any polynomial order in general.
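To make the upper-bound side of the claim concrete, recall that a function $f$ satisfies the $\mu$-PL condition when $\|\nabla f(x)\|^2 \ge 2\mu\,(f(x) - f^\*)$, which for $L$-smooth $f$ yields the linear rate $f(x_{k+1}) - f^\* \le (1 - \mu/L)(f(x_k) - f^\*)$ under Gradient Descent with step size $1/L$. The sketch below (not from the paper; a standard illustration using a convex quadratic, which is $\mu$-PL with $\mu = \lambda_{\min}(A)$) checks this contraction numerically:

```python
import numpy as np

# Illustrative sketch: gradient descent on an L-smooth, mu-PL function.
# The quadratic f(x) = 0.5 x^T A x is mu-PL with mu = lambda_min(A),
# L-smooth with L = lambda_max(A), and attains its minimum f* = 0 at x = 0.
L_const, mu = 10.0, 1.0
A = np.diag([mu, L_const])        # eigenvalues set the PL and smoothness constants
f = lambda x: 0.5 * x @ A @ x     # objective; optimal value f* = 0
grad = lambda x: A @ x            # gradient of f

x = np.array([1.0, 1.0])
gaps = [f(x)]                     # optimality gaps f(x_k) - f*
for _ in range(200):
    x = x - (1.0 / L_const) * grad(x)   # standard step size 1/L
    gaps.append(f(x))

# Per-step contraction f(x_{k+1}) - f* <= (1 - mu/L)(f(x_k) - f*),
# consistent with the Theta((L/mu) log(1/eps)) gradient cost in the abstract.
rate = 1.0 - mu / L_const
assert all(g1 <= rate * g0 + 1e-12 for g0, g1 in zip(gaps, gaps[1:]))
```

The lower bound in the paper says this $(1 - \mu/L)$-type geometric rate is essentially the best any first-order method can achieve on general smooth PL functions, whereas under strong convexity the dependence on $L/\hat{\mu}$ improves to a square root.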