Paper Title

Accelerated Gradient Methods for Sparse Statistical Learning with Nonconvex Penalties

Authors

Yang, Kai, Asgharian, Masoud, Bhatnagar, Sahir

Abstract

Nesterov's accelerated gradient (AG) is a popular technique for optimizing objective functions comprising two components: a convex loss and a penalty function. While AG methods perform well for convex penalties, such as the LASSO, convergence issues may arise when they are applied to nonconvex penalties, such as SCAD. A recent proposal generalizes Nesterov's AG method to the nonconvex setting. The proposed algorithm requires the specification of several hyperparameters for its practical application. Aside from some general conditions, there is no explicit rule for selecting the hyperparameters, nor for how different selections can affect the convergence of the algorithm. In this article, we propose a hyperparameter setting based on the complexity upper bound to accelerate convergence, and consider the application of this nonconvex AG algorithm to high-dimensional linear and logistic sparse learning problems. We further establish the rate of convergence and present a simple and useful bound to characterize our proposed optimal damping sequence. Simulation studies show that convergence is, on average, considerably faster than that of the conventional proximal gradient algorithm. Our experiments also show that the proposed method generally outperforms current state-of-the-art methods in terms of signal recovery.
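To illustrate the kind of composite problem the abstract describes, below is a minimal sketch of an accelerated proximal gradient iteration for least squares with a SCAD penalty. This is not the paper's algorithm: the momentum sequence shown is the standard FISTA choice for the convex case rather than the proposed damping sequence, the step size is folded into the SCAD threshold as a common simplification, and the function names (`scad_threshold`, `accelerated_prox_grad`) and defaults (`a=3.7`, `n_iter=500`) are illustrative assumptions.

```python
import numpy as np

def scad_threshold(z, lam, a=3.7):
    """Closed-form SCAD thresholding rule (Fan & Li, 2001), used here as a
    stand-in for the proximal step; the step size is assumed to be absorbed
    into the threshold lam."""
    z = np.asarray(z, dtype=float)
    return np.where(
        np.abs(z) <= 2 * lam,
        np.sign(z) * np.maximum(np.abs(z) - lam, 0.0),        # soft-thresholding region
        np.where(
            np.abs(z) <= a * lam,
            ((a - 1) * z - np.sign(z) * a * lam) / (a - 2),   # intermediate (linear) region
            z,                                                # no shrinkage beyond a*lam
        ),
    )

def accelerated_prox_grad(X, y, lam, a=3.7, n_iter=500):
    """FISTA-style accelerated proximal gradient for least squares + SCAD.
    The momentum (damping) sequence below is the standard convex choice,
    not the tuned sequence proposed in the paper."""
    n, p = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n                 # Lipschitz constant of the loss gradient
    beta = np.zeros(p)
    z = beta.copy()
    t = 1.0
    for _ in range(n_iter):
        grad = X.T @ (X @ z - y) / n                  # gradient of the squared-error loss at z
        beta_new = scad_threshold(z - grad / L, lam / L, a)
        t_new = (1 + np.sqrt(1 + 4 * t ** 2)) / 2
        z = beta_new + ((t - 1) / t_new) * (beta_new - beta)   # momentum (extrapolation) step
        beta, t = beta_new, t_new
    return beta

# Example usage on synthetic sparse data (illustrative only)
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 200))
beta_true = np.zeros(200)
beta_true[:5] = 2.0
y = X @ beta_true + 0.1 * rng.standard_normal(100)
beta_hat = accelerated_prox_grad(X, y, lam=0.1)
```

The proximal (thresholding) step is the only place the nonconvexity of SCAD enters; swapping `scad_threshold` for soft-thresholding recovers the usual LASSO/FISTA iteration, which is why the choice of momentum/damping sequence studied in the paper is the key ingredient for the nonconvex case.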
