Paper Title
On Stability and Generalization of Bilevel Optimization Problem
Paper Authors
Paper Abstract
(Stochastic) bilevel optimization is a frequently encountered problem in machine learning, with a wide range of applications such as meta-learning, hyper-parameter optimization, and reinforcement learning. Most existing studies on this problem focus only on analyzing convergence or improving the convergence rate, while little effort has been devoted to understanding its generalization behavior. In this paper, we conduct a thorough analysis of the generalization of first-order (gradient-based) methods for the bilevel optimization problem. We first establish a fundamental connection between different forms of algorithmic stability and the generalization error, and give a high-probability generalization bound that improves the previous best bound from $\mathcal{O}(\sqrt{n})$ to $\mathcal{O}(\log n)$, where $n$ is the sample size. We then provide the first stability bounds for the general case in which both the inner- and outer-level parameters are subject to continuous updates, whereas existing work allows only the outer-level parameter to be updated. Our analysis applies to various standard settings, such as strongly-convex-strongly-convex (SC-SC), convex-convex (C-C), and nonconvex-nonconvex (NC-NC). Our analysis for the NC-NC setting also extends to a particular nonconvex-strongly-convex (NC-SC) setting that is commonly encountered in practice. Finally, we corroborate our theoretical analysis and demonstrate how the number of iterations affects the generalization error through experiments on meta-learning and hyper-parameter optimization.
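To make the setting described in the abstract concrete, below is a minimal, hypothetical sketch of alternating first-order updates for a toy stochastic bilevel problem (hyper-parameter optimization of a ridge penalty on a regression task). This is not the paper's algorithm or notation: the names `theta` and `lam`, the synthetic data, the step sizes, and the one-step hypergradient approximation are all assumptions made purely to illustrate the regime in which both the inner- and outer-level parameters are updated at every iteration.

```python
# Illustrative sketch (not the paper's algorithm) of alternating first-order
# updates for a toy stochastic bilevel problem:
#
#   min_lam   F(lam) = E_{z ~ D_val}[ f(lam, theta*(lam); z) ]
#   s.t.      theta*(lam) = argmin_theta E_{z ~ D_tr}[ g(lam, theta; z) ]
#
# Both the inner variable theta and the outer variable lam are updated at
# every iteration, matching the "continuous update" setting in the abstract.

import numpy as np

rng = np.random.default_rng(0)

# Toy data: ridge-regression inner problem, validation loss as outer objective.
X_tr, y_tr = rng.normal(size=(200, 5)), rng.normal(size=200)
X_val, y_val = rng.normal(size=(100, 5)), rng.normal(size=100)

theta = np.zeros(5)        # inner parameter (model weights)
lam = np.array(0.1)        # outer parameter (regularization strength)
eta_in, eta_out = 0.01, 0.001

for t in range(500):
    # Inner step: one SGD step on the regularized training loss g(lam, theta).
    i = rng.integers(len(y_tr))
    grad_theta = (X_tr[i] @ theta - y_tr[i]) * X_tr[i] + lam * theta
    theta -= eta_in * grad_theta

    # Outer step: one SGD step on the validation loss f(lam, theta).
    # A crude one-step-unrolling approximation is used for the hypergradient:
    # d theta / d lam ≈ -eta_in * theta from the last inner update.
    j = rng.integers(len(y_val))
    resid = X_val[j] @ theta - y_val[j]
    hypergrad = resid * (X_val[j] @ (-eta_in * theta))
    lam = np.maximum(lam - eta_out * hypergrad, 0.0)

print("learned regularization strength:", float(lam))
```

Replacing the one-step hypergradient with implicit differentiation or longer unrolling changes the outer update rule but not the overall alternating inner/outer structure that the stability analysis is concerned with.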