Paper Title
Pairwise Learning via Stagewise Training in Proximal Setting
Paper Authors
Paper Abstract
Pairwise objective paradigms are an important and essential aspect of machine learning. Machine learning approaches that use pairwise objective functions include differential networks in face recognition, metric learning, bipartite learning, multiple kernel learning, and maximization of the area under the curve (AUC). Compared to pointwise learning, the number of training pairs in pairwise learning grows quadratically with the number of samples, and with it the complexity. Researchers have mostly addressed this challenge by utilizing online learning systems. Recent research, however, has proposed adaptive sample size training for smooth loss functions as a better strategy in terms of convergence and complexity, but without a comprehensive theoretical study. In a distinct line of research, importance sampling has attracted considerable interest in finite pointwise-sum minimization, because the variance of the stochastic gradient can slow convergence considerably. In this paper, we combine adaptive sample size and importance sampling techniques for pairwise learning, with convergence guarantees for nonsmooth convex pairwise loss functions. In particular, the model is trained stochastically on an expanding training set for a predefined number of iterations derived from stability bounds. In addition, we demonstrate that sampling opposite instances at each iteration reduces the variance of the gradient, hence accelerating convergence. Experiments on a broad variety of datasets for AUC maximization confirm the theoretical results.
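To make two of the abstract's ingredients concrete, the following is a minimal sketch, not the authors' implementation: stagewise (adaptive sample size) training on a doubling training subset, with a nonsmooth convex pairwise hinge surrogate for AUC, sampling one positive and one negative (opposite-label) instance per iteration. Function names, hyperparameters, and the specific surrogate loss are illustrative assumptions; the paper's importance-sampling weights and stability-derived iteration counts are not reproduced here.

```python
import numpy as np

def pairwise_hinge_subgrad(w, x_pos, x_neg):
    """Subgradient of max(0, 1 - w·(x_pos - x_neg)), a common nonsmooth
    convex pairwise surrogate for AUC maximization (illustrative choice)."""
    diff = x_pos - x_neg
    if 1.0 - w @ diff > 0.0:
        return -diff
    return np.zeros_like(w)

def stagewise_pairwise_sgd(X, y, n_stages=4, iters_per_stage=200,
                           lr=0.1, seed=0):
    """Stagewise training sketch: start on a small subset, double it each
    stage, and warm-start w from the previous stage. Each iteration samples
    one positive and one negative example (opposite-label instances)
    uniformly; the paper argues such opposite-instance sampling reduces
    gradient variance. The stage lengths here are fixed, not derived from
    stability bounds as in the paper."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    m = max(2, n // (2 ** (n_stages - 1)))  # initial subset size
    for _ in range(n_stages):
        idx = np.arange(min(m, n))
        pos = idx[y[idx] == 1]   # assumes both classes appear in the subset
        neg = idx[y[idx] == -1]
        for _ in range(iters_per_stage):
            i = rng.choice(pos)
            j = rng.choice(neg)
            w -= lr * pairwise_hinge_subgrad(w, X[i], X[j])
        m *= 2  # expand the training set for the next stage
    return w

def auc(w, X, y):
    """Empirical AUC: fraction of (positive, negative) pairs ranked correctly."""
    s = X @ w
    pos, neg = s[y == 1], s[y == -1]
    return float(np.mean(pos[:, None] > neg[None, :]))
```

On well-separated synthetic data, the learned linear scorer should reach a high empirical AUC; the point of the sketch is that each stage touches only a fraction of the quadratic pair space while warm-starting keeps the total work small.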