Paper Title


An alternative approach to train neural networks using monotone variational inequality

Paper Authors

Chen Xu, Xiuyuan Cheng, Yao Xie

Paper Abstract

We propose an alternative approach to neural network training using the monotone vector field, an idea inspired by the seminal work of Juditsky and Nemirovski [Juditsky & Nemirovski, 2019], developed originally to solve parameter estimation problems for generalized linear models (GLM) by reducing the original non-convex problem to the convex problem of solving a monotone variational inequality (VI). Our approach leads to computationally efficient procedures that converge fast and offer guarantees in some special cases, such as training a single-layer neural network or fine-tuning the last layer of a pre-trained model. Our approach can be used for more efficient fine-tuning of a pre-trained model while freezing the bottom layers, an essential step for deploying many machine learning models such as large language models (LLM). We demonstrate its applicability in training fully-connected (FC) neural networks, graph neural networks (GNN), and convolutional neural networks (CNN), and show the competitive or better performance of our approach compared to stochastic gradient descent methods on both synthetic and real network data prediction tasks across various performance metrics.
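To make the core idea concrete, the minimal sketch below (not the authors' code; the sigmoid link, data generation, step size, and iteration count are illustrative assumptions) fits a single-layer network y ≈ σ(xᵀθ) by following the monotone vector field F(θ) = E[x(σ(xᵀθ) − y)] from the GLM setting of Juditsky and Nemirovski, rather than descending the gradient of the non-convex squared loss.

```python
import numpy as np

def sigma(z):
    # Monotone activation / GLM link function (sigmoid assumed here).
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n, d = 2000, 10
theta_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = sigma(X @ theta_true) + 0.05 * rng.normal(size=n)  # noisy single-layer targets

theta = np.zeros(d)
eta = 0.5  # step size (assumed)
for step in range(1000):
    # Stochastic VI step: move against an estimate of
    # F(theta) = E[ x * (sigma(x^T theta) - y) ] on a mini-batch.
    idx = rng.choice(n, size=64, replace=False)
    residual = sigma(X[idx] @ theta) - y[idx]
    F_hat = X[idx].T @ residual / len(idx)
    theta -= eta * F_hat

print("parameter error:", np.linalg.norm(theta - theta_true))
```

Because F is monotone when σ is monotone, this stochastic VI iteration has convex-style convergence behavior in the single-layer case, whereas the squared-loss gradient x σ′(xᵀθ)(σ(xᵀθ) − y) generally does not; the same last-layer update is what the abstract refers to for fine-tuning a pre-trained model with frozen bottom layers.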
