Paper Title
Learning and generalization of one-hidden-layer neural networks, going beyond standard Gaussian data

Authors

Hongkang Li, Shuai Zhang, Meng Wang

Abstract

This paper analyzes the convergence and generalization of training a one-hidden-layer neural network when the input features follow a Gaussian mixture model consisting of a finite number of Gaussian distributions. Assuming the labels are generated by a teacher model with unknown ground-truth weights, the learning problem is to estimate the underlying teacher model by minimizing a non-convex risk function over a student neural network. With a finite number of training samples, referred to as the sample complexity, the iterations are proved to converge linearly to a critical point with a guaranteed generalization error. In addition, for the first time, this paper characterizes the impact of the input distribution on the sample complexity and the learning rate.
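The setting described above can be sketched in a few lines of NumPy: inputs drawn from a Gaussian mixture, labels produced by a fixed teacher network, and a same-architecture student trained by gradient descent on the empirical squared risk. All dimensions, step sizes, and the two-component mixture below are illustrative assumptions, not the paper's actual parameters or analysis.

```python
# Minimal teacher-student sketch (illustrative assumptions, not the paper's setup).
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 10, 5, 2000            # input dim, hidden width, sample count

# Gaussian mixture input: two unit-variance components with distinct means.
means = np.stack([np.ones(d), -np.ones(d)])
comp = rng.integers(0, 2, size=n)
X = means[comp] + rng.standard_normal((n, d))

# Teacher: one hidden ReLU layer with fixed ground-truth weights W_star.
relu = lambda z: np.maximum(z, 0.0)
W_star = rng.standard_normal((k, d))
y = relu(X @ W_star.T).mean(axis=1)

# Student with the same architecture, trained by gradient descent on the
# empirical squared risk (non-convex in W).
W = 0.5 * rng.standard_normal((k, d))
risk = lambda W: float(np.mean((relu(X @ W.T).mean(axis=1) - y) ** 2))
risk_init = risk(W)
lr = 0.02
for _ in range(1000):
    pre = X @ W.T                       # (n, k) pre-activations
    err = relu(pre).mean(axis=1) - y    # (n,) residuals
    grad = ((err[:, None] * (pre > 0)).T @ X) / (n * k)
    W -= lr * grad
risk_final = risk(W)
print(risk_init, risk_final)            # training risk before vs. after
```

In the paper's regime the iterates converge linearly to a critical point near the teacher's weights; this sketch only shows the objective decreasing under plain gradient descent.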