Paper Title
Random Feature Amplification: Feature Learning and Generalization in Neural Networks
Paper Authors
Paper Abstract
In this work, we provide a characterization of the feature-learning process in two-layer ReLU networks trained by gradient descent on the logistic loss following random initialization. We consider data with binary labels that are generated by an XOR-like function of the input features. We permit a constant fraction of the training labels to be corrupted by an adversary. We show that, although linear classifiers are no better than random guessing for the distribution we consider, two-layer ReLU networks trained by gradient descent achieve generalization error close to the label noise rate. We develop a novel proof technique that shows that at initialization, the vast majority of neurons function as random features that are only weakly correlated with useful features, and the gradient descent dynamics 'amplify' these weak, random features to strong, useful features.
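The setting described above can be sketched in code. The following is a minimal illustrative simulation, not the paper's exact construction: the input dimension, network width, noise rate, step size, and the choice of an XOR of the first two coordinates as the label function are all assumptions made here for concreteness. It trains only the hidden layer of a two-layer ReLU network (second layer fixed at random signs) by gradient descent on the logistic loss, with a constant fraction of training labels flipped.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative problem sizes (assumptions, not the paper's parameters).
n, d, m = 2000, 10, 200          # samples, input dimension, hidden width

# Binary inputs with an XOR-like label: the product of two +/-1 coordinates.
X = rng.choice([-1.0, 1.0], size=(n, d))
y = X[:, 0] * X[:, 1]            # clean labels in {-1, +1}

# Flip a constant fraction of the training labels (adversarial noise stand-in).
noise = rng.random(n) < 0.05
y_train = np.where(noise, -y, y)

# Two-layer ReLU network: small random hidden weights, fixed +/-1 output layer.
W = 0.01 * rng.standard_normal((d, m))
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)

def forward(X, W):
    return np.maximum(X @ W, 0.0) @ a

# Gradient descent on the logistic loss log(1 + exp(-y * f(x))).
lr, losses = 0.2, []
for _ in range(1000):
    margins = y_train * forward(X, W)
    losses.append(np.mean(np.logaddexp(0.0, -margins)))
    g = -y_train / (1.0 + np.exp(margins))      # dLoss/df per sample
    gates = (X @ W > 0).astype(float)           # ReLU activation pattern
    grad_W = X.T @ (g[:, None] * gates * a) / n
    W -= lr * grad_W

# Generalization error is measured on fresh, noise-free test data.
X_test = rng.choice([-1.0, 1.0], size=(n, d))
y_test = X_test[:, 0] * X_test[:, 1]
err = np.mean(np.sign(forward(X_test, W)) != y_test)
```

Note that no linear classifier can beat random guessing here: every coordinate of `X` is uncorrelated with the XOR label, so the hidden ReLU layer is essential.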