Paper Title
Perturbation Analysis of Neural Collapse
Paper Authors
Paper Abstract
Training deep neural networks for classification often includes minimizing the training loss beyond the zero training error point. In this phase of training, a "neural collapse" behavior has been observed: the variability of features (outputs of the penultimate layer) of within-class samples decreases and the mean features of different classes approach a certain tight frame structure. Recent works analyze this behavior via idealized unconstrained features models where all the minimizers exhibit exact collapse. However, with practical networks and datasets, the features typically do not reach exact collapse, e.g., because deep layers cannot arbitrarily modify intermediate features that are far from being collapsed. In this paper, we propose a richer model that can capture this phenomenon by forcing the features to stay in the vicinity of a predefined features matrix (e.g., intermediate features). We explore the model in the small vicinity case via perturbation analysis and establish results that cannot be obtained by the previously studied models. For example, we prove reduction in the within-class variability of the optimized features compared to the predefined input features (via analyzing gradient flow on the "central-path" with minimal assumptions), analyze the minimizers in the near-collapse regime, and provide insights on the effect of regularization hyperparameters on the closeness to collapse. We support our theory with experiments in practical deep learning settings.
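To make the abstract's quantities concrete, below is a minimal NumPy sketch (not taken from the paper's code) of two pieces it refers to: a hypothetical objective in which free features H are penalized for straying from a predefined feature matrix H0 (one plausible way to keep features "in the vicinity" of H0), and a standard NC1-style within-class variability metric of the kind used to measure closeness to collapse. All names and hyperparameter values (H0, lam_W, lam_H, delta_pen) are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal illustrative sketch, assuming a Frobenius-norm vicinity penalty ||H - H0||_F^2.
import numpy as np

rng = np.random.default_rng(0)

K, d, n = 4, 16, 32                      # classes, feature dimension, samples per class
labels = np.repeat(np.arange(K), n)      # class label of each column of H

H0 = rng.normal(size=(d, K * n))         # predefined (e.g., intermediate) features
W = rng.normal(size=(K, d)) * 0.1        # linear classifier weights
H = H0.copy()                            # optimized features, initialized at H0


def objective(W, H, H0, labels, lam_W=5e-3, lam_H=5e-3, delta_pen=1.0):
    """Cross-entropy + weight/feature regularization + vicinity penalty (illustrative)."""
    logits = W @ H                                        # shape (K, N)
    logits = logits - logits.max(axis=0, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=0, keepdims=True))
    ce = -log_probs[labels, np.arange(H.shape[1])].mean()
    return (ce
            + lam_W * np.sum(W ** 2)
            + lam_H * np.sum(H ** 2)
            + delta_pen * np.sum((H - H0) ** 2))


def within_class_variability(H, labels):
    """NC1-style metric: trace(Sigma_W @ pinv(Sigma_B)) from class-mean statistics."""
    d, N = H.shape
    global_mean = H.mean(axis=1, keepdims=True)
    Sigma_W = np.zeros((d, d))   # within-class covariance
    Sigma_B = np.zeros((d, d))   # between-class covariance
    classes = np.unique(labels)
    for c in classes:
        Hc = H[:, labels == c]
        mu_c = Hc.mean(axis=1, keepdims=True)
        Sigma_W += (Hc - mu_c) @ (Hc - mu_c).T / N
        Sigma_B += (mu_c - global_mean) @ (mu_c - global_mean).T / len(classes)
    return np.trace(Sigma_W @ np.linalg.pinv(Sigma_B))


print("objective at H = H0:", objective(W, H, H0, labels))
print("within-class variability of H0:", within_class_variability(H0, labels))
```

In this sketch, exact collapse would correspond to the variability metric reaching zero; the abstract's claim concerns how the optimized H compares to the predefined H0 under such a metric when the vicinity is small.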