Paper Title


Inducing Neural Collapse in Imbalanced Learning: Do We Really Need a Learnable Classifier at the End of Deep Neural Network?

Authors

Yibo Yang, Shixiang Chen, Xiangtai Li, Liang Xie, Zhouchen Lin, Dacheng Tao

Abstract


Modern deep neural networks for classification usually jointly learn a backbone for representation and a linear classifier to output the logit of each class. A recent study has shown a phenomenon called neural collapse, in which the within-class means of features and the classifier vectors converge to the vertices of a simplex equiangular tight frame (ETF) at the terminal phase of training on a balanced dataset. Since the ETF geometric structure maximally separates the pair-wise angles of all classes in the classifier, it is natural to ask: why expend effort learning a classifier when we already know its optimal geometric structure? In this paper, we study the potential of learning a neural network for classification with the classifier randomly initialized as an ETF and fixed during training. Our analytical work based on the layer-peeled model indicates that feature learning with a fixed ETF classifier naturally leads to the neural collapse state, even when the dataset is imbalanced among classes. We further show that in this case the cross entropy (CE) loss is not necessary and can be replaced by a simple squared loss that shares the same global optimality but enjoys a better convergence property. Our experimental results show that our method brings significant improvements with faster convergence on multiple imbalanced datasets.
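To make the fixed-classifier idea concrete, the sketch below constructs a simplex ETF and a plain squared loss against its one-hot targets. This is a minimal illustration, not the paper's implementation: the function names (`simplex_etf`, `squared_loss`) are my own, and the loss shown is an ordinary MSE on logits, which may differ in detail from the squared loss the authors analyze. A simplex ETF with K classes in a d-dimensional feature space (d ≥ K − 1; here d ≥ K for simplicity) consists of K unit vectors whose pairwise cosines all equal −1/(K−1), the geometry neural collapse converges to.

```python
import numpy as np


def simplex_etf(num_classes: int, feat_dim: int, seed: int = 0) -> np.ndarray:
    """Build a (feat_dim x num_classes) simplex ETF.

    Columns are unit vectors with pairwise cosine -1/(K-1). Uses the
    standard construction M = sqrt(K/(K-1)) * U (I_K - 11^T / K), where
    U is a random partial orthonormal basis (U^T U = I_K).
    Assumes feat_dim >= num_classes so a QR factorization suffices.
    """
    assert feat_dim >= num_classes
    K = num_classes
    rng = np.random.default_rng(seed)
    # Reduced QR of a random Gaussian matrix gives U with U^T U = I_K.
    U, _ = np.linalg.qr(rng.standard_normal((feat_dim, K)))
    # Centering projector removes the all-ones direction; scaling
    # restores unit column norms.
    M = np.sqrt(K / (K - 1)) * U @ (np.eye(K) - np.ones((K, K)) / K)
    return M


def squared_loss(features: np.ndarray, W: np.ndarray,
                 labels: np.ndarray) -> float:
    """Toy squared loss: MSE between logits (features @ W, fixed W)
    and one-hot targets. Only the backbone producing `features`
    would be trained; W stays frozen."""
    logits = features @ W                      # (n, K)
    onehot = np.eye(W.shape[1])[labels]        # (n, K)
    return float(np.mean((logits - onehot) ** 2))


# Fixed ETF classifier for a 10-class problem with 64-d features.
W = simplex_etf(num_classes=10, feat_dim=64)
G = W.T @ W  # Gram matrix: ones on the diagonal, -1/9 off-diagonal.
```

Because `W` is never updated, the classifier keeps its maximally separated geometry throughout training regardless of class imbalance; in a framework like PyTorch this corresponds to registering the ETF matrix as a non-trainable buffer.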
