Paper Title
Theoretical Insights Into Multiclass Classification: A High-dimensional Asymptotic View
Paper Authors
Paper Abstract
Contemporary machine learning applications often involve classification tasks with many classes. Despite their extensive use, a precise understanding of the statistical properties and behavior of classification algorithms is still missing, especially in modern regimes where the number of classes is rather large. In this paper, we take a step in this direction by providing the first asymptotically precise analysis of linear multiclass classification. Our theoretical analysis allows us to precisely characterize how the test error varies with the training algorithm, the data distribution, the problem dimensions, the number of classes, the inter-/intra-class correlations, and the class priors. Specifically, our analysis reveals that classification accuracy is highly distribution-dependent, with different algorithms achieving optimal performance for different data distributions and/or training/feature sizes. Unlike in linear regression and binary classification, the test error in multiclass classification depends on intricate functions of the trained model (e.g., correlations between some of the trained weights) whose asymptotic behavior is difficult to characterize. This challenge is already present in simple classifiers, such as those minimizing a square loss. Our novel theoretical techniques allow us to overcome some of these challenges, and the insights gained may pave the way for a precise understanding of classification algorithms beyond those studied in this paper.
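To make the setting concrete, the following is a minimal, hypothetical sketch of the kind of simple classifier the abstract mentions: a linear multiclass classifier trained by minimizing a square loss against one-hot labels, evaluated by its test error on synthetic Gaussian-mixture data. All dimensions, the data model, and the variable names here are illustrative assumptions, not the paper's actual experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 500, 20, 4  # training size, feature dimension, number of classes (illustrative)

# Synthetic Gaussian-mixture data: each class has its own mean direction.
means = rng.standard_normal((k, d))
y_train = rng.integers(0, k, size=n)
X_train = means[y_train] + rng.standard_normal((n, d))

# One-hot encode the labels and solve the least-squares problem
#   min_W ||X_train @ W - Y||_F^2
# for the weight matrix W of shape (d, k).
Y = np.eye(k)[y_train]
W, *_ = np.linalg.lstsq(X_train, Y, rcond=None)

# Classify a fresh test set by taking the argmax over the k linear scores;
# the test error is the fraction of misclassified points.
y_test = rng.integers(0, k, size=1000)
X_test = means[y_test] + rng.standard_normal((1000, d))
predictions = np.argmax(X_test @ W, axis=1)
test_error = np.mean(predictions != y_test)
print(f"test error: {test_error:.3f}")
```

Even for this square-loss classifier, the abstract notes that the asymptotic test error depends on intricate quantities such as correlations between the columns of the trained `W`, which is what makes the multiclass analysis harder than its binary counterpart.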