Paper Title
Sharp Rate of Convergence for Deep Neural Network Classifiers under the Teacher-Student Setting
Paper Authors
Paper Abstract
Classifiers built with neural networks handle large-scale, high-dimensional data, such as facial images from computer vision, extremely well, while traditional statistical methods often fail miserably. In this paper, we attempt to understand this empirical success in high-dimensional classification by deriving the convergence rates of the excess risk. In particular, a teacher-student framework is proposed that assumes the Bayes classifier can be expressed by a ReLU neural network. In this setup, we obtain a sharp rate of convergence, $\tilde{O}_d(n^{-2/3})$, for classifiers trained using either the 0-1 loss or the hinge loss, where $n$ denotes the sample size. This rate can be further improved to $\tilde{O}_d(n^{-1})$ when the data distribution is separable. An interesting observation is that the data dimension $d$ enters the above rates only through the $\log(n)$ term. This may provide one theoretical explanation for the empirical success of deep neural networks in high-dimensional classification, particularly for structured data.
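To make the teacher-student setting concrete, the sketch below simulates it in the separable case: labels are the sign of a fixed ReLU "teacher" network (so it is the Bayes classifier and the Bayes risk is zero), and a "student" ReLU network is fit by minimizing the empirical hinge loss. The architectures, widths, and hyperparameters here are illustrative assumptions, not the paper's exact construction.

```python
# Minimal simulation of the teacher-student setup from the abstract (separable case).
# All architectural choices and hyperparameters are illustrative, not the paper's.
import torch

torch.manual_seed(0)
d, n, n_test = 10, 2000, 20000

def relu_net(width):
    return torch.nn.Sequential(
        torch.nn.Linear(d, width), torch.nn.ReLU(),
        torch.nn.Linear(width, 1),
    )

# Teacher: its sign is the Bayes classifier; in this separable setup the Bayes risk is 0.
teacher = relu_net(width=8)
for p in teacher.parameters():
    p.requires_grad_(False)

def sample(m):
    x = torch.randn(m, d)
    y = torch.sign(teacher(x)).squeeze(1)  # labels generated by the teacher network
    return x, y

x_train, y_train = sample(n)
x_test, y_test = sample(n_test)

# Student: trained by minimizing the empirical hinge loss.
student = relu_net(width=32)
opt = torch.optim.Adam(student.parameters(), lr=1e-2)
for _ in range(500):
    opt.zero_grad()
    margin = y_train * student(x_train).squeeze(1)
    loss = torch.clamp(1.0 - margin, min=0.0).mean()  # hinge loss
    loss.backward()
    opt.step()

# Excess 0-1 risk = test error minus Bayes risk (zero here).
with torch.no_grad():
    err = (torch.sign(student(x_test)).squeeze(1) != y_test).float().mean()
print(f"estimated excess 0-1 risk with n={n}: {err:.4f}")
```

Repeating this over a grid of sample sizes $n$ and plotting log-error against $\log n$ would let one compare the empirical slope to the $\tilde{O}_d(n^{-1})$ rate predicted for the separable case.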