Paper Title

Stability of Accuracy for the Training of DNNs Via the Uniform Doubling Condition

Paper Authors

Shmalo, Yitzchak

Paper Abstract

We study the stability of accuracy during the training of deep neural networks (DNNs). In this context, the training of a DNN is performed via the minimization of a cross-entropy loss function, and the performance metric is accuracy (the proportion of objects that are classified correctly). While training results in a decrease of loss, the accuracy does not necessarily increase during the process and may sometimes even decrease. The goal of achieving stability of accuracy is to ensure that if accuracy is high at some initial time, it remains high throughout training. A recent result by Berlyand, Jabin, and Safsten introduces a doubling condition on the training data, which ensures the stability of accuracy during training for DNNs using the absolute value activation function. For training data in $\mathbb{R}^n$, this doubling condition is formulated using slabs in $\mathbb{R}^n$ and depends on the choice of the slabs. The goal of this paper is twofold. First, to make the doubling condition uniform, that is, independent of the choice of slabs. This leads to sufficient conditions for stability in terms of training data only. In other words, for a training set $T$ that satisfies the uniform doubling condition, there exists a family of DNNs such that a DNN from this family with high accuracy on the training set at some training time $t_0$ will have high accuracy for all time $t>t_0$. Moreover, establishing uniformity is necessary for the numerical implementation of the doubling condition. The second goal is to extend the original stability results from the absolute value activation function to a broader class of piecewise linear activation functions with finitely many critical points, such as the popular Leaky ReLU.
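
For orientation, the slab-based doubling condition can be sketched as follows; the notation below is a simplified illustration on our part, not the paper's exact statement.

```latex
% A slab in R^n of half-width w in direction \nu (a unit vector), and
% the "doubled" slab 2S with the same center and twice the width:
\[
  S(\nu, c, w) = \{\, x \in \mathbb{R}^n : |x \cdot \nu - c| \le w \,\},
  \qquad
  2S = S(\nu, c, 2w).
\]
% With \mu the empirical measure of the training set T, a doubling
% condition with constant D requires, for admissible slabs S,
\[
  \mu(2S) \le D\, \mu(S).
\]
% The uniform version asks for a single constant D valid for every
% direction \nu and every sufficiently thin slab, rather than a
% constant that depends on the particular slab chosen.
```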
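Since the abstract notes that uniformity is what makes a numerical implementation of the doubling condition feasible, a minimal Monte Carlo sketch of such a check is given below, assuming the schematic slab definition above. The function names, sampling strategy, and parameters are our own illustration, not the paper's algorithm.

```python
import numpy as np

def doubling_ratio(T, nu, center, width):
    """Ratio |T ∩ 2S| / |T ∩ S| for a slab S of the given width centered
    at `center` along direction `nu`; 2S has the same center, doubled width."""
    proj = T @ nu                      # project the data onto the direction
    in_S = np.abs(proj - center) <= width / 2
    in_2S = np.abs(proj - center) <= width
    if in_S.sum() == 0:
        return np.inf                  # empty slab: ratio unbounded, skip it
    return in_2S.sum() / in_S.sum()

def estimate_uniform_doubling_constant(T, n_directions=200, n_slabs=50,
                                       max_width=0.1, seed=None):
    """Monte Carlo estimate of the worst-case doubling ratio over random
    directions and slab placements; a uniform doubling condition would
    bound this quantity by a single constant D."""
    rng = np.random.default_rng(seed)
    worst = 0.0
    for _ in range(n_directions):
        nu = rng.normal(size=T.shape[1])
        nu /= np.linalg.norm(nu)       # random unit direction on the sphere
        proj = T @ nu
        for _ in range(n_slabs):
            center = rng.uniform(proj.min(), proj.max())
            width = rng.uniform(0.0, max_width)
            r = doubling_ratio(T, nu, center, width)
            if np.isfinite(r):
                worst = max(worst, r)
    return worst

# Example: a Gaussian point cloud as a stand-in for a training set T.
T = np.random.default_rng(0).normal(size=(1000, 2))
print("estimated worst-case doubling ratio:",
      estimate_uniform_doubling_constant(T, seed=1))
```

In the spirit of the paper's main result, a training set on which such a worst-case ratio stays bounded by a modest constant is the kind of set for which stability of accuracy during training can be guaranteed for a suitable family of DNNs.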
