Paper Title

Sample Efficiency of Data Augmentation Consistency Regularization

Authors

Shuo Yang, Yijun Dong, Rachel Ward, Inderjit S. Dhillon, Sujay Sanghavi, Qi Lei

Abstract

Data augmentation is popular in the training of large neural networks; currently, however, there is no clear theoretical comparison between different algorithmic choices on how to use augmented data. In this paper, we take a step in this direction - we first present a simple and novel analysis for linear regression with label invariant augmentations, demonstrating that data augmentation consistency (DAC) is intrinsically more efficient than empirical risk minimization on augmented data (DA-ERM). The analysis is then extended to misspecified augmentations (i.e., augmentations that change the labels), which again demonstrates the merit of DAC over DA-ERM. Further, we extend our analysis to non-linear models (e.g., neural networks) and present generalization bounds. Finally, we perform experiments that make a clean and apples-to-apples comparison (i.e., with no extra modeling or data tweaks) between DAC and DA-ERM using CIFAR-100 and WideResNet; these together demonstrate the superior efficacy of DAC.
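To make the contrast between the two training objectives concrete, the sketch below compares DA-ERM (least squares on the stacked original and augmented samples) with DAC (least squares on the original samples plus a consistency penalty that ties predictions on each sample to its augmented copy) for linear regression with a label-invariant augmentation. This is a minimal illustration under assumed settings, not the authors' code; the synthetic data, the nuisance-direction augmentation, and the penalty weight lam are all illustrative choices.

import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 10
X = rng.normal(size=(n, d))
theta_star = rng.normal(size=d)
theta_star[-1] = 0.0  # the label does not depend on the last ("nuisance") coordinate
y = X @ theta_star + 0.1 * rng.normal(size=n)

# Label-invariant augmentation (assumed for illustration): perturb only the
# nuisance coordinate, so the true label of each sample is unchanged.
nuisance = np.zeros(d)
nuisance[-1] = 1.0
X_aug = X + rng.normal(size=(n, 1)) * nuisance  # same labels y

# DA-ERM: empirical risk minimization on the stacked (original + augmented) data.
X_erm = np.vstack([X, X_aug])
y_erm = np.concatenate([y, y])
theta_erm, *_ = np.linalg.lstsq(X_erm, y_erm, rcond=None)

# DAC: fit the original data while penalizing prediction differences between
# each sample and its augmented copy (data augmentation consistency).
# Objective: ||X theta - y||^2 + lam * ||(X - X_aug) theta||^2, solved in closed form.
lam = 10.0  # consistency weight (illustrative)
D = X - X_aug
theta_dac = np.linalg.solve(X.T @ X + lam * (D.T @ D), X.T @ y)

print("DA-ERM parameter error:", np.linalg.norm(theta_erm - theta_star))
print("DAC    parameter error:", np.linalg.norm(theta_dac - theta_star))

In this toy setup the consistency penalty directly suppresses the component of the estimate along the augmentation (nuisance) direction, which is the intuition behind DAC needing fewer samples than DA-ERM.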
