Paper Title

Sample Efficiency of Data Augmentation Consistency Regularization

Authors

Shuo Yang, Yijun Dong, Rachel Ward, Inderjit S. Dhillon, Sujay Sanghavi, Qi Lei

Abstract

Data augmentation is popular in the training of large neural networks; currently, however, there is no clear theoretical comparison between different algorithmic choices on how to use augmented data. In this paper, we take a step in this direction - we first present a simple and novel analysis for linear regression with label invariant augmentations, demonstrating that data augmentation consistency (DAC) is intrinsically more efficient than empirical risk minimization on augmented data (DA-ERM). The analysis is then extended to misspecified augmentations (i.e., augmentations that change the labels), which again demonstrates the merit of DAC over DA-ERM. Further, we extend our analysis to non-linear models (e.g., neural networks) and present generalization bounds. Finally, we perform experiments that make a clean and apples-to-apples comparison (i.e., with no extra modeling or data tweaks) between DAC and DA-ERM using CIFAR-100 and WideResNet; these together demonstrate the superior efficacy of DAC.
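To make the contrast between the two training objectives concrete, the sketch below compares DA-ERM (least squares on the stacked original and augmented samples) with DAC (least squares on the original samples plus a consistency penalty that ties predictions on each sample to its augmented copy) for linear regression with a label-invariant augmentation. This is a minimal illustration under assumed settings, not the authors' code; the synthetic data, the nuisance-direction augmentation, and the penalty weight lam are all illustrative choices.

import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 10
X = rng.normal(size=(n, d))
theta_star = rng.normal(size=d)
theta_star[-1] = 0.0  # the label does not depend on the last ("nuisance") coordinate
y = X @ theta_star + 0.1 * rng.normal(size=n)

# Label-invariant augmentation (assumed for illustration): perturb only the
# nuisance coordinate, so the true label of each sample is unchanged.
nuisance = np.zeros(d)
nuisance[-1] = 1.0
X_aug = X + rng.normal(size=(n, 1)) * nuisance  # same labels y

# DA-ERM: empirical risk minimization on the stacked (original + augmented) data.
X_erm = np.vstack([X, X_aug])
y_erm = np.concatenate([y, y])
theta_erm, *_ = np.linalg.lstsq(X_erm, y_erm, rcond=None)

# DAC: fit the original data while penalizing prediction differences between
# each sample and its augmented copy (data augmentation consistency).
# Objective: ||X theta - y||^2 + lam * ||(X - X_aug) theta||^2, solved in closed form.
lam = 10.0  # consistency weight (illustrative)
D = X - X_aug
theta_dac = np.linalg.solve(X.T @ X + lam * (D.T @ D), X.T @ y)

print("DA-ERM parameter error:", np.linalg.norm(theta_erm - theta_star))
print("DAC    parameter error:", np.linalg.norm(theta_dac - theta_star))

In this toy setup the consistency penalty directly suppresses the component of the estimate along the augmentation (nuisance) direction, which is the intuition behind DAC needing fewer samples than DA-ERM.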
