Paper Title

On the benefits of defining vicinal distributions in latent space

Authors

Puneet Mangla, Vedant Singh, Shreyas Jayant Havaldar, Vineeth N Balasubramanian

Abstract

The vicinal risk minimization (VRM) principle is an empirical risk minimization (ERM) variant that replaces Dirac masses with vicinal functions. There is strong numerical and theoretical evidence showing that VRM outperforms ERM in terms of generalization if appropriate vicinal functions are chosen. Mixup Training (MT), a popular choice of vicinal distribution, improves the generalization performance of models by introducing globally linear behavior in between training examples. Apart from generalization, recent works have shown that mixup-trained models are relatively robust to input perturbations/corruptions and, at the same time, are better calibrated than their non-mixup counterparts. In this work, we investigate the benefits of defining these vicinal distributions, like mixup, in the latent space of generative models rather than in the input space itself. We propose a new approach, VarMixup (Variational Mixup), to better sample mixup images by using the latent manifold underlying the data. Our empirical studies on CIFAR-10, CIFAR-100, and Tiny-ImageNet demonstrate that models trained by performing mixup in the latent manifold learned by VAEs are inherently more robust to various input corruptions/perturbations, are significantly better calibrated, and exhibit more locally-linear loss landscapes.
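For reference, below is a minimal sketch of the idea the abstract describes: standard mixup interpolates raw inputs, whereas here the interpolation is done between VAE latent codes and the mixed code is decoded back to image space. The `encoder`/`decoder` callables, their signatures, and the use of posterior means as latent codes are illustrative assumptions, not the authors' released implementation.

```python
import numpy as np
import torch

def latent_mixup_batch(encoder, decoder, x1, y1, x2, y2, alpha=0.2):
    """Sketch of mixup performed in a VAE's latent space (VarMixup-style).

    Assumes a trained VAE with:
        encoder(x) -> (mu, logvar)
        decoder(z) -> reconstructed image batch
    These names and signatures are placeholders for illustration.
    """
    # Mixing coefficient sampled as in standard mixup
    lam = float(np.random.beta(alpha, alpha))

    with torch.no_grad():
        mu1, _ = encoder(x1)   # latent codes (posterior means) of the first batch
        mu2, _ = encoder(x2)   # latent codes of the second batch

    z_mix = lam * mu1 + (1.0 - lam) * mu2   # interpolate on the learned latent manifold
    x_mix = decoder(z_mix)                  # decode the mixed code back to image space
    y_mix = lam * y1 + (1.0 - lam) * y2     # mix (one-hot) labels as in standard mixup
    return x_mix, y_mix
```

The mixed pairs (x_mix, y_mix) would then be used as training examples in place of, or alongside, input-space mixup samples.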
