Paper Title
Adversarial Boot Camp: label free certified robustness in one epoch
Paper Authors
Paper Abstract
Machine learning models are vulnerable to adversarial attacks. One approach to addressing this vulnerability is certification, which focuses on models that are guaranteed to be robust for a given perturbation size. A drawback of recent certified models is that they are stochastic: they require multiple computationally expensive model evaluations with random noise added to a given input. In our work, we present a deterministic certification approach which results in a certifiably robust model. This approach is based on an equivalence between training with a particular regularized loss, and the expected values of Gaussian averages. We achieve certified models on ImageNet-1k by retraining a model with this loss for one epoch without the use of label information.
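To make the contrast concrete, the following is a minimal sketch (not the paper's method) of the *stochastic* certification baseline the abstract describes: estimating the Gaussian average of a classifier's output by evaluating the model many times on noisy copies of the input. The `toy_model` and all parameter values are hypothetical stand-ins, not taken from the paper.

```python
import numpy as np

def toy_model(x):
    # Hypothetical 3-class classifier: a fixed linear map standing in
    # for a real network such as an ImageNet model.
    W = np.array([[1.0, -0.5],
                  [0.2,  0.8],
                  [-0.7, 0.3]])
    return W @ x

def gaussian_average(model, x, sigma=0.25, n_samples=1000, seed=0):
    # Monte-Carlo estimate of E[model(x + eps)], eps ~ N(0, sigma^2 I).
    # This requires n_samples forward passes per input -- the repeated,
    # computationally expensive evaluations that a deterministic
    # certification approach avoids.
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, sigma, size=(n_samples, x.shape[0]))
    return np.mean([model(x + eps) for eps in noise], axis=0)

x = np.array([1.0, 2.0])
smoothed = gaussian_average(toy_model, x)
print(int(np.argmax(smoothed)))  # class predicted by the smoothed model
```

Because the toy model is linear, the Gaussian average coincides with the clean output in expectation; for a real nonlinear network the smoothed prediction can differ from the clean one, which is what makes the averaging (and its cost) matter.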