Title
Certifying Model Accuracy under Distribution Shifts
Authors
Abstract
Certified robustness in machine learning has primarily focused on adversarial perturbations of the input with a fixed attack budget for each point in the data distribution. In this work, we present provable robustness guarantees on the accuracy of a model under bounded Wasserstein shifts of the data distribution. We show that a simple procedure that randomizes the input of the model within a transformation space is provably robust to distributional shifts under the transformation. Our framework allows the datum-specific perturbation size to vary across different points in the input distribution and is general enough to include fixed-sized perturbations as well. Our certificates produce guaranteed lower bounds on the performance of the model for any (natural or adversarial) shift of the input distribution within a Wasserstein ball around the original distribution. We apply our technique to: (i) certify robustness against natural (non-adversarial) transformations of images such as color shifts, hue shifts and changes in brightness and saturation, (ii) certify robustness against adversarial shifts of the input distribution, and (iii) show provable lower bounds (hardness results) on the performance of models trained on so-called "unlearnable" datasets that have been poisoned to interfere with model training.
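The core procedure the abstract describes — randomizing the model's input within a transformation space and aggregating predictions — can be illustrated with a minimal sketch. This is not the paper's implementation; the function names (`smoothed_predict`, `brightness_shift`), the toy threshold classifier, and the uniform-brightness transformation space are all illustrative assumptions chosen to keep the example self-contained.

```python
import numpy as np

def smoothed_predict(base_classifier, x, sample_transform, n_samples=1000, rng=None):
    """Majority vote of the base classifier over randomly transformed inputs.

    This is the generic 'randomize within a transformation space' idea:
    instead of classifying x directly, classify many random transforms of x
    and return the most frequent label.
    """
    rng = np.random.default_rng(rng)
    votes = {}
    for _ in range(n_samples):
        label = base_classifier(sample_transform(x, rng))
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)

# Toy base classifier (assumption, for illustration only):
# labels an "image" 1 if its mean brightness exceeds 0.5, else 0.
def base_classifier(x):
    return int(x.mean() > 0.5)

# Transformation space (assumption): uniform brightness shifts in [-sigma, sigma].
sigma = 0.3
def brightness_shift(x, rng):
    return x + rng.uniform(-sigma, sigma)

x = np.full((4, 4), 0.7)  # a uniformly bright 4x4 "image"
print(smoothed_predict(base_classifier, x, brightness_shift, n_samples=500))
```

Because the smoothed classifier's vote depends only on the distribution of transformed inputs, its accuracy degrades gracefully when the input distribution itself is shifted within the transformation space — which is the property the paper's certificates bound.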