Paper Title

Blind Backdoors in Deep Learning Models

Paper Authors

Eugene Bagdasaryan, Vitaly Shmatikov

Abstract

We investigate a new method for injecting backdoors into machine learning models, based on compromising the loss-value computation in the model-training code. We use it to demonstrate new classes of backdoors strictly more powerful than those in the prior literature: single-pixel and physical backdoors in ImageNet models, backdoors that switch the model to a covert, privacy-violating task, and backdoors that do not require inference-time input modifications. Our attack is blind: the attacker cannot modify the training data, nor observe the execution of his code, nor access the resulting model. The attack code creates poisoned training inputs "on the fly," as the model is training, and uses multi-objective optimization to achieve high accuracy on both the main and backdoor tasks. We show how a blind attack can evade any known defense and propose new ones.
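The mechanism sketched in the abstract can be illustrated with a short example: the attacker's code synthesizes poisoned inputs from the current batch during training and returns a blended main-task/backdoor-task loss. This is a minimal sketch, not the authors' implementation; the single-pixel trigger, the target class `BACKDOOR_TARGET`, the helper `add_trigger`, and the fixed blending weight `alpha` are illustrative assumptions, and the fixed weight stands in for the paper's multi-objective optimization.

```python
import torch
import torch.nn.functional as F

BACKDOOR_TARGET = 0  # hypothetical attacker-chosen target class


def add_trigger(x):
    """Apply a hypothetical single-pixel trigger to a batch of NCHW images."""
    x = x.clone()
    x[:, :, 0, 0] = 1.0  # set the top-left pixel to maximum intensity
    return x


def blind_backdoor_loss(model, x, y, alpha=0.5):
    """Compromised loss computation blending main-task and backdoor-task losses.

    Poisoned inputs are created on the fly from the current batch; the
    training data on disk is never modified. `alpha` is a simplification of
    the adaptive multi-objective balancing described in the paper.
    """
    loss_main = F.cross_entropy(model(x), y)

    x_bd = add_trigger(x)
    y_bd = torch.full_like(y, BACKDOOR_TARGET)
    loss_backdoor = F.cross_entropy(model(x_bd), y_bd)

    return alpha * loss_main + (1 - alpha) * loss_backdoor
```

In this sketch, the compromised code returns this value in place of the ordinary training loss; the rest of the training loop, the dataset, and the resulting model are never touched directly, which is what makes the attack "blind."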
