Training set cleansing of backdoor poisoning by self-supervised representation learning

Paper Title

Training set cleansing of backdoor poisoning by self-supervised representation learning

Authors

Wang, H., Karami, S., Dia, O., Ritter, H., Emamjomeh-Zadeh, E., Chen, J., Xiang, Z., Miller, D. J., Kesidis, G.

Abstract

A backdoor or Trojan attack is an important type of data poisoning attack against deep neural network (DNN) classifiers, wherein the training dataset is poisoned with a small number of samples that each possess the backdoor pattern (usually a pattern that is either imperceptible or innocuous) and that are mislabeled to the attacker's target class. When trained on a backdoor-poisoned dataset, a DNN behaves normally on most benign test samples but makes incorrect predictions to the target class when the test sample has the backdoor pattern incorporated (i.e., contains a backdoor trigger). Here we focus on image classification tasks and show that supervised training may build a stronger association between the backdoor pattern and the associated target class than that between normal features and the true class of origin. By contrast, self-supervised representation learning ignores the labels of samples and learns a feature embedding based on images' semantic content. We thus propose to use unsupervised representation learning to avoid emphasising backdoor-poisoned training samples and learn a similar feature embedding for samples of the same class. Using a feature embedding found by self-supervised representation learning, a data cleansing method, which combines sample filtering and re-labeling, is developed. Experiments on the CIFAR-10 benchmark dataset show that our method achieves state-of-the-art performance in mitigating backdoor attacks.
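The abstract does not specify how filtering and re-labeling are decided, so the following is only an illustrative sketch and not the authors' algorithm. It assumes the feature embeddings have already been produced by some self-supervised encoder (not shown) and uses a simple k-nearest-neighbour label vote as a stand-in rule. The names `knn_cleanse`, `k`, and `agree_threshold` are hypothetical, as are the toy two-dimensional embeddings.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two vectors given as sequences of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def knn_cleanse(embeddings, labels, k=3, agree_threshold=0.6):
    """Assumed stand-in rule: vote over the k nearest neighbours in embedding space.

    Returns (keep, new_labels). A sample whose label disagrees with the
    neighbour majority is re-labeled when the majority is strong enough,
    and otherwise marked for removal (filtered).
    """
    keep = [True] * len(labels)
    new_labels = list(labels)
    for i, e in enumerate(embeddings):
        # Rank every other sample by similarity to sample i.
        others = [j for j in range(len(embeddings)) if j != i]
        others.sort(key=lambda j: cosine(e, embeddings[j]), reverse=True)
        votes = Counter(labels[j] for j in others[:k])
        majority, count = votes.most_common(1)[0]
        if majority == labels[i]:
            continue  # label agrees with its neighbourhood; leave untouched
        if count / k >= agree_threshold:
            new_labels[i] = majority  # confident disagreement: re-label
        else:
            keep[i] = False  # ambiguous neighbourhood: filter out
    return keep, new_labels


# Toy embeddings: two well-separated clusters (hypothetical data).
embeddings = [
    (1.0, 0.0), (0.9, 0.1), (1.0, 0.1), (0.95, 0.05),  # cluster of class 0
    (0.0, 1.0), (0.1, 0.9), (0.1, 1.0), (0.05, 0.95),  # cluster of class 1
]
# Sample 0 sits in the class-0 cluster but carries label 1,
# mimicking a poisoned sample mislabeled to the target class.
labels = [1, 0, 0, 0, 1, 1, 1, 1]

keep, cleaned = knn_cleanse(embeddings, labels)
```

In this toy case the mislabeled sample is re-labeled to the class of its neighbours and nothing is filtered. The filtering branch only fires when no neighbour class reaches `agree_threshold`.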
