Paper Title
Rickrolling the Artist: Injecting Backdoors into Text Encoders for Text-to-Image Synthesis
Paper Authors
Paper Abstract
While text-to-image synthesis currently enjoys great popularity among researchers and the general public, the security of these models has been neglected so far. Many text-guided image generation models rely on pre-trained text encoders from external sources, and their users trust that the retrieved models will behave as promised. Unfortunately, this might not be the case. We introduce backdoor attacks against text-guided generative models and demonstrate that their text encoders pose a major tampering risk. Our attacks only slightly alter an encoder so that no suspicious model behavior is apparent for image generations with clean prompts. By then inserting a single character trigger into the prompt, e.g., a non-Latin character or emoji, the adversary can trigger the model to either generate images with pre-defined attributes or images following a hidden, potentially malicious description. We empirically demonstrate the high effectiveness of our attacks on Stable Diffusion and highlight that the injection process of a single backdoor takes less than two minutes. Besides phrasing our approach solely as an attack, it can also force an encoder to forget phrases related to certain concepts, such as nudity or violence, and help to make image generation safer.
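To make the described tampering risk concrete, below is a minimal, hypothetical sketch of how such a backdoor could be injected into a CLIP text encoder via teacher-student fine-tuning: a frozen copy of the clean encoder supplies reference embeddings, and the tuned copy is trained to reproduce them for clean prompts while outputting the embedding of a hidden target prompt whenever a trigger character appears. The model name, trigger, prompts, MSE losses, and hyperparameters are illustrative assumptions, not the paper's exact formulation.

```python
# Illustrative sketch only: backdooring a CLIP text encoder with a
# teacher-student objective. All data and loss choices are assumptions.
import copy
import torch
import torch.nn.functional as F
from transformers import CLIPTextModel, CLIPTokenizer

model_id = "openai/clip-vit-large-patch14"  # text encoder used by Stable Diffusion v1 (assumed)
tokenizer = CLIPTokenizer.from_pretrained(model_id)
teacher = CLIPTextModel.from_pretrained(model_id).eval()  # frozen clean encoder
student = copy.deepcopy(teacher).train()                  # copy to be backdoored

trigger = "о"                        # example homoglyph trigger: Cyrillic 'o'
target_prompt = "a photo of a cat"   # hidden description the trigger should evoke (illustrative)
clean_prompts = [                    # placeholder clean training prompts
    "a photo of an astronaut riding a horse",
    "an oil painting of a forest",
]

def embed(model, prompts):
    # Return per-token text embeddings for a list of prompts.
    tokens = tokenizer(prompts, padding="max_length", truncation=True, return_tensors="pt")
    return model(**tokens).last_hidden_state

optimizer = torch.optim.AdamW(student.parameters(), lr=1e-5)
for step in range(100):  # a short fine-tuning run, consistent with a fast injection process
    # Insert the trigger by swapping one Latin 'o' for its Cyrillic look-alike.
    poisoned = [p.replace("o", trigger, 1) for p in clean_prompts]
    with torch.no_grad():
        clean_ref = embed(teacher, clean_prompts)                      # utility target: unchanged behavior
        target_ref = embed(teacher, [target_prompt] * len(poisoned))   # backdoor target: hidden prompt
    utility_loss = F.mse_loss(embed(student, clean_prompts), clean_ref)
    backdoor_loss = F.mse_loss(embed(student, poisoned), target_ref)
    loss = utility_loss + backdoor_loss  # equal weighting is an assumption
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Under this kind of objective, the student encoder behaves indistinguishably from the teacher on clean prompts, while any prompt containing the trigger character is mapped to the target prompt's embedding, so the downstream diffusion model generates images following the hidden description.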