Paper Title

Optimizing Prompts for Text-to-Image Generation

Authors

Yaru Hao, Zewen Chi, Li Dong, Furu Wei

Abstract

Well-designed prompts can guide text-to-image models to generate amazing images. However, the performant prompts are often model-specific and misaligned with user input. Instead of laborious human engineering, we propose prompt adaptation, a general framework that automatically adapts original user input to model-preferred prompts. Specifically, we first perform supervised fine-tuning with a pretrained language model on a small collection of manually engineered prompts. Then we use reinforcement learning to explore better prompts. We define a reward function that encourages the policy to generate more aesthetically pleasing images while preserving the original user intentions. Experimental results on Stable Diffusion show that our method outperforms manual prompt engineering in terms of both automatic metrics and human preference ratings. Moreover, reinforcement learning further boosts performance, especially on out-of-domain prompts. The pretrained checkpoints are available at https://aka.ms/promptist. The demo can be found at https://aka.ms/promptist-demo.
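The abstract describes the reward only at a high level: it should favor more aesthetically pleasing images while preserving the user's original intent. The Python sketch below shows one way such a reward could be assembled. It is not the authors' implementation. The scorer callables `aesthetic` and `relevance`, the `relevance_floor` threshold, the helper `_mean`, and the averaging over sampled images are all hypothetical choices made for illustration. In practice the scorers might wrap an aesthetic predictor and a CLIP-style text-image similarity model.

```python
from typing import Any, Callable, Sequence

# Hypothetical scorer signatures (assumptions, not taken from the paper):
#   aesthetic(image) -> float            higher means more visually pleasing
#   relevance(prompt, image) -> float    higher means the image matches the prompt better
AestheticScorer = Callable[[Any], float]
RelevanceScorer = Callable[[str, Any], float]


def _mean(values: Sequence[float]) -> float:
    # Hypothetical helper: average over the images sampled for one prompt.
    if not values:
        raise ValueError("need at least one sampled image")
    return sum(values) / len(values)


def prompt_reward(
    user_prompt: str,
    original_images: Sequence[Any],
    adapted_images: Sequence[Any],
    aesthetic: AestheticScorer,
    relevance: RelevanceScorer,
    relevance_floor: float = 0.0,
) -> float:
    """Illustrative reward for one adapted prompt (hypothetical design)."""
    # Aesthetic gain: images generated from the adapted prompt should score
    # higher than images generated from the user's original prompt.
    aesthetic_gain = _mean([aesthetic(img) for img in adapted_images]) - _mean(
        [aesthetic(img) for img in original_images]
    )
    # Intent preservation: score adapted images against the ORIGINAL user
    # prompt, and penalize only when relevance drops below the floor.
    relevance_penalty = _mean(
        [min(relevance(user_prompt, img) - relevance_floor, 0.0) for img in adapted_images]
    )
    return aesthetic_gain + relevance_penalty


# Toy usage: "images" are dicts carrying precomputed scores.
original = [{"aes": 4.0, "rel": 0.5}, {"aes": 5.0, "rel": 0.5}]
adapted = [{"aes": 6.0, "rel": 0.5}, {"aes": 7.0, "rel": 0.0}]
example_reward = prompt_reward(
    "a cat sitting on a windowsill",
    original,
    adapted,
    aesthetic=lambda img: img["aes"],
    relevance=lambda prompt, img: img["rel"],
    relevance_floor=0.25,
)
print(example_reward)  # 1.875
```

The `min(..., 0.0)` clip is one possible design choice: relevance acts only as a penalty, so high relevance earns no extra reward, and the aesthetic term drives optimization as long as the adapted images stay close enough to the user's intent.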