Paper Title
X-Paste: Revisiting Scalable Copy-Paste for Instance Segmentation using CLIP and StableDiffusion
Paper Authors
Paper Abstract
Copy-Paste is a simple and effective data augmentation strategy for instance segmentation. By randomly pasting object instances onto new background images, it creates new training data for free and significantly boosts segmentation performance, especially for rare object categories. Although diverse, high-quality object instances used in Copy-Paste result in more performance gain, previous works utilize object instances either from human-annotated instance segmentation datasets or rendered from 3D object models, and both approaches are too expensive to scale up to obtain good diversity. In this paper, we revisit Copy-Paste at scale with the power of newly emerged zero-shot recognition models (e.g., CLIP) and text2image models (e.g., StableDiffusion). We demonstrate for the first time that using a text2image model to generate images, or a zero-shot recognition model to filter noisily crawled images, for different object categories is a feasible way to make Copy-Paste truly scalable. To make such success happen, we design a data acquisition and processing framework, dubbed "X-Paste", upon which a systematic study is conducted. On the LVIS dataset, X-Paste provides impressive improvements over the strong baseline CenterNet2 with Swin-L as the backbone. Specifically, it achieves +2.6 box AP and +2.1 mask AP gains on all classes, and even more significant gains of +6.8 box AP and +6.5 mask AP on long-tail classes. Our code and models are available at https://github.com/yoctta/XPaste.
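The core Copy-Paste operation described in the abstract — compositing a segmented object instance onto a new background — can be sketched in a few lines of numpy. This is a minimal illustration, not the paper's implementation; the function name, array shapes, and lack of blending/occlusion handling are simplifying assumptions.

```python
import numpy as np

def copy_paste(background, instance, mask, x, y):
    """Paste one segmented object instance onto a background image.

    background: H x W x 3 uint8 array (the new scene)
    instance:   h x w x 3 uint8 array (cropped object pixels)
    mask:       h x w binary array, 1 = object pixel, 0 = transparent
    (x, y):     top-left corner of the paste location

    Returns a new H x W x 3 image; only pixels where mask == 1
    are overwritten, so the background shows through elsewhere.
    """
    out = background.copy()
    h, w = mask.shape
    patch = out[y:y + h, x:x + w]
    # Broadcast the 2-D mask over the color channels and select
    # instance pixels where the mask is on, background otherwise.
    out[y:y + h, x:x + w] = np.where(mask[..., None] > 0, instance, patch)
    return out
```

In practice X-Paste composes many such instances per image (with the corresponding boxes and masks added as training labels); this sketch shows only the single-instance paste step.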