Paper Title

Pre-training image-language transformers for open-vocabulary tasks

Authors

Piergiovanni, AJ; Kuo, Weicheng; Angelova, Anelia

Abstract

We present a pre-training approach for vision and language transformer models, which is based on a mixture of diverse tasks. We explore both the use of image-text captioning data in pre-training, which does not need additional supervision, and object-aware strategies to pre-train the model. We evaluate the method on a number of text-generative vision+language tasks, such as Visual Question Answering, visual entailment and captioning, and demonstrate large gains over standard pre-training methods.
