Paper Title

QLEVR: A Diagnostic Dataset for Quantificational Language and Elementary Visual Reasoning

Authors

Zechen Li, Anders Søgaard

Abstract

Synthetic datasets have been used successfully to probe the reasoning abilities of visual question-answering models. CLEVR (Johnson et al., 2017), for example, tests a range of visual reasoning abilities. The questions in CLEVR focus on comparisons of shapes, colors, and sizes, on numerical reasoning, and on existence claims. This paper introduces a minimally biased, diagnostic visual question-answering dataset, QLEVR, that goes beyond existential and numerical quantification and focuses on more complex quantifiers and their combinations, e.g., asking whether there are more than two red balls that are smaller than at least three blue balls in an image. We describe how the dataset was created and present a first evaluation of state-of-the-art visual question-answering models, showing that QLEVR presents a formidable challenge to current models. Code and dataset are available at https://github.com/zechenli03/QLEVR
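To make the example question in the abstract concrete, here is a minimal sketch of how such a nested quantifier could be evaluated against a symbolic scene. It is not taken from the QLEVR code base. The `Obj` record, the scalar `size` field, and the function `more_than_n_smaller_than_at_least_m` are hypothetical names chosen for illustration. The sketch also assumes one particular reading of the question: more than two red balls, each of which is smaller than at least three blue balls.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Obj:
    """Hypothetical scene object; QLEVR's real scene format may differ."""
    shape: str
    color: str
    size: float  # assumed scalar size, larger means bigger


def more_than_n_smaller_than_at_least_m(
    scene: List[Obj],
    n: int,
    m: int,
    left: Callable[[Obj], bool],
    right: Callable[[Obj], bool],
) -> bool:
    """True if more than n `left` objects are each smaller than at least m `right` objects."""
    right_objs = [o for o in scene if right(o)]
    qualifying = 0
    for a in scene:
        if not left(a):
            continue
        # Inner quantifier: count the `right` objects that are bigger than `a`.
        bigger = sum(1 for b in right_objs if a.size < b.size)
        if bigger >= m:
            qualifying += 1
    # Outer quantifier: strictly more than n qualifying `left` objects.
    return qualifying > n


def is_red_ball(o: Obj) -> bool:
    return o.shape == "ball" and o.color == "red"


def is_blue_ball(o: Obj) -> bool:
    return o.shape == "ball" and o.color == "blue"


# Toy scene: three small red balls, three larger blue balls, one distractor cube.
scene = [
    Obj("ball", "red", 1.0),
    Obj("ball", "red", 1.5),
    Obj("ball", "red", 2.0),
    Obj("ball", "blue", 3.0),
    Obj("ball", "blue", 3.5),
    Obj("ball", "blue", 4.0),
    Obj("cube", "blue", 5.0),
]

# "Are there more than two red balls that are smaller than at least three blue balls?"
answer = more_than_n_smaller_than_at_least_m(scene, 2, 3, is_red_ball, is_blue_ball)
```

Under this reading, the two quantifiers nest: the inner count is computed per red ball, and the outer threshold is applied to the number of red balls that pass. Removing any one red ball from the toy scene flips the answer, which illustrates why such questions are harder than plain counting or existence checks.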
