Paper Title


Measuring CLEVRness: Blackbox testing of Visual Reasoning Models

Paper Authors

Spyridon Mouselinos, Henryk Michalewski, Mateusz Malinowski

Paper Abstract


How can we measure the reasoning capabilities of intelligence systems? Visual question answering provides a convenient framework for testing the model's abilities by interrogating the model through questions about the scene. However, despite scores of various visual QA datasets and architectures, which sometimes yield even a super-human performance, the question of whether those architectures can actually reason remains open to debate. To answer this, we extend the visual question answering framework and propose the following behavioral test in the form of a two-player game. We consider black-box neural models of CLEVR. These models are trained on a diagnostic dataset benchmarking reasoning. Next, we train an adversarial player that re-configures the scene to fool the CLEVR model. We show that CLEVR models, which otherwise could perform at a human level, can easily be fooled by our agent. Our results put in doubt whether data-driven approaches can do reasoning without exploiting the numerous biases that are often present in those datasets. Finally, we also propose a controlled experiment measuring the efficiency of such models to learn and perform reasoning.
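The abstract only sketches the two-player behavioral test, so the following is a minimal, hedged Python sketch of the idea under stated assumptions: a frozen, black-box CLEVR model is queried about a scene while an adversarial player re-configures object positions (leaving the ground-truth answer untouched) until the model's answer flips. The paper trains its adversarial player; the random-search adversary below, the toy blackbox_clevr_model, and all other names are illustrative placeholders, not the authors' implementation.

```python
# Hedged sketch of the two-player behavioral test described in the abstract.
# All names here (Scene, blackbox_clevr_model, ground_truth_answer, perturb_scene)
# are hypothetical placeholders standing in for the paper's actual components.

import copy
import random
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class SceneObject:
    shape: str   # e.g. "cube", "sphere", "cylinder"
    color: str   # e.g. "red", "blue"
    x: float     # position on the ground plane
    y: float

@dataclass
class Scene:
    objects: list = field(default_factory=list)

def blackbox_clevr_model(scene: Scene, question: str) -> str:
    """Stand-in for a trained CLEVR VQA model; only its answers are observable.
    Deliberately brittle: it misses red objects far from the origin, mimicking
    the positional biases an adversarial player could exploit."""
    return str(sum(obj.color == "red" and abs(obj.x) < 2.0 for obj in scene.objects))

def ground_truth_answer(scene: Scene, question: str) -> str:
    """Oracle answer computed from the scene graph (here: count red objects)."""
    return str(sum(obj.color == "red" for obj in scene.objects))

def perturb_scene(scene: Scene) -> Scene:
    """Adversarial move: jitter one object's position, leaving answer-relevant
    attributes (shape, color) untouched so the ground-truth answer is unchanged."""
    new_scene = copy.deepcopy(scene)
    obj = random.choice(new_scene.objects)
    obj.x += random.uniform(-1.0, 1.0)
    obj.y += random.uniform(-1.0, 1.0)
    return new_scene

def adversarial_game(scene: Scene, question: str, budget: int = 100) -> Optional[Scene]:
    """Search for a re-configured scene on which the black-box model's answer
    no longer matches the (unchanged) ground truth. The paper trains this
    player; a simple random search stands in for it here."""
    target = ground_truth_answer(scene, question)
    for _ in range(budget):
        candidate = perturb_scene(scene)
        assert ground_truth_answer(candidate, question) == target
        if blackbox_clevr_model(candidate, question) != target:
            return candidate  # model fooled by a semantically equivalent scene
        scene = candidate     # keep exploring from the perturbed configuration
    return None

if __name__ == "__main__":
    scene = Scene([SceneObject("cube", "red", 0.0, 0.0),
                   SceneObject("sphere", "blue", 1.0, 1.0)])
    fooled = adversarial_game(scene, "How many red objects are there?")
    print("Fooling scene found" if fooled else "Model answered consistently")
```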
