Paper Title

Rel3D: A Minimally Contrastive Benchmark for Grounding Spatial Relations in 3D

Paper Authors

Ankit Goyal, Kaiyu Yang, Dawei Yang, Jia Deng

Paper Abstract

Understanding spatial relations (e.g., "laptop on table") in visual input is important for both humans and robots. Existing datasets are insufficient as they lack large-scale, high-quality 3D ground truth information, which is critical for learning spatial relations. In this paper, we fill this gap by constructing Rel3D: the first large-scale, human-annotated dataset for grounding spatial relations in 3D. Rel3D enables quantifying the effectiveness of 3D information in predicting spatial relations on large-scale human data. Moreover, we propose minimally contrastive data collection -- a novel crowdsourcing method for reducing dataset bias. The 3D scenes in our dataset come in minimally contrastive pairs: two scenes in a pair are almost identical, but a spatial relation holds in one and fails in the other. We empirically validate that minimally contrastive examples can diagnose issues with current relation detection models as well as lead to sample-efficient training. Code and data are available at https://github.com/princeton-vl/Rel3D.
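
The abstract describes the dataset's core structure: minimally contrastive pairs of near-identical 3D scenes that share the same (subject, relation, object) triple, where the relation holds in one scene and fails in the other. The sketch below is a minimal, hypothetical illustration of that pairing. The class and field names are assumptions for illustration only and do not reflect the official Rel3D data format; see the GitHub repository for the actual data loaders.

from dataclasses import dataclass
from typing import Tuple

# Hypothetical representation of one annotated 3D scene (not the official Rel3D schema).
@dataclass
class Scene3D:
    subject: str                      # e.g., "laptop"
    relation: str                     # e.g., "on"
    object: str                       # e.g., "table"
    subject_pose: Tuple[float, ...]   # assumed 3D pose of the subject
    object_pose: Tuple[float, ...]    # assumed 3D pose of the object
    label: bool                       # does the relation hold in this scene?

# A minimally contrastive pair: two almost identical scenes with opposite labels.
@dataclass
class ContrastivePair:
    positive: Scene3D   # the relation holds here
    negative: Scene3D   # nearly the same geometry, but the relation fails

def is_minimally_contrastive(pair: ContrastivePair) -> bool:
    """Check that both scenes describe the same (subject, relation, object)
    triple and that exactly the positive scene carries a True label."""
    same_triple = (
        pair.positive.subject == pair.negative.subject
        and pair.positive.relation == pair.negative.relation
        and pair.positive.object == pair.negative.object
    )
    return same_triple and pair.positive.label and not pair.negative.label

Under this sketch, a model that relies on dataset bias (e.g., always predicting "laptop on table" from object co-occurrence alone) would get exactly one scene in each pair wrong, which is what makes such pairs useful for diagnosing relation detection models.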
