ProcTHOR: Large-Scale Embodied AI Using Procedural Generation

Paper Title

ProcTHOR: Large-Scale Embodied AI Using Procedural Generation

Authors

Matt Deitke, Eli VanderBilt, Alvaro Herrasti, Luca Weihs, Jordi Salvador, Kiana Ehsani, Winson Han, Eric Kolve, Ali Farhadi, Aniruddha Kembhavi, Roozbeh Mottaghi

Abstract

Massive datasets and high-capacity models have driven many recent advancements in computer vision and natural language understanding. This work presents a platform to enable similar success stories in Embodied AI. We propose ProcTHOR, a framework for procedural generation of Embodied AI environments. ProcTHOR enables us to sample arbitrarily large datasets of diverse, interactive, customizable, and performant virtual environments to train and evaluate embodied agents across navigation, interaction, and manipulation tasks. We demonstrate the power and potential of ProcTHOR via a sample of 10,000 generated houses and a simple neural model. Models trained using only RGB images on ProcTHOR, with no explicit mapping and no human task supervision produce state-of-the-art results across 6 embodied AI benchmarks for navigation, rearrangement, and arm manipulation, including the presently running Habitat 2022, AI2-THOR Rearrangement 2022, and RoboTHOR challenges. We also demonstrate strong 0-shot results on these benchmarks, via pre-training on ProcTHOR with no fine-tuning on the downstream benchmark, often beating previous state-of-the-art systems that access the downstream training data.
