Paper Title

Learning to Act with Affordance-Aware Multimodal Neural SLAM

Paper Authors

Zhiwei Jia, Kaixiang Lin, Yizhou Zhao, Qiaozi Gao, Govind Thattai, Gaurav Sukhatme

Abstract

Recent years have witnessed an emerging paradigm shift toward embodied artificial intelligence, in which an agent must learn to solve challenging tasks by interacting with its environment. There are several challenges in solving embodied multimodal tasks, including long-horizon planning, vision-and-language grounding, and efficient exploration. We focus on a critical bottleneck, namely the performance of planning and navigation. To tackle this challenge, we propose a Neural SLAM approach that, for the first time, utilizes several modalities for exploration, predicts an affordance-aware semantic map, and plans over it at the same time. This significantly improves exploration efficiency, leads to robust long-horizon planning, and enables effective vision-and-language grounding. With the proposed Affordance-aware Multimodal Neural SLAM (AMSLAM) approach, we obtain more than 40% improvement over prior published work on the ALFRED benchmark and set a new state-of-the-art generalization performance at a success rate of 23.48% on the test unseen scenes.
