Topological Planning with Transformers for Vision-and-Language Navigation

Paper Title

Topological Planning with Transformers for Vision-and-Language Navigation

Authors

Kevin Chen, Junshen K. Chen, Jo Chuang, Marynel Vázquez, Silvio Savarese

Abstract

Conventional approaches to vision-and-language navigation (VLN) are trained end-to-end but struggle to perform well in freely traversable environments. Inspired by the robotics community, we propose a modular approach to VLN using topological maps. Given a natural language instruction and topological map, our approach leverages attention mechanisms to predict a navigation plan in the map. The plan is then executed with low-level actions (e.g. forward, rotate) using a robust controller. Experiments show that our method outperforms previous end-to-end approaches, generates interpretable navigation plans, and exhibits intelligent behaviors such as backtracking.
