Paper title
Topological Planning with Transformers for Vision-and-Language Navigation
Paper authors
Paper abstract
Conventional approaches to vision-and-language navigation (VLN) are trained end-to-end but struggle to perform well in freely traversable environments. Inspired by the robotics community, we propose a modular approach to VLN using topological maps. Given a natural language instruction and topological map, our approach leverages attention mechanisms to predict a navigation plan in the map. The plan is then executed with low-level actions (e.g. forward, rotate) using a robust controller. Experiments show that our method outperforms previous end-to-end approaches, generates interpretable navigation plans, and exhibits intelligent behaviors such as backtracking.
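The split between a planner that works over a topological map and a controller that emits low-level actions can be illustrated with a toy example. This is a minimal sketch under loudly stated assumptions, not the authors' model. The paper predicts the navigation plan with a learned transformer. Here that planner is replaced by a hand-written rule: each map node attends over instruction-token embeddings using scaled dot-product attention, the highest-scoring node is taken as the goal, and the plan is the breadth-first shortest path to it. The function names (`node_scores`, `plan`, `to_actions`), the scoring rule, the 30-degree turn step, and the action names are all hypothetical.

```python
import math
from collections import deque
from typing import Dict, List, Sequence, Tuple

Vec = Sequence[float]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def node_scores(node_feats: Dict[int, Vec], instr: List[Vec]) -> Dict[int, float]:
    """Hypothetical scoring rule: each node attends over the instruction tokens
    (scaled dot-product attention) and is scored by its similarity to the
    attended instruction context."""
    scores = {}
    for node, f in node_feats.items():
        weights = softmax([dot(f, t) / math.sqrt(len(f)) for t in instr])
        context = [sum(w * t[i] for w, t in zip(weights, instr)) for i in range(len(f))]
        scores[node] = dot(f, context)
    return scores


def bfs_path(graph: Dict[int, List[int]], start: int, goal: int) -> List[int]:
    """Shortest path (by edge count) on the topological map; [] if unreachable."""
    prev = {start: None}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        if u == goal:
            break
        for v in graph[u]:
            if v not in prev:
                prev[v] = u
                queue.append(v)
    if goal not in prev:
        return []
    path = [goal]
    while prev[path[-1]] is not None:
        path.append(prev[path[-1]])
    return path[::-1]


def plan(graph: Dict[int, List[int]], node_feats: Dict[int, Vec],
         instr: List[Vec], start: int) -> List[int]:
    """Stand-in planner: pick the best-scoring node as goal, then path to it."""
    scores = node_scores(node_feats, instr)
    goal = max(scores, key=scores.get)
    return bfs_path(graph, start, goal)


def to_actions(path: List[int], positions: Dict[int, Tuple[float, float]],
               heading_deg: float = 0.0, turn_step: float = 30.0) -> List[str]:
    """Toy controller: rotate toward the next node in fixed increments, then
    issue one "forward" that stands for driving to that node."""
    actions: List[str] = []
    heading = heading_deg
    for a, b in zip(path, path[1:]):
        (x0, y0), (x1, y1) = positions[a], positions[b]
        target = math.degrees(math.atan2(y1 - y0, x1 - x0))
        delta = (target - heading + 180) % 360 - 180  # signed turn in [-180, 180)
        n_turns = round(delta / turn_step)
        # Positive angles are counterclockwise, i.e. a left turn.
        actions += ["rotate_left" if n_turns > 0 else "rotate_right"] * abs(n_turns)
        heading += n_turns * turn_step
        actions.append("forward")
    return actions


# Usage: a four-node map where node 1 branches to nodes 2 and 3.
graph = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}
feats = {0: [0.0, 1.0], 1: [0.5, 0.0], 2: [2.0, 0.0], 3: [-1.0, 0.0]}
instr = [[1.0, 0.0], [1.0, 0.0]]
positions = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (1.0, 1.0), 3: (1.0, -1.0)}
path = plan(graph, feats, instr, start=0)
print(path, to_actions(path, positions))
```

Keeping the planner's output as a sequence of map nodes is what makes the plan inspectable before any motion happens. A learned planner could replace `plan` without changing the controller.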