Paper Title
Iterative Vision-and-Language Navigation
Paper Authors
Paper Abstract
We present Iterative Vision-and-Language Navigation (IVLN), a paradigm for evaluating language-guided agents navigating in a persistent environment over time. Existing Vision-and-Language Navigation (VLN) benchmarks erase the agent's memory at the beginning of every episode, testing the ability to perform cold-start navigation with no prior information. However, deployed robots occupy the same environment for long periods of time. The IVLN paradigm addresses this disparity by training and evaluating VLN agents that maintain memory across tours of scenes that consist of up to 100 ordered instruction-following Room-to-Room (R2R) episodes, each defined by an individual language instruction and a target path. We present discrete and continuous Iterative Room-to-Room (IR2R) benchmarks comprising about 400 tours each in 80 indoor scenes. We find that extending the implicit memory of high-performing transformer VLN agents is not sufficient for IVLN, but agents that build maps can benefit from environment persistence, motivating a renewed focus on map-building agents in VLN.
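The core difference between standard VLN and the IVLN paradigm described above is where the agent's memory is reset: per episode versus once per tour. The following minimal sketch illustrates that distinction; all class and function names are hypothetical and do not come from the IVLN codebase.

```python
# Illustrative sketch of the IVLN evaluation loop, assuming a hypothetical
# agent interface. Not the authors' implementation.

class PersistentAgent:
    """Agent that keeps memory (e.g., an accumulated map) across episodes."""

    def __init__(self):
        self.memory = []  # persists across episodes within a tour

    def reset_memory(self):
        # In IVLN this is called only at tour boundaries;
        # standard VLN benchmarks would call it before every episode.
        self.memory = []

    def run_episode(self, instruction):
        # A real agent would navigate the environment here; we only
        # record the instruction to show memory accumulating.
        self.memory.append(instruction)
        return {"instruction": instruction, "memory_size": len(self.memory)}


def evaluate_tour(agent, tour):
    """Evaluate an ordered tour of instruction-following episodes.

    Memory is reset once per tour, so later episodes can exploit
    information gathered in earlier ones.
    """
    agent.reset_memory()
    return [agent.run_episode(instr) for instr in tour]


# A toy tour of three ordered episodes (a real IR2R tour has up to 100).
tour = ["go to the kitchen", "exit and turn left", "stop at the sofa"]
results = evaluate_tour(PersistentAgent(), tour)
```

By the final episode the agent's memory holds all three prior instructions, whereas a cold-start VLN agent would see each episode with empty memory.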