Paper title
Analyzing Generalization of Vision and Language Navigation to Unseen Outdoor Areas
Paper authors
Paper abstract
Vision and language navigation (VLN) is a challenging visually-grounded language understanding task. Given a natural language navigation instruction, a visual agent interacts with a graph-based environment equipped with panorama images and tries to follow the described route. Most prior work has been conducted in indoor scenarios where best results were obtained for navigation on routes that are similar to the training routes, with sharp drops in performance when testing on unseen environments. We focus on VLN in outdoor scenarios and find that in contrast to indoor VLN, most of the gain in outdoor VLN on unseen data is due to features like junction type embedding or heading delta that are specific to the respective environment graph, while image information plays a very minor role in generalizing VLN to unseen outdoor areas. These findings show a bias to specifics of graph representations of urban environments, demanding that VLN tasks grow in scale and diversity of geographical environments.