Paper Title
Transformer-based Localization from Embodied Dialog with Large-scale Pre-training
Authors
Abstract
We address the challenging task of Localization via Embodied Dialog (LED). Given a dialog between two agents, an Observer navigating through an unknown environment and a Locator who is attempting to identify the Observer's location, the goal is to predict the Observer's final location on a map. We develop a novel LED-Bert architecture and present an effective pretraining strategy. We show that a graph-based scene representation is more effective than the top-down 2D maps used in prior work. Our approach outperforms previous baselines.