Paper Title
FOAM: A Follower-aware Speaker Model For Vision-and-Language Navigation
Paper Authors
Paper Abstract
Speaker-follower models have proven effective in vision-and-language navigation, where a speaker model synthesizes new instructions to augment the training data for a follower navigation model. However, in many previous methods, the speaker is not trained directly to optimize the follower's performance. In this paper, we present FOAM, a FOllower-Aware speaker Model that is continually updated using follower feedback, so that the generated instructions better suit the follower's current learning state. Specifically, we optimize the speaker using a bi-level optimization framework and obtain its training signal by evaluating the follower on labeled data. Experimental results on the Room-to-Room and Room-across-Room datasets demonstrate that our methods can outperform strong baseline models across settings. Analyses also reveal that our generated instructions are of higher quality than those of the baselines.
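The bi-level idea in the abstract can be illustrated with a deliberately tiny sketch. It is not the paper's implementation: the scalar "speaker" parameter `phi`, the scalar "follower" parameter `w`, the squared-error losses, and the learning rates are all hypothetical stand-ins chosen so that the gradients can be written by hand. The speaker produces a pseudo-label, the follower takes one look-ahead step on it, and the speaker is then updated using the look-ahead follower's loss on labeled data.

```python
# Toy bi-level loop (illustrative assumption, not the FOAM implementation).
# phi: scalar "speaker" parameter; w: scalar "follower" parameter.
def train(steps=200, inner_lr=0.1, outer_lr=0.5):
    w, phi = 0.0, 0.0          # follower and speaker parameters
    x_pseudo = 1.0             # unlabeled input that the speaker annotates
    x_lab, y_lab = 1.0, 3.0    # labeled example (true relation: y = 3 * x)
    for _ in range(steps):
        # Inner step (look-ahead): follower fits the pseudo-label phi * x_pseudo.
        grad_w = 2.0 * x_pseudo * (w * x_pseudo - phi * x_pseudo)
        w_ahead = w - inner_lr * grad_w
        # Outer step: labeled loss of the look-ahead follower,
        # differentiated with respect to phi through the inner update.
        dw_dphi = 2.0 * inner_lr * x_pseudo ** 2
        grad_phi = 2.0 * x_lab * (w_ahead * x_lab - y_lab) * dw_dphi
        phi -= outer_lr * grad_phi
        # The follower is then actually trained on the updated speaker's output.
        w -= inner_lr * 2.0 * x_pseudo * (w * x_pseudo - phi * x_pseudo)
    return w, phi
```

In this toy setting the follower never sees the labeled example directly. It reaches the labeled target only because the speaker is pushed, via the follower's labeled loss, to produce more useful pseudo-labels.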