Paper Title
Image Augmentation Based Momentum Memory Intrinsic Reward for Sparse Reward Visual Scenes
Paper Authors
Paper Abstract
Many scenes in real life can be abstracted as sparse reward visual scenes, in which it is difficult for an agent to tackle a task when it receives only images and sparse rewards. We propose to decompose this problem into two sub-problems: visual representation and sparse reward. To address them, a novel framework, IAMMIR, is presented that combines self-supervised representation learning with intrinsic motivation. For visual representation, a representation is learned that is driven by a combination of image-augmented forward dynamics and the reward. For sparse rewards, a new type of intrinsic reward, the Momentum Memory Intrinsic Reward (MMIR), is designed. It utilizes the difference between the outputs of the current model (online network) and the historical model (target network) to represent the agent's state familiarity. Our method is evaluated on the visual navigation task with sparse rewards in Vizdoom. Experiments demonstrate that our method achieves state-of-the-art sample efficiency, reaching a 100% success rate at least 2 times faster than existing methods.
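To illustrate the online/target mechanism the abstract describes, below is a minimal PyTorch sketch of an MMIR-style bonus: a target encoder is kept as a slowly updated momentum copy of the online encoder, and the intrinsic reward is the discrepancy between their embeddings of the same observation. All names (`Encoder`, `momentum_update`, `mmir`), the network layout, and the momentum coefficient are illustrative assumptions, not the paper's actual IAMMIR implementation.

```python
import copy
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Toy image encoder standing in for the paper's representation network (assumed)."""
    def __init__(self, embed_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=8, stride=4), nn.ReLU(),   # 3x84x84 -> 16x20x20
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),  # -> 32x9x9
            nn.Flatten(),
            nn.Linear(32 * 9 * 9, embed_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

online = Encoder()                       # current model
target = copy.deepcopy(online)           # historical model: momentum copy of online
for p in target.parameters():
    p.requires_grad_(False)              # target is never trained by gradient descent

@torch.no_grad()
def momentum_update(m: float = 0.99) -> None:
    # Target weights drift slowly toward the online weights, so the target
    # acts as a memory of past versions of the model.
    for p_t, p_o in zip(target.parameters(), online.parameters()):
        p_t.mul_(m).add_(p_o, alpha=1.0 - m)

@torch.no_grad()
def mmir(obs: torch.Tensor) -> torch.Tensor:
    # Intrinsic reward: discrepancy between current and historical embeddings.
    # A large discrepancy suggests an unfamiliar state, so the bonus is larger.
    z_online = online(obs)
    z_target = target(obs)
    return (z_online - z_target).pow(2).mean(dim=-1)

# Usage with a batch of RGB observations (e.g. 84x84 frames, sizes assumed):
obs = torch.rand(4, 3, 84, 84)
r_int = mmir(obs)         # shape (4,): one intrinsic bonus per observation
momentum_update()         # called periodically as the online network trains
```

In this sketch the bonus shrinks for states the agent has seen often, since the momentum target catches up to the online network on familiar inputs; how the paper combines this bonus with the extrinsic reward and the image-augmented forward-dynamics loss is specified in the full text.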