Paper Title
Planning to Explore via Self-Supervised World Models
Paper Authors
Paper Abstract
Reinforcement learning allows solving complex tasks; however, learning tends to be task-specific and sample efficiency remains a challenge. We present Plan2Explore, a self-supervised reinforcement learning agent that tackles both these challenges through a new approach to self-supervised exploration and fast adaptation to new tasks, which need not be known during exploration. During exploration, unlike prior methods which retrospectively compute the novelty of observations after the agent has already reached them, our agent acts efficiently by leveraging planning to seek out expected future novelty. After exploration, the agent quickly adapts to multiple downstream tasks in a zero-shot or few-shot manner. We evaluate on challenging control tasks from high-dimensional image inputs. Without any training supervision or task-specific interaction, Plan2Explore outperforms prior self-supervised exploration methods and, in fact, almost matches the performance of an oracle which has access to rewards. Videos and code at https://ramanans1.github.io/plan2explore/
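
The abstract names the mechanism only at a high level: during exploration the agent plans toward states whose outcomes it cannot yet predict. In the paper, this expected future novelty is estimated as the disagreement among an ensemble of one-step latent-dynamics models. The following is a minimal sketch of that idea in Python, assuming stand-in linear ensemble members; the names (`intrinsic_reward`, the ensemble size, the candidate-scoring loop) are illustrative and not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ensemble of K one-step latent-dynamics models. The expected
# future novelty of a latent state-action pair is estimated as the
# disagreement (variance) among the ensemble's predictions of the next latent
# feature; planning inside the world model then steers the agent toward
# high-disagreement states before they are ever visited.
K, LATENT, ACTION = 5, 8, 2

# Stand-in linear models: each maps [state, action] -> next latent feature.
ensemble = [rng.normal(size=(LATENT + ACTION, LATENT)) for _ in range(K)]

def intrinsic_reward(state, action):
    """Disagreement-based novelty: variance across ensemble predictions,
    averaged over latent dimensions."""
    inp = np.concatenate([state, action])
    preds = np.stack([inp @ w for w in ensemble])  # shape (K, LATENT)
    return preds.var(axis=0).mean()                # scalar novelty estimate

# Planning sketch: score imagined candidate actions by expected novelty and
# pick the most informative one -- acting to seek out future novelty rather
# than scoring observations only after they are reached.
state = rng.normal(size=LATENT)
candidates = rng.normal(size=(10, ACTION))
best = max(candidates, key=lambda a: intrinsic_reward(state, a))
print("chosen exploratory action:", best)
```

Because the reward is computed from model predictions rather than visited observations, the same world model can later be reused for planning toward a downstream task reward, which is what enables the zero-shot and few-shot adaptation described above.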