Paper Title
Evolutionary Planning in Latent Space
Paper Authors
Paper Abstract
Planning is a powerful approach to reinforcement learning with several desirable properties. However, it requires a model of the world, which is not readily available in many real-life problems. In this paper, we propose to learn a world model that enables Evolutionary Planning in Latent Space (EPLS). We use a Variational Autoencoder (VAE) to learn a compressed latent representation of individual observations and extend a Mixture Density Recurrent Neural Network (MDRNN) to learn a stochastic, multi-modal forward model of the world that can be used for planning. We use Random Mutation Hill Climbing (RMHC) to find a sequence of actions that maximizes expected reward in this learned model of the world. We demonstrate how to build the world model by bootstrapping it with rollouts from a random policy and iteratively refining it with rollouts from an increasingly accurate planning policy that uses the learned world model. After a few iterations of this refinement, our planning agents outperform standard model-free reinforcement learning approaches, demonstrating the viability of our approach.
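To make the planning step concrete, below is a minimal sketch of Random Mutation Hill Climbing over action sequences in a latent space. It is not the paper's implementation: the latent dimensionality, action space, horizon, and the `forward_model`/`reward_model` functions are hypothetical placeholders standing in for the learned VAE encoder and MDRNN world model described in the abstract.

```python
# Minimal RMHC planning sketch in a (placeholder) learned latent space.
# forward_model / reward_model stand in for the paper's MDRNN world model.
import numpy as np

LATENT_DIM = 32       # assumed latent size (illustrative)
NUM_ACTIONS = 3       # assumed discrete action space (illustrative)
HORIZON = 20          # planning horizon
GENERATIONS = 200     # RMHC iterations

rng = np.random.default_rng(0)

# Placeholder dynamics: a fixed random linear map per action, standing in for
# a learned stochastic forward model p(z_{t+1} | z_t, a_t).
A = rng.normal(scale=0.1, size=(NUM_ACTIONS, LATENT_DIM, LATENT_DIM))
w_reward = rng.normal(size=LATENT_DIM)  # placeholder reward head

def forward_model(z, action):
    """Predict the next latent state (deterministic placeholder)."""
    return np.tanh(A[action] @ z)

def reward_model(z):
    """Predict the reward of a latent state (placeholder linear head)."""
    return float(w_reward @ z)

def rollout_return(z0, actions):
    """Total predicted reward of an action sequence under the world model."""
    z, total = z0, 0.0
    for a in actions:
        z = forward_model(z, a)
        total += reward_model(z)
    return total

def rmhc_plan(z0):
    """Random Mutation Hill Climbing over action sequences."""
    best = rng.integers(NUM_ACTIONS, size=HORIZON)
    best_return = rollout_return(z0, best)
    for _ in range(GENERATIONS):
        candidate = best.copy()
        # Mutate a single randomly chosen action in the sequence.
        candidate[rng.integers(HORIZON)] = rng.integers(NUM_ACTIONS)
        candidate_return = rollout_return(z0, candidate)
        if candidate_return >= best_return:  # keep improvements (and ties)
            best, best_return = candidate, candidate_return
    return best, best_return

if __name__ == "__main__":
    z0 = rng.normal(size=LATENT_DIM)  # in the paper this would come from the VAE encoder
    plan, value = rmhc_plan(z0)
    print("planned actions:", plan)
    print("predicted return:", round(value, 3))
```

In a full EPLS-style loop, the first action of the best-found sequence would be executed in the environment, the resulting rollouts would be added to the training data, and the world model would be retrained, matching the iterative refinement described in the abstract.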