Paper Title
Double Prioritized State Recycled Experience Replay
Paper Authors
Paper Abstract
Experience replay enables online reinforcement learning agents to store and reuse previous experiences of interacting with the environment. In the original method, experiences are sampled and replayed uniformly at random. A prior work, prioritized experience replay, assigns priorities to experiences so that those deemed more important are replayed more frequently. In this paper, we develop a method called double-prioritized state-recycled (DPSR) experience replay, which prioritizes experiences in both the training stage and the storing stage, and replaces experiences in memory through state recycling to make the best use of experiences that temporarily appear to have low priority. We applied this method to Deep Q-Networks (DQN) and achieved state-of-the-art results, outperforming the original method and prioritized experience replay on many Atari games.
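The abstract describes DPSR only at a high level, so the following Python sketch is one plausible reading of it, not the paper's actual implementation: training-stage prioritization is rendered as proportional sampling in the style of prioritized experience replay, storing-stage prioritization as lowest-priority eviction rather than oldest-first overwriting, and state recycling as a hypothetical `recycle_fn` hook, since the abstract does not specify the recycling rule.

```python
import numpy as np

class DPSRReplayBuffer:
    """Illustrative sketch of a double-prioritized, state-recycled buffer.

    Reflects only the ideas stated in the abstract:
    (1) sampling for training is priority-proportional,
    (2) when memory is full, the lowest-priority entry (not the oldest)
        is chosen for replacement,
    (3) before a low-priority experience is overwritten, a user-supplied
        `recycle_fn` may transform and reuse it. The hook is a
        hypothetical placeholder for the paper's state-recycling rule.
    """

    def __init__(self, capacity, alpha=0.6, recycle_fn=None):
        self.capacity = capacity
        self.alpha = alpha            # priority exponent, as in PER
        self.data = []                # (state, action, reward, next_state, done)
        self.priorities = []          # one priority per stored transition
        self.recycle_fn = recycle_fn  # hypothetical state-recycling hook

    def add(self, transition, priority):
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(float(priority))
            return
        # Storing-stage prioritization: evict the lowest-priority entry.
        victim = int(np.argmin(self.priorities))
        if self.recycle_fn is not None:
            # State recycling: let the low-priority experience be
            # transformed and kept instead of being discarded outright.
            recycled = self.recycle_fn(self.data[victim])
            if recycled is not None:
                transition = recycled
        self.data[victim] = transition
        self.priorities[victim] = float(priority)

    def sample(self, batch_size):
        # Training-stage prioritization: proportional sampling, as in PER.
        probs = np.asarray(self.priorities) ** self.alpha
        probs /= probs.sum()
        idx = np.random.choice(len(self.data), batch_size, p=probs)
        return idx, [self.data[i] for i in idx]

    def update_priorities(self, idx, new_priorities):
        # Typically called with |TD error| values after a training step.
        for i, p in zip(idx, new_priorities):
            self.priorities[i] = float(p)
```

A DQN training loop would use such a buffer by calling `add` after each environment step, `sample` to draw a minibatch, and `update_priorities` with the resulting TD errors; the eviction-by-priority step is what distinguishes this sketch from a plain PER buffer, which overwrites the oldest transition regardless of priority.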