Paper Title
Predictive PER: Balancing Priority and Diversity towards Stable Deep Reinforcement Learning
Paper Authors
Paper Abstract
Prioritized experience replay (PER) samples important transitions, rather than sampling uniformly, to improve the performance of a deep reinforcement learning agent. We claim that such prioritization has to be balanced with sample diversity in order to stabilize the DQN and prevent forgetting. Our proposed improvement over PER, called Predictive PER (PPER), takes three countermeasures (TDInit, TDClip, TDPred) to (i) eliminate priority outliers and explosions and (ii) improve the priority-weighted sample diversity and distribution, both of which stabilize the DQN. The most notable of the three is the introduction of a second DNN, called TDPred, to generalize in-distribution priorities. Ablation studies and full experiments on Atari games show that each countermeasure, in its own way, and PPER as a whole successfully enhance stability, and thus performance, over PER.
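The abstract does not give implementation details, so the following Python sketch is illustrative only: it shows a minimal proportional PER buffer and how the first two countermeasures could plausibly act on priorities. The specific behaviors shown (max-priority seeding for TDInit, bounding TD errors for TDClip), the class and parameter names, and the `alpha` priority exponent are assumptions, not the paper's method; the learned TDPred network is not reproduced here.

```python
# A minimal, assumed sketch of a proportional PER buffer with
# TDInit/TDClip-style countermeasures. Not the paper's implementation.
import numpy as np

class PrioritizedReplayBuffer:
    def __init__(self, capacity, alpha=0.6, td_clip=1.0):
        self.capacity = capacity
        self.alpha = alpha        # priority exponent: 0 = uniform, 1 = fully greedy
        self.td_clip = td_clip    # assumed TDClip-style bound on |TD error|
        self.data = [None] * capacity
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos, self.size = 0, 0

    def add(self, transition):
        # Assumed TDInit-style rule: seed new transitions with the current
        # max priority so each is sampled at least once before reprioritization.
        p0 = self.priorities[:self.size].max() if self.size else 1.0
        self.data[self.pos] = transition
        self.priorities[self.pos] = p0
        self.pos = (self.pos + 1) % self.capacity
        self.size = min(self.size + 1, self.capacity)

    def sample(self, batch_size):
        # Sample proportionally to priority^alpha; smaller alpha keeps more
        # diversity, which is the priority/diversity trade-off the paper targets.
        scaled = self.priorities[:self.size] ** self.alpha
        probs = scaled / scaled.sum()
        idx = np.random.choice(self.size, batch_size, p=probs)
        return idx, [self.data[i] for i in idx]

    def update(self, idx, td_errors):
        # Assumed TDClip-style rule: bound priorities to suppress the
        # outliers and explosions mentioned in the abstract.
        self.priorities[idx] = np.clip(np.abs(td_errors), 1e-6, self.td_clip)
```

A short usage example, with random placeholder transitions and TD errors standing in for a real learner:

```python
buf = PrioritizedReplayBuffer(capacity=10_000)
for _ in range(100):
    buf.add((np.random.randn(4), 0, 0.0, np.random.randn(4)))  # (s, a, r, s')
idx, batch = buf.sample(32)
buf.update(idx, np.random.randn(32))  # TD errors would come from the DQN update
```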