Paper Title
An Equivalence between Loss Functions and Non-Uniform Sampling in Experience Replay
Paper Authors
Paper Abstract
Prioritized Experience Replay (PER) is a deep reinforcement learning technique in which agents learn from transitions sampled with non-uniform probability proportional to their temporal-difference error. We show that any loss function evaluated with non-uniformly sampled data can be transformed into another uniformly sampled loss function with the same expected gradient. Surprisingly, we find that in some environments PER can be replaced entirely by this new loss function without any impact on empirical performance. Furthermore, this relationship suggests a new branch of improvements to PER obtained by correcting its uniformly sampled loss function equivalent. We demonstrate the effectiveness of our proposed modifications to PER and the equivalent loss function in several MuJoCo and Atari environments.
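
A minimal sketch of the identity behind this equivalence, in notation of our own (the paper's exact derivation may differ): for a replay buffer of N transitions with per-transition loss L(\delta_i) on TD error \delta_i and sampling probabilities p_i, the expected gradient under non-uniform sampling equals that of a reweighted, uniformly sampled loss:

\[
\mathbb{E}_{i \sim p}\!\left[\nabla_\theta L(\delta_i)\right]
= \sum_{i=1}^{N} p_i \,\nabla_\theta L(\delta_i)
= \mathbb{E}_{i \sim \mathrm{Uniform}(1,\dots,N)}\!\left[N p_i \,\nabla_\theta L(\delta_i)\right].
\]

Hence any loss \tilde{L} satisfying \nabla_\theta \tilde{L}(\delta_i) = N p_i \,\nabla_\theta L(\delta_i), with p_i treated as a constant with respect to \theta, is a uniformly sampled loss with the same expected gradient. As a hedged worked example: if L is the squared error \delta_i^2 / 2 and priorities satisfy p_i \propto |\delta_i|, the reweighted gradient is proportional to |\delta_i|\,\delta_i \,\nabla_\theta \delta_i, which is the gradient of a cubed-error loss |\delta_i|^3 / 3 (up to the buffer-dependent normalization constant).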