On Catastrophic Interference in Atari 2600 Games

Paper title

On Catastrophic Interference in Atari 2600 Games

Authors

William Fedus, Dibya Ghosh, John D. Martin, Marc G. Bellemare, Yoshua Bengio, Hugo Larochelle

Abstract

Model-free deep reinforcement learning is sample inefficient. One hypothesis -- speculated, but not confirmed -- is that catastrophic interference within an environment inhibits learning. We test this hypothesis through a large-scale empirical study in the Arcade Learning Environment (ALE) and, indeed, find supporting evidence. We show that interference causes performance to plateau; the network cannot train on segments beyond the plateau without degrading the policy used to reach there. By synthetically controlling for interference, we demonstrate performance boosts across architectures, learning algorithms and environments. A more refined analysis shows that learning one segment of a game often increases prediction errors elsewhere. Our study provides a clear empirical link between catastrophic interference and sample efficiency in reinforcement learning.
