Paper Title
Predictive Information Accelerates Learning in RL
Paper Authors
Paper Abstract
The Predictive Information is the mutual information between the past and the future, I(X_past; X_future). We hypothesize that capturing the predictive information is useful in RL, since the ability to model what will happen next is necessary for success on many tasks. To test our hypothesis, we train Soft Actor-Critic (SAC) agents from pixels with an auxiliary task that learns a compressed representation of the predictive information of the RL environment dynamics using a contrastive version of the Conditional Entropy Bottleneck (CEB) objective. We refer to these as Predictive Information SAC (PI-SAC) agents. We show that PI-SAC agents can substantially improve sample efficiency over challenging baselines on tasks from the DM Control suite of continuous control environments. We evaluate PI-SAC agents by comparing against uncompressed PI-SAC agents, other compressed and uncompressed agents, and SAC agents trained directly from pixels. Our implementation is available on GitHub.
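To make the auxiliary task concrete: a contrastive CEB objective of this kind can be built from a forward encoder over the past and a backward encoder over the future, with in-batch negatives providing the contrastive term. Below is a minimal PyTorch sketch under those assumptions; the function name `contrastive_ceb_loss` and the unit-variance Gaussian encoders are illustrative choices, not the paper's actual implementation.

```python
import math

import torch


def contrastive_ceb_loss(mu_fwd, mu_bwd, beta=0.1):
    """Hypothetical contrastive CEB loss over matched (past, future) pairs.

    mu_fwd: (B, D) means of the forward encoder e(z | x_past).
    mu_bwd: (B, D) means of the backward encoder b(z | x_future).
    Both encoders are assumed unit-variance Gaussians, so the
    log-density normalization constants cancel inside the loss.
    """
    batch_size = mu_fwd.size(0)
    # Reparameterized sample z ~ e(z | x_past).
    z = mu_fwd + torch.randn_like(mu_fwd)
    # log e(z | x_past), up to an additive constant.
    log_e = -0.5 * (z - mu_fwd).pow(2).sum(dim=-1)                            # (B,)
    # Pairwise log b(z_i | y_j) over the batch, same constant dropped.
    log_b = -0.5 * (z.unsqueeze(1) - mu_bwd.unsqueeze(0)).pow(2).sum(dim=-1)  # (B, B)
    log_b_pos = log_b.diagonal()                                              # matched pairs
    # Residual term: variational upper bound related to I(X; Z | Y).
    residual = log_e - log_b_pos
    # InfoNCE-style lower bound on I(Y; Z) using in-batch negatives.
    info_nce = log_b_pos - torch.logsumexp(log_b, dim=1) + math.log(batch_size)
    # beta sets the compression strength; beta -> 0 keeps more past information.
    return (beta * residual - info_nce).mean()


# Illustrative usage with random features standing in for encoder outputs.
mu_fwd = torch.randn(32, 50, requires_grad=True)
mu_bwd = torch.randn(32, 50, requires_grad=True)
loss = contrastive_ceb_loss(mu_fwd, mu_bwd)
loss.backward()  # gradients would train both encoders in a real step
```

In this sketch, `beta` trades off compression against predictive power, which mirrors the comparison the abstract draws between compressed and uncompressed PI-SAC agents.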