Paper Title
Entropy Regularized Reinforcement Learning with Cascading Networks
Paper Authors
Paper Abstract
Deep Reinforcement Learning (Deep RL) has achieved incredible results on high-dimensional problems, yet its learning process remains unstable even on the simplest tasks. Deep RL uses neural networks as function approximators. These neural models are largely inspired by developments in the (un)supervised machine learning community. Compared to these learning frameworks, one of the major difficulties of RL is the absence of i.i.d. data. One way to cope with this difficulty is to control the rate of change of the policy at every iteration. In this work, we challenge the common practice of the (un)supervised learning community of using a fixed neural architecture, by having a neural model that grows in size at each policy update. This allows a closed-form entropy-regularized policy update, which leads to better control of the rate of change of the policy at each iteration and helps cope with the non-i.i.d. nature of RL. Initial experiments on classical RL benchmarks show promising results, with remarkable convergence on some RL tasks when compared to other deep RL baselines, while exhibiting limitations on others.
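To make the growing-model idea concrete, here is a minimal sketch of how such a cascading policy could look, assuming a discrete action space and a mirror-descent-style entropy/KL-regularized update of the form pi_{k+1}(a|s) proportional to pi_k(a|s) * exp(Q_k(s,a)/eta), so that the log-policy is a running sum of per-iteration critics. The names (CascadingPolicy, add_critic, eta, hidden) are illustrative, not the paper's actual API.

```python
# Minimal sketch (PyTorch), not the paper's implementation.
# Assumes pi_{k+1}(a|s) ∝ pi_k(a|s) * exp(Q_k(s,a) / eta), i.e. the
# log-policy is the sum of all critics trained so far; one small
# network is appended ("cascaded") per policy update instead of
# retraining a fixed architecture.
import torch
import torch.nn as nn


class CascadingPolicy(nn.Module):
    def __init__(self, obs_dim: int, n_actions: int, eta: float = 1.0, hidden: int = 64):
        super().__init__()
        self.obs_dim, self.n_actions = obs_dim, n_actions
        self.eta, self.hidden = eta, hidden
        self.critics = nn.ModuleList()  # one critic per completed policy update

    def add_critic(self) -> nn.Module:
        """Grow the model: append a fresh critic for the next iteration."""
        critic = nn.Sequential(
            nn.Linear(self.obs_dim, self.hidden),
            nn.Tanh(),
            nn.Linear(self.hidden, self.n_actions),
        )
        self.critics.append(critic)
        return critic

    def logits(self, obs: torch.Tensor) -> torch.Tensor:
        """log pi_k(a|s) up to a constant: sum of stored critics, scaled by 1/eta."""
        out = torch.zeros(obs.shape[0], self.n_actions)
        for critic in self.critics:
            out = out + critic(obs) / self.eta
        return out

    def action_dist(self, obs: torch.Tensor) -> torch.distributions.Categorical:
        return torch.distributions.Categorical(logits=self.logits(obs))
```

In this sketch, each policy update would train only the newly added critic (earlier critics stay frozen), which is one way the rate of change of the policy can be kept small and controlled by the regularization weight eta; the initial policy (an empty critic list) is uniform.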