ELSIM: End-to-end learning of reusable skills through intrinsic motivation

Paper title

ELSIM: End-to-end learning of reusable skills through intrinsic motivation

Authors

Arthur Aubret, Laetitia Matignon, Salima Hassas

Abstract

Taking inspiration from developmental learning, we present a novel reinforcement learning architecture that hierarchically learns and represents self-generated skills in an end-to-end way. With this architecture, an agent focuses only on task-rewarded skills while keeping the skill-learning process bottom-up. This bottom-up approach makes it possible to learn skills that (1) are transferable across tasks and (2) improve exploration when rewards are sparse. To do so, we combine a previously defined mutual information objective with a novel curriculum learning algorithm, creating an unlimited and explorable tree of skills. We test our agent on simple gridworld environments to understand and visualize how the agent distinguishes between its skills. We then show that our approach scales to more difficult MuJoCo environments, in which our agent is able to build a representation of skills that improves both transfer learning and exploration over a baseline when rewards are sparse.
