Active Hierarchical Imitation and Reinforcement Learning

Paper Title

Active Hierarchical Imitation and Reinforcement Learning

Authors

Niu, Yaru; Gu, Yijun

Abstract

Humans can leverage hierarchical structures to split a task into sub-tasks and solve problems efficiently. Imitation learning, reinforcement learning, and combinations of the two with hierarchical structures have all been shown to be efficient ways for robots to learn complex tasks with sparse rewards. However, previous work on hierarchical imitation and reinforcement learning was tested in relatively simple 2D games with discrete action spaces. Furthermore, many imitation learning works focus on improving policies learned from expert policies that are hard-coded or trained by reinforcement learning algorithms, rather than from human experts. In human-robot interaction scenarios, humans may be required to provide demonstrations to teach the robot, so it is crucial both to improve learning efficiency in order to reduce expert effort and to understand how humans perceive the learning/training process. In this project, we explored different imitation learning algorithms and designed active learning algorithms on top of the hierarchical imitation and reinforcement learning framework we developed. We performed an experiment in which five participants were asked to guide a randomly initialized agent to a random goal in a maze. Our experimental results showed that using DAgger together with a reward-based active learning method can achieve better performance while reducing the physical and mental effort required of humans during the training process.
