Paper Title

Automating Reinforcement Learning with Example-based Resets

Paper Authors

Jigang Kim, J. hyeon Park, Daesol Cho, H. Jin Kim

Abstract

Deep reinforcement learning has enabled robots to learn motor skills from environmental interactions with minimal to no prior knowledge. However, existing reinforcement learning algorithms assume an episodic setting, in which the agent resets to a fixed initial state distribution at the end of each episode, in order to train successfully from repeated trials. Such a reset mechanism, while trivial for simulated tasks, can be challenging to provide for real-world robotics tasks. Resets in robotic systems often require extensive human supervision and task-specific workarounds, which contradicts the goal of autonomous robot learning. In this paper, we extend conventional reinforcement learning towards greater autonomy by introducing an additional agent that learns to reset in a self-supervised manner. The reset agent preemptively triggers a reset to prevent manual resets and implicitly imposes a curriculum for the forward agent. We apply our method to learn from scratch on a suite of simulated and real-world continuous control tasks and demonstrate that the reset agent successfully learns to reduce manual resets while also allowing the forward policy to improve gradually over time.
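To make the forward/reset interplay concrete, the loop described in the abstract can be sketched as follows. This is a minimal toy illustration, not the paper's implementation: the environment (`LineEnv`), the hand-coded policies, and the threshold-based trigger are all assumptions standing in for the learned forward agent, the learned reset agent, and its learned reset-value estimate.

```python
class LineEnv:
    """Toy 1-D environment: the agent moves along a line.
    States with |x| > 3 are treated as irreversible, i.e. they
    would require a manual (human-supervised) reset."""
    def __init__(self):
        self.x = 0.0

    def step(self, action):
        self.x += action
        return self.x


def forward_policy(x):
    # Stand-in for the forward agent: naively keeps pushing forward,
    # so without intervention it eventually enters irreversible states.
    return 0.5


def reset_policy(x):
    # Stand-in for the learned reset agent's policy:
    # drives the state back toward the initial distribution (x = 0).
    return -0.5 if x > 0 else 0.5


def run_trial(env, reset_threshold=2.0, max_steps=50):
    """Forward rollout with preemptive resets. When the state drifts
    beyond `reset_threshold` (a stand-in for the reset agent judging
    the state hard to recover from), the forward episode is aborted
    and the reset policy returns the state near the initial
    distribution, avoiding a manual reset. Returns the number of
    manual resets incurred."""
    manual_resets = 0
    for _ in range(max_steps):
        env.step(forward_policy(env.x))
        if abs(env.x) > 3.0:
            # Irreversible state reached: a human must intervene.
            manual_resets += 1
            env.x = 0.0
        elif abs(env.x) > reset_threshold:
            # Preemptive trigger: learned reset instead of manual reset.
            while abs(env.x) > 0.1:
                env.step(reset_policy(env.x))
    return manual_resets
```

With a tight trigger the reset agent aborts episodes before they become irreversible and no manual resets occur, while disabling the trigger (a very large threshold) lets the naive forward policy repeatedly drift into irreversible states. This mirrors the abstract's claim that preemptive resets reduce manual resets; the curriculum effect (episodes gradually reaching further as the reset agent improves) is not modeled in this sketch.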
