Meta-Reinforcement Learning via Language Instructions

Paper Title

Meta-Reinforcement Learning via Language Instructions

Authors

Zhenshan Bing, Alexander Koch, Xiangtong Yao, Kai Huang, Alois Knoll

Abstract

Although deep reinforcement learning has recently been very successful at learning complex behaviors, it requires a tremendous amount of data to learn a task. One fundamental reason for this limitation lies in the trial-and-error learning paradigm of reinforcement learning, in which the agent interacts with the environment and makes progress relying only on the reward signal. This signal is implicit and insufficient for learning a task well. Humans, by contrast, are usually taught new skills via natural language instructions. Utilizing language instructions in robotic motion control to improve adaptability is a recently emerged and challenging topic. In this paper, we present a meta-RL algorithm that addresses the challenge of learning skills with language instructions across multiple manipulation tasks. On the one hand, our algorithm uses the language instructions to shape its interpretation of the task; on the other hand, it still learns to solve the task through a trial-and-error process. We evaluate our algorithm on the robotic manipulation benchmark Meta-World, where it significantly outperforms state-of-the-art methods in terms of training and testing task success rates. Code is available at https://tumi6robot.wixsite.com/million.
