Planning Immediate Landmarks of Targets for Model-Free Skill Transfer across Agents

Paper Title

Planning Immediate Landmarks of Targets for Model-Free Skill Transfer across Agents

Authors

Minghuan Liu, Zhengbang Zhu, Menghui Zhu, Yuzheng Zhuang, Weinan Zhang, Jianye Hao

Abstract

In reinforcement learning applications such as robotics, agents often have to handle different input/output features because their developers or physical constraints specify different state/action spaces. This leads to unnecessary re-training from scratch and considerable sample inefficiency, especially when the agents follow similar solution steps to accomplish their tasks. In this paper, we aim to transfer this shared high-level goal-transition knowledge to alleviate the problem. Specifically, we propose PILoT, i.e., Planning Immediate Landmarks of Targets. PILoT uses universal decoupled policy optimization to learn a goal-conditioned state planner, then distills a goal planner that plans immediate landmarks in a model-free style and can be shared among different agents. In our experiments, we show the power of PILoT on various transfer challenges, including few-shot transfer across action spaces and dynamics, from low-dimensional vector states to image inputs, and from a simple robot to a complicated morphology; we also illustrate a zero-shot transfer solution from a simple 2D navigation task to the harder Ant-Maze task.
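The idea of sharing a goal planner across agents can be illustrated with a toy sketch. This is not the PILoT implementation: the planner here is a hand-written straight-line heuristic standing in for the learned, distilled goal planner, and every name (`plan_landmark`, `ContinuousAgent`, `DiscreteAgent`, `rollout`) is hypothetical. It only shows how two agents with different action spaces might follow the same immediate landmarks in a shared 2D goal space.

```python
import math

def plan_landmark(achieved, goal, step=0.5):
    """Hypothetical shared goal planner (a heuristic stand-in, not learned).

    Proposes the next landmark in goal space: a point at most `step`
    away from the currently achieved goal, on the line to the target.
    """
    dx, dy = goal[0] - achieved[0], goal[1] - achieved[1]
    dist = math.hypot(dx, dy)
    if dist <= step:
        return goal
    return (achieved[0] + step * dx / dist, achieved[1] + step * dy / dist)

class ContinuousAgent:
    """Agent whose action is a 2D displacement of bounded length."""
    def __init__(self, max_move=0.5):
        self.max_move = max_move

    def act(self, pos, landmark):
        dx, dy = landmark[0] - pos[0], landmark[1] - pos[1]
        dist = math.hypot(dx, dy)
        scale = min(1.0, self.max_move / dist) if dist > 0 else 0.0
        return (dx * scale, dy * scale)

    def step(self, pos, action):
        return (pos[0] + action[0], pos[1] + action[1])

class DiscreteAgent:
    """Agent with four discrete actions, each moving 0.25 along one axis."""
    MOVES = {0: (0.25, 0.0), 1: (-0.25, 0.0), 2: (0.0, 0.25), 3: (0.0, -0.25)}

    def act(self, pos, landmark):
        # Pick the action whose resulting position is closest to the landmark.
        return min(self.MOVES, key=lambda a: math.dist(self.step(pos, a), landmark))

    def step(self, pos, action):
        mx, my = self.MOVES[action]
        return (pos[0] + mx, pos[1] + my)

def rollout(agent, start, goal, max_steps=200, tol=0.3):
    """Follow landmarks from the shared planner until within `tol` of the goal."""
    pos = start
    for t in range(max_steps):
        if math.dist(pos, goal) <= tol:
            return pos, t
        landmark = plan_landmark(pos, goal)
        pos = agent.step(pos, agent.act(pos, landmark))
    return pos, max_steps

if __name__ == "__main__":
    for agent in (ContinuousAgent(), DiscreteAgent()):
        final, steps = rollout(agent, (0.0, 0.0), (3.0, 2.0))
        print(type(agent).__name__, final, steps)
```

In this sketch only the low-level `act` and `step` methods differ between agents; the landmark planner is reused unchanged, which is the kind of decoupling the abstract describes at a high level.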
