Transferring Hierarchical Structure with Dual Meta Imitation Learning

Paper Title

Transferring Hierarchical Structure with Dual Meta Imitation Learning

Authors

Chongkai Gao, Yizhou Jiang, Feng Chen

Abstract

Hierarchical Imitation Learning (HIL) is an effective way for robots to learn sub-skills from long-horizon unsegmented demonstrations. However, the learned hierarchical structure lacks a mechanism for transferring across multiple tasks or to new tasks, so it must be learned from scratch when facing a new situation. Transferring and reorganizing modular sub-skills requires the whole hierarchical structure to adapt quickly. In this work, we propose Dual Meta Imitation Learning (DMIL), a hierarchical meta imitation learning method in which the high-level network and the sub-skills are iteratively meta-learned with model-agnostic meta-learning. DMIL uses the likelihood of state-action pairs under each sub-skill as the supervision for high-level network adaptation, and uses the adapted high-level network to determine a different data set for each sub-skill's adaptation. We theoretically prove the convergence of the iterative training process of DMIL and establish the connection between DMIL and the Expectation-Maximization algorithm. Empirically, we achieve state-of-the-art few-shot imitation learning performance on the Meta-World \cite{metaworld} benchmark and competitive results on long-horizon tasks in the Kitchen environments.
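The alternation described in the abstract (score each state-action pair under every sub-skill, then refit each sub-skill on the data assigned to it) resembles a hard Expectation-Maximization loop. The sketch below is only a toy illustration of that alternation, not the DMIL algorithm: it assumes linear sub-skill policies with Gaussian noise, replaces the high-level network with a plain argmax over likelihoods, and omits meta-learning entirely. The names `log_likelihood`, `fit_linear`, and `hard_em` are hypothetical.

```python
import numpy as np


def log_likelihood(states, actions, W, sigma=0.1):
    """Per-pair Gaussian log-likelihood (up to a constant) under a ~ s @ W."""
    err = actions - states @ W
    return -0.5 * np.sum(err ** 2, axis=1) / sigma ** 2


def fit_linear(states, actions):
    """Least-squares fit of a linear sub-skill policy (assumed model)."""
    W, *_ = np.linalg.lstsq(states, actions, rcond=None)
    return W


def hard_em(states, actions, n_skills, n_iters=20, seed=0):
    """Alternate between refitting sub-skills and reassigning pairs to them."""
    rng = np.random.default_rng(seed)
    assign = rng.integers(n_skills, size=len(states))  # random initial split
    Ws = [np.zeros((states.shape[1], actions.shape[1])) for _ in range(n_skills)]
    nll_history = []
    for _ in range(n_iters):
        # M-like step: refit each sub-skill on the pairs currently assigned to it.
        for k in range(n_skills):
            mask = assign == k
            if mask.any():  # keep the old weights if a sub-skill got no data
                Ws[k] = fit_linear(states[mask], actions[mask])
        # E-like step: assign each pair to the sub-skill that explains it best.
        ll = np.stack([log_likelihood(states, actions, W) for W in Ws], axis=1)
        assign = ll.argmax(axis=1)
        nll_history.append(float(-ll.max(axis=1).sum()))
    return Ws, assign, nll_history


# Synthetic demonstration data generated by two hidden linear sub-skills.
rng = np.random.default_rng(1)
S = rng.normal(size=(200, 3))
true_Ws = [rng.normal(size=(3, 2)), rng.normal(size=(3, 2))]
z = rng.integers(2, size=200)
A = np.stack([S[i] @ true_Ws[z[i]] for i in range(200)])
A = A + 0.01 * rng.normal(size=A.shape)

Ws, assign, nll_history = hard_em(S, A, n_skills=2)
```

In this toy setting each step can only lower or preserve the negative log-likelihood recorded in `nll_history`, which is the kind of monotone behaviour that convergence arguments for EM-style procedures rely on. Like any hard-EM scheme, it may still settle in a local optimum.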
