Paper title
Transferring Hierarchical Structure with Dual Meta Imitation Learning
Paper authors
Paper abstract
Hierarchical Imitation Learning (HIL) is an effective way for robots to learn sub-skills from long-horizon unsegmented demonstrations. However, the learned hierarchical structure lacks a mechanism for transferring across multiple tasks or to new tasks, so robots must learn from scratch when facing a new situation. Transferring and reorganizing modular sub-skills requires the whole hierarchical structure to adapt quickly. In this work, we propose Dual Meta Imitation Learning (DMIL), a hierarchical meta imitation learning method in which the high-level network and the sub-skills are iteratively meta-learned with model-agnostic meta-learning. DMIL uses the likelihood of state-action pairs under each sub-skill as the supervision for adapting the high-level network, and uses the adapted high-level network to determine a different dataset for adapting each sub-skill. We theoretically prove the convergence of DMIL's iterative training process and establish the connection between DMIL and the Expectation-Maximization algorithm. Empirically, we achieve state-of-the-art few-shot imitation learning performance on the Meta-World \cite{metaworld} benchmark and competitive results on long-horizon tasks in the Kitchen environments.
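The alternating adaptation described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. It assumes two scalar linear sub-skills with Gaussian action likelihoods, a logistic high-level network, hard argmax pseudo-labels, and plain gradient steps. It shows only one round of inner-loop adaptation and omits the outer meta-update of model-agnostic meta-learning. All function names and hyperparameters are hypothetical.

```python
import math

def log_likelihood(w, s, a, sigma=1.0):
    """Gaussian log-likelihood of action a under the toy sub-skill a = w * s."""
    return -0.5 * ((a - w * s) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def adapt_high_level(theta, demos, skills, lr=0.5, steps=20):
    """Adapt the high-level network p(skill 1 | s) = sigmoid(t0 * s + t1).

    Supervision comes from the sub-skills: each state-action pair is
    labelled with the sub-skill that assigns it the highest likelihood.
    """
    labels = [
        max(range(len(skills)), key=lambda k: log_likelihood(skills[k], s, a))
        for s, a in demos
    ]
    t0, t1 = theta
    for _ in range(steps):
        g0 = g1 = 0.0
        for (s, _), y in zip(demos, labels):
            err = sigmoid(t0 * s + t1) - y  # gradient of the logistic loss
            g0 += err * s
            g1 += err
        t0 -= lr * g0 / len(demos)
        t1 -= lr * g1 / len(demos)
    return (t0, t1), labels

def adapt_skills(theta, demos, skills, lr=0.1, steps=20):
    """Adapt each sub-skill on the subset the adapted high-level network routes to it."""
    t0, t1 = theta
    assign = [1 if sigmoid(t0 * s + t1) > 0.5 else 0 for s, _ in demos]
    new_skills = list(skills)
    for k in range(len(skills)):
        subset = [(s, a) for (s, a), z in zip(demos, assign) if z == k]
        if not subset:
            continue  # no data routed to this sub-skill; leave it unchanged
        w = new_skills[k]
        for _ in range(steps):
            # gradient of the mean squared error, 0.5 * (w * s - a) ** 2
            grad = sum((w * s - a) * s for s, a in subset) / len(subset)
            w -= lr * grad
        new_skills[k] = w
    return new_skills, assign

# Toy unsegmented demonstration: negative states follow a = s,
# positive states follow a = -s (two underlying sub-skills).
demos = [(s, s) for s in (-2.0, -1.5, -1.0, -0.5)] + \
        [(s, -s) for s in (0.5, 1.0, 1.5, 2.0)]
init_skills = [0.5, -0.5]   # assumed (meta-learned) sub-skill initialisations
init_theta = (0.0, 0.0)     # assumed (meta-learned) high-level initialisation

theta, labels = adapt_high_level(init_theta, demos, init_skills)
adapted_skills, assign = adapt_skills(theta, demos, init_skills)
print(labels, assign, adapted_skills)
```

In this sketch the first function plays a role loosely analogous to an E-step (sub-skill likelihoods decide which sub-skill explains each pair) and the second to an M-step (each sub-skill is refit on its assigned data), which is the kind of correspondence the abstract refers to when it connects DMIL to Expectation-Maximization.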