Paper Title
Sim-to-Real Transfer for Quadrupedal Locomotion via Terrain Transformer
Authors
Abstract
Deep reinforcement learning has recently emerged as an appealing alternative for legged locomotion over multiple terrains: a policy is trained in physical simulation and then transferred to the real world (i.e., sim-to-real transfer). Despite considerable progress, the capacity and scalability of traditional neural networks remain limited, which may hinder their application in more complex environments. In contrast, the Transformer architecture has shown its superiority in a wide range of large-scale sequence modeling tasks, including natural language processing and decision-making problems. In this paper, we propose Terrain Transformer (TERT), a high-capacity Transformer model for quadrupedal locomotion control on various terrains. Furthermore, to better leverage the Transformer in sim-to-real scenarios, we present a novel two-stage training framework consisting of an offline pretraining stage and an online correction stage, which naturally integrates the Transformer with privileged training. Extensive experiments in simulation demonstrate that TERT outperforms state-of-the-art baselines on different terrains in terms of return, energy consumption, and control smoothness. In further real-world validation, TERT successfully traverses nine challenging terrains, including a sand pit and descending stairs, which strong baselines fail to traverse.