Learning to Brachiate via Simplified Model Imitation

Paper Title

Learning to Brachiate via Simplified Model Imitation

Authors

Daniele Reda, Hung Yu Ling, Michiel van de Panne

Abstract

Brachiation is the primary form of locomotion for gibbons and siamangs, in which these primates swing from tree limb to tree limb using only their arms. It is challenging to control because of the limited control authority, the required advance planning, and the precision of the required grasps. We present a novel approach to this problem using reinforcement learning, demonstrated on a fingerless 14-link planar model that learns to brachiate across challenging handhold sequences. Key to our method is the use of a simplified model, a point mass with a virtual arm, for which we first learn a policy that can brachiate across handhold sequences with a prescribed order. This facilitates learning the policy for the full model by providing an overall center-of-mass trajectory to imitate, as well as guidance on the timing of the holds. Lastly, the simplified model can also readily be used for planning suitable sequences of handholds in a given environment. Our results demonstrate brachiation motions with a variety of durations for the flight and hold phases, as well as emergent extra back-and-forth swings when this proves useful. The system is evaluated with a variety of ablations. The method enables future work towards more general 3D brachiation, as well as the use of simplified model imitation in other settings.
