Paper Title

Bingham Policy Parameterization for 3D Rotations in Reinforcement Learning

Paper Authors

Stephen James, Pieter Abbeel

Paper Abstract

We propose a new policy parameterization for representing 3D rotations during reinforcement learning. Today in the continuous control reinforcement learning literature, many stochastic policy parameterizations are Gaussian. We argue that universally applying a Gaussian policy parameterization is not always desirable for all environments. One such case in particular where this is true is tasks that involve predicting a 3D rotation output, either in isolation, or coupled with translation as part of a full 6D pose output. Our proposed Bingham Policy Parameterization (BPP) models the Bingham distribution and allows for better rotation (quaternion) prediction over a Gaussian policy parameterization in a range of reinforcement learning tasks. We evaluate BPP on the rotation Wahba problem task, as well as a set of vision-based next-best pose robot manipulation tasks from RLBench. We hope that this paper encourages more research into developing other policy parameterizations that are more suited for particular environments, rather than always assuming Gaussian.
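
To make the abstract's central object concrete, the sketch below illustrates the Bingham distribution over unit quaternions that the BPP models, with density p(q) ∝ exp(qᵀ M Z Mᵀ q) on the unit 3-sphere, where M is a 4×4 orthogonal matrix and Z = diag(z1, z2, z3, 0) with zi ≤ 0. This is a minimal illustration only, not the authors' implementation or the BPP training code; the function names, parameter choices, and the naive rejection sampler are assumptions made for exposition.

```python
# Minimal sketch (NOT the paper's implementation) of evaluating and sampling a
# Bingham distribution over unit quaternions: p(q) ∝ exp(q^T M Z M^T q).
import numpy as np

def bingham_log_prob_unnormalized(q, M, Z):
    """Unnormalized log-density of a unit quaternion q under Bingham(M, Z)."""
    q = q / np.linalg.norm(q)          # ensure q lies on the unit 3-sphere
    A = M @ np.diag(Z) @ M.T           # 4x4 symmetric parameter matrix
    return float(q @ A @ q)

def sample_bingham_rejection(M, Z, rng, max_tries=10000):
    """Naive rejection sampler: propose uniform quaternions and accept with
    probability exp(q^T A q - max_q q^T A q). Adequate for illustration;
    practical implementations use sharper proposal distributions."""
    A = M @ np.diag(Z) @ M.T
    log_bound = np.max(np.linalg.eigvalsh(A))   # max of q^T A q over the sphere
    for _ in range(max_tries):
        q = rng.normal(size=4)
        q /= np.linalg.norm(q)                  # uniform sample on S^3
        if np.log(rng.uniform()) < q @ A @ q - log_bound:
            return q
    raise RuntimeError("rejection sampling failed to accept a sample")

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    M = np.linalg.qr(rng.normal(size=(4, 4)))[0]   # random orthogonal matrix
    Z = np.array([-10.0, -10.0, -10.0, 0.0])       # concentrations (last entry 0)
    q = sample_bingham_rejection(M, Z, rng)
    print(q, bingham_log_prob_unnormalized(q, M, Z))
```

In a policy-parameterization setting, a network would output the parameters (M, Z) conditioned on the observation, and the antipodal symmetry of the Bingham density (p(q) = p(-q)) matches the double-cover property of quaternions representing 3D rotations.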
