Continuous Transition: Improving Sample Efficiency for Continuous Control Problems via MixUp

Paper Title

Continuous Transition: Improving Sample Efficiency for Continuous Control Problems via MixUp

Authors

Lin, Junfan; Huang, Zhongzhan; Wang, Keze; Liang, Xiaodan; Chen, Weiwei; Lin, Liang

Abstract

Although deep reinforcement learning (RL) has been successfully applied to a variety of robotic control tasks, applying it to real-world tasks remains challenging because of its poor sample efficiency. To overcome this shortcoming, several works focus on reusing the collected trajectory data during training by decomposing them into a set of policy-irrelevant discrete transitions. However, their improvements are somewhat marginal since i) the number of transitions is usually small, and ii) the value assignment only happens in the joint states. To address these issues, this paper introduces a concise yet powerful method to construct Continuous Transition, which exploits the trajectory information by using the potential transitions along the trajectory. Specifically, we propose to synthesize new transitions for training by linearly interpolating consecutive transitions. To keep the constructed transitions authentic, we also develop a discriminator to guide the construction process automatically. Extensive experiments demonstrate that our proposed method achieves a significant improvement in sample efficiency on various complex continuous robotic control problems in MuJoCo and outperforms advanced model-based and model-free RL methods. The source code is available.
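
The interpolation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Transition` container, the helper names, and the choice to draw the mixing coefficient from a Beta(alpha, alpha) distribution (a common MixUp convention) are all assumptions made for illustration, and the discriminator that guides the construction process is omitted.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Transition:
    """Hypothetical container for one (s, a, r, s') tuple."""
    state: List[float]
    action: List[float]
    reward: float
    next_state: List[float]


def lerp(a: List[float], b: List[float], lam: float) -> List[float]:
    """Element-wise lam * a + (1 - lam) * b for two equal-length vectors."""
    return [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]


def mix_consecutive(t0: Transition, t1: Transition, lam: float) -> Transition:
    """Linearly interpolate two consecutive transitions with weight lam on t0."""
    return Transition(
        state=lerp(t0.state, t1.state, lam),
        action=lerp(t0.action, t1.action, lam),
        reward=lam * t0.reward + (1.0 - lam) * t1.reward,
        next_state=lerp(t0.next_state, t1.next_state, lam),
    )


def sample_continuous_transition(trajectory, alpha=1.0, rng=random):
    """Pick a random consecutive pair and mix it.

    Assumption: lam ~ Beta(alpha, alpha); the abstract does not state
    how the coefficient is chosen.
    """
    i = rng.randrange(len(trajectory) - 1)
    lam = rng.betavariate(alpha, alpha)
    return mix_consecutive(trajectory[i], trajectory[i + 1], lam)
```

With `lam = 1.0` the sketch returns the first transition unchanged and with `lam = 0.0` the second, so the synthesized samples lie on the line segment between two neighbouring transitions of the same trajectory.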
