Delay-Aware Model-Based Reinforcement Learning for Continuous Control

Paper Title

Delay-Aware Model-Based Reinforcement Learning for Continuous Control

Authors

Baiming Chen, Mengdi Xu, Liang Li, Ding Zhao

Abstract

Action delays degrade the performance of reinforcement learning in many real-world systems. This paper proposes a formal definition of the delay-aware Markov decision process (MDP) and proves that it can be transformed into a standard MDP with augmented states using the Markov reward process. We develop a delay-aware model-based reinforcement learning framework that incorporates the multi-step delay into the learned system models without additional learning effort. Experiments on the Gym and MuJoCo platforms show that, compared with off-policy model-free reinforcement learning methods, the proposed delay-aware model-based algorithm is more efficient in training and transferable between systems with different delay durations. Code is available at: https://github.com/baimingc/dambrl.
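
The state augmentation mentioned in the abstract can be illustrated with a minimal Python sketch. It is not taken from the authors' repository: the class name `DelayedActionEnv`, the `step_fn` callback, and the zero-filled initial action buffer are all assumptions made for illustration. The idea shown is that, when every action takes effect a fixed number of steps after it is issued, appending the queue of pending actions to the observation gives a state that is Markov again.

```python
from collections import deque

import numpy as np


class DelayedActionEnv:
    """Hypothetical wrapper: each action takes effect `delay` steps after it is issued."""

    def __init__(self, step_fn, init_state, action_dim, delay):
        # step_fn(state, action) -> next_state is an assumed, user-supplied dynamics function.
        self.step_fn = step_fn
        self.state = np.asarray(init_state, dtype=float)
        # Pending actions, oldest first. Pre-filling with zeros is an assumption.
        self.buffer = deque(np.zeros(action_dim) for _ in range(delay))

    def augmented_state(self):
        # Observation followed by all pending actions, oldest first.
        return np.concatenate([self.state, *self.buffer])

    def step(self, action):
        # Queue the new action, then execute the oldest pending one.
        self.buffer.append(np.asarray(action, dtype=float))
        executed = self.buffer.popleft()
        self.state = self.step_fn(self.state, executed)
        return self.augmented_state()
```

With a delay of two steps and simple additive dynamics, for example `DelayedActionEnv(lambda s, a: s + a, [0.0], 1, 2)`, the first two calls to `step` leave the underlying state unchanged because only the zero-filled placeholder actions are executed; the first real action reaches the system on the third call.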
