Paper Title

Delay-Aware Model-Based Reinforcement Learning for Continuous Control

Authors

Baiming Chen, Mengdi Xu, Liang Li, Ding Zhao

Abstract

Action delays degrade the performance of reinforcement learning in many real-world systems. This paper proposes a formal definition of the delay-aware Markov Decision Process and proves that it can be transformed into a standard MDP with augmented states using the Markov reward process. We develop a delay-aware model-based reinforcement learning framework that incorporates the multi-step delay into the learned system models without additional learning effort. Experiments on the Gym and MuJoCo platforms show that, compared with off-policy model-free reinforcement learning methods, the proposed delay-aware model-based algorithm is more efficient in training and transfers between systems with different delay durations. Code is available at: https://github.com/baimingc/dambrl.
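
The augmented-state idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class names PointMass1D and DelayedAugmentedEnv, the toy dynamics, and the reward are all hypothetical, and the sketch assumes one common construction in which the augmented state is the current observation together with the queue of actions that have been chosen but not yet executed. Under that assumption the delayed system becomes Markov again, because the next augmented state depends only on the current augmented state and the newly chosen action.

```python
from collections import deque


class PointMass1D:
    """Hypothetical toy environment: position x moves by the executed action."""

    def __init__(self):
        self.x = 0.0

    def reset(self):
        self.x = 0.0
        return self.x

    def step(self, action):
        self.x += action
        reward = -abs(self.x - 1.0)  # illustrative reward: stay close to x = 1
        return self.x, reward


class DelayedAugmentedEnv:
    """Wraps an environment so each action takes effect `delay` steps later.

    The returned state is (observation, pending_actions), i.e. the
    observation augmented with the actions still waiting to be executed.
    """

    def __init__(self, env, delay, default_action=0.0):
        self.env = env
        self.delay = delay
        self.default_action = default_action
        self.pending = deque()

    def reset(self):
        obs = self.env.reset()
        # Before any action is chosen, the queue holds default actions.
        self.pending = deque([self.default_action] * self.delay)
        return obs, tuple(self.pending)

    def step(self, action):
        # The new action joins the back of the queue; the oldest one runs now.
        self.pending.append(action)
        executed = self.pending.popleft()
        obs, reward = self.env.step(executed)
        return (obs, tuple(self.pending)), reward


if __name__ == "__main__":
    env = DelayedAugmentedEnv(PointMass1D(), delay=2)
    print(env.reset())
    for a in (1.0, 0.5, 0.0):
        print(env.step(a))
```

With a delay of 2, the action 1.0 chosen at the first step only moves the point mass at the third step; until then it is visible to the agent as part of the pending-action tuple in the augmented state.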
