Paper Title
Smooth Exploration for Robotic Reinforcement Learning
Paper Authors
Paper Abstract
Reinforcement learning (RL) enables robots to learn skills from interactions with the real world. In practice, the unstructured step-based exploration used in Deep RL -- often very successful in simulation -- leads to jerky motion patterns on real robots. The resulting shaky behavior causes poor exploration and can even damage the robot. We address these issues by adapting state-dependent exploration (SDE) to current Deep RL algorithms. To enable this adaptation, we propose two extensions to the original SDE, using more general features and re-sampling the noise periodically, which leads to a new exploration method: generalized state-dependent exploration (gSDE). We evaluate gSDE both in simulation, on PyBullet continuous control tasks, and directly on three different real robots: a tendon-driven elastic robot, a quadruped, and an RC car. The noise sampling interval of gSDE allows a compromise between performance and smoothness, which makes it possible to train directly on the real robots without loss of performance. The code is available at https://github.com/DLR-RM/stable-baselines3.
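As a minimal usage sketch (not the paper's experimental setup), gSDE can be enabled in Stable-Baselines3 through the `use_sde` flag, with `sde_sample_freq` controlling the noise re-sampling interval mentioned in the abstract; the environment and hyperparameter values below are placeholders chosen for illustration.

```python
# Sketch: training SAC with generalized state-dependent exploration (gSDE)
# using the stable-baselines3 library referenced in the abstract.
from stable_baselines3 import SAC

model = SAC(
    "MlpPolicy",
    "Pendulum-v1",        # placeholder continuous-control environment
    use_sde=True,         # enable gSDE instead of unstructured step-based noise
    sde_sample_freq=8,    # re-sample the exploration noise every 8 steps
    verbose=1,
)
model.learn(total_timesteps=20_000)
```

A larger `sde_sample_freq` yields smoother motions (noise is held fixed for longer), while a smaller value explores more aggressively; this is the performance-versus-smoothness trade-off the abstract describes.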