Paper Title
Smooth Exploration for Robotic Reinforcement Learning
Paper Authors
Paper Abstract
Reinforcement learning (RL) enables robots to learn skills from interactions with the real world. In practice, the unstructured step-based exploration used in Deep RL -- often very successful in simulation -- leads to jerky motion patterns on real robots. The resulting shaky behavior causes poor exploration and can even damage the robot. We address these issues by adapting state-dependent exploration (SDE) to current Deep RL algorithms. To enable this adaptation, we propose two extensions to the original SDE, using more general features and re-sampling the noise periodically, which leads to a new exploration method: generalized state-dependent exploration (gSDE). We evaluate gSDE both in simulation, on PyBullet continuous control tasks, and directly on three different real robots: a tendon-driven elastic robot, a quadruped, and an RC car. The noise sampling interval of gSDE allows a compromise between performance and smoothness, which makes it possible to train directly on the real robots without loss of performance. The code is available at https://github.com/DLR-RM/stable-baselines3.
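As a minimal usage sketch (not the paper's experimental setup), gSDE can be enabled in Stable-Baselines3 through the `use_sde` flag, with `sde_sample_freq` controlling the noise re-sampling interval mentioned in the abstract; the environment and hyperparameter values below are placeholders chosen for illustration.

```python
# Sketch: training SAC with generalized state-dependent exploration (gSDE)
# using the stable-baselines3 library referenced in the abstract.
from stable_baselines3 import SAC

model = SAC(
    "MlpPolicy",
    "Pendulum-v1",        # placeholder continuous-control environment
    use_sde=True,         # enable gSDE instead of unstructured step-based noise
    sde_sample_freq=8,    # re-sample the exploration noise every 8 steps
    verbose=1,
)
model.learn(total_timesteps=20_000)
```

A larger `sde_sample_freq` yields smoother motions (noise is held fixed for longer), while a smaller value explores more aggressively; this is the performance-versus-smoothness trade-off the abstract describes.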