Paper Title
Balance Between Efficient and Effective Learning: Dense2Sparse Reward Shaping for Robot Manipulation with Environment Uncertainty
Paper Authors
Paper Abstract
Efficient and effective learning is one of the ultimate goals of deep reinforcement learning (DRL), although a compromise between the two is usually made in practice, especially in robot manipulation applications. Learning is always expensive for robot manipulation tasks, and learning effectiveness can be affected by system uncertainty. To address these challenges, in this study we propose a simple but powerful reward shaping method, namely Dense2Sparse. It combines the fast convergence of a dense reward with the noise isolation of a sparse reward to achieve a balance between learning efficiency and effectiveness, which makes it suitable for robot manipulation tasks. We evaluated our Dense2Sparse method with a series of ablation experiments using a state representation model with system uncertainty. The experimental results show that the Dense2Sparse method obtains a higher expected reward than standalone dense or sparse rewards, and it also has superior tolerance to system uncertainty.
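The core idea described in the abstract can be illustrated with a minimal sketch: use a dense, distance-based reward early for fast convergence, then switch to a sparse success-only reward for noise isolation. The switch point, reward forms, and the `Dense2SparseReward` class name here are illustrative assumptions, not the paper's exact formulation.

```python
class Dense2SparseReward:
    """Sketch of a Dense2Sparse reward schedule (assumed form, for illustration)."""

    def __init__(self, switch_episode=100):
        self.switch_episode = switch_episode  # assumed episode at which to switch
        self.episode = 0

    def on_episode_end(self):
        self.episode += 1

    def __call__(self, distance_to_goal, success):
        if self.episode < self.switch_episode:
            # Dense phase: negative distance provides a smooth learning
            # signal, which speeds up early convergence.
            return -distance_to_goal
        # Sparse phase: a binary success signal is unaffected by noisy
        # state estimates, isolating learning from system uncertainty.
        return 1.0 if success else 0.0
```

A training loop would call the object once per step to compute the reward and `on_episode_end()` at each episode boundary to advance the schedule.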