Paper Title

Backward Curriculum Reinforcement Learning

Paper Authors

Ko, KyungMin

Paper Abstract

Current reinforcement learning algorithms train an agent on forward-generated trajectories that provide little guidance, so that the agent can explore as freely as possible. While the value of reinforcement learning comes from sufficient exploration, this approach trades away sample efficiency, an essential factor in algorithm performance. Previous works have used reward-shaping techniques and network-structure modifications to improve sample efficiency, but these methods require many steps to implement. In this work, we propose novel backward curriculum reinforcement learning, which begins training the agent on the backward trajectory of an episode instead of the original forward trajectory. This approach provides the agent with a strong reward signal, enabling more sample-efficient learning. Moreover, our method requires only a minor algorithmic change, reversing the order of the trajectory before agent training, so it can be applied directly to any state-of-the-art algorithm.
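The core mechanism described in the abstract lends itself to a short illustration. Below is a minimal sketch, assuming a tabular Q-learning agent on a toy chain environment; the environment, hyperparameters, and update rule are illustrative stand-ins, not the authors' implementation. The only departure from standard training is step 2, which replays the collected transitions in reverse order.

```python
# Minimal sketch of the backward-curriculum idea: collect an episode
# forward, then train on its transitions in reverse. The chain
# environment and Q-learning update below are assumptions for
# illustration only, not the paper's implementation.
import random

N_STATES = 10                   # chain of states 0..9; reward only at state 9
GAMMA, ALPHA, EPS = 0.99, 0.5, 0.1
Q = [[0.0, 0.0] for _ in range(N_STATES)]   # actions: 0 = left, 1 = right

def step(state, action):
    """Toy chain dynamics: reaching state N_STATES - 1 yields reward 1."""
    nxt = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def act(state):
    """Epsilon-greedy action selection with random tie-breaking."""
    if random.random() < EPS or Q[state][0] == Q[state][1]:
        return random.randrange(2)
    return 0 if Q[state][0] > Q[state][1] else 1

for episode in range(200):
    # 1) Generate the trajectory forward, exactly as a standard agent would.
    state, trajectory, done = 0, [], False
    while not done:
        action = act(state)
        nxt, reward, done = step(state, action)
        trajectory.append((state, action, reward, nxt, done))
        state = nxt

    # 2) Train on the transitions in reverse: the update nearest the
    #    terminal reward runs first, so the reward signal propagates
    #    back through the whole episode in a single pass.
    for s, a, r, s2, d in reversed(trajectory):
        target = r + (0.0 if d else GAMMA * max(Q[s2]))
        Q[s][a] += ALPHA * (target - Q[s][a])
```

Reversing the replay order is what lets a single terminal reward inform every earlier state-action pair after one episode, which is the strong reward signal and sample-efficiency gain the abstract describes.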
