Paper Title

On Credit Assignment in Hierarchical Reinforcement Learning

Authors

de Vries, Joery A., Moerland, Thomas M., Plaat, Aske

Abstract

Hierarchical Reinforcement Learning (HRL) has held longstanding promise to advance reinforcement learning. Yet, it has remained a considerable challenge to develop practical algorithms that realize some of this promise. To improve our fundamental understanding of HRL, we investigate hierarchical credit assignment from the perspective of conventional multistep reinforcement learning. We show how, e.g., a 1-step `hierarchical backup' can be seen as a conventional multistep backup with $n$ skip connections over time, connecting each subsequent state to the first state independently of the actions in between. Furthermore, we find that generalizing hierarchy to multistep return estimation methods requires us to consider how to partition the environment trace in order to construct backup paths. We leverage these insights to develop a new hierarchical algorithm, Hier$Q_k(\lambda)$, for which we demonstrate that hierarchical credit assignment alone can already boost agent performance (i.e., when eliminating generalization or exploration). Altogether, our work yields fundamental insight into the nature of hierarchical backups and distinguishes this as an additional basis for reinforcement learning research.
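The skip-connection view described in the abstract can be illustrated with a small tabular sketch. This is not the paper's Hier$Q_k(\lambda)$ algorithm; it is a minimal, hedged interpretation in which every state within a horizon of $k$ steps is backed up directly to an anchor state-action pair, independently of the intermediate primitive actions. All names (`hierarchical_1step_backup`, the trace layout) are assumptions made for illustration.

```python
import numpy as np

def hierarchical_1step_backup(Q, trace, gamma=0.99, alpha=0.1, k=3):
    """Illustrative sketch only (not the paper's exact algorithm).

    A level-k '1-step hierarchical backup' seen as multistep backups with
    skip connections: each state s_{t+i} (i = 1..k) inside the horizon is
    connected back to the anchor pair (s_t, a_t), regardless of the
    primitive actions taken in between.

    Q:     dict mapping state -> np.ndarray of action values
    trace: list of (state, action, reward) tuples, ending with a terminal
           (state, None, None) entry
    """
    for t in range(len(trace) - 1):
        s, a, _ = trace[t]
        G = 0.0  # discounted reward accumulated since the anchor state
        for i in range(1, k + 1):
            if t + i >= len(trace):
                break
            G += gamma ** (i - 1) * trace[t + i - 1][2]
            s_next = trace[t + i][0]
            # Skip connection: s_next bootstraps directly to the anchor,
            # skipping over the intermediate actions.
            target = G + gamma ** i * Q[s_next].max()
            Q[s][a] += alpha * (target - Q[s][a])
    return Q
```

With $k = 1$ this reduces to an ordinary 1-step Q-update; larger $k$ adds the skip connections that give each anchor pair multiple backup targets along the trace.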
