Paper Title
Steady State Analysis of Episodic Reinforcement Learning
Paper Authors
Paper Abstract
This paper proves that the episodic learning environment of every finite-horizon decision task has a unique steady state under any behavior policy, and that the marginal distribution of the agent's input indeed converges to the steady-state distribution in essentially all episodic learning processes. This observation supports a mindset that interestingly reverses conventional wisdom: while the existence of unique steady states is often presumed in continual learning but considered less relevant in episodic learning, it turns out that their existence is guaranteed for the latter. Based on this insight, the paper unifies episodic and continual RL around several important concepts that have previously been treated separately in the two formalisms. Practically, the existence of a unique and approachable steady state enables a general way to collect data in episodic RL tasks, which the paper applies to policy gradient algorithms as a demonstration, based on a new steady-state policy gradient theorem. Finally, the paper also proposes and experimentally validates a perturbation method that facilitates rapid steady-state convergence in real-world RL tasks.
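For orientation, here is a minimal illustrative formulation in standard finite-MDP notation; the symbols \mathcal{S}, \mathcal{A}, P, Q^{\pi}, and \rho_\pi are introduced here for illustration and are not taken from the paper, and the expressions below are the familiar textbook forms rather than the paper's exact statements. If each episode termination is modeled as a reset transition back to the initial-state distribution, the Markov chain induced by a policy \pi admits a stationary distribution \rho_\pi satisfying the fixed-point equation, and a policy gradient can then be written in the usual steady-state form:

\[
\rho_\pi(s') \;=\; \sum_{s \in \mathcal{S}} \rho_\pi(s) \sum_{a \in \mathcal{A}} \pi(a \mid s)\, P(s' \mid s, a), \qquad \sum_{s \in \mathcal{S}} \rho_\pi(s) = 1,
\]
\[
\nabla_\theta J(\theta) \;\propto\; \mathbb{E}_{s \sim \rho_{\pi_\theta},\; a \sim \pi_\theta(\cdot \mid s)} \!\left[ Q^{\pi_\theta}(s, a)\, \nabla_\theta \log \pi_\theta(a \mid s) \right].
\]

Here P(s' \mid s, a) denotes the transition kernel of the reset-augmented episodic process; the paper's steady-state policy gradient theorem gives the precise statement for finite-horizon episodic tasks.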