Paper Title
PAC Reinforcement Learning for Predictive State Representations
Paper Authors
Paper Abstract
In this paper, we study online Reinforcement Learning (RL) in partially observable dynamical systems. We focus on the Predictive State Representation (PSR) model, an expressive model that captures other well-known models such as Partially Observable Markov Decision Processes (POMDPs). A PSR represents the state using a set of predictions of future observations and is defined entirely in terms of observable quantities. We develop a novel model-based algorithm for PSRs that learns a near-optimal policy with sample complexity scaling polynomially in all the relevant parameters of the system. Our algorithm naturally works with function approximation, extending to systems with potentially large state and observation spaces. We show that, given a realizable model class, the sample complexity of learning a near-optimal policy scales polynomially only with the statistical complexity of the model class, without any explicit polynomial dependence on the sizes of the state and observation spaces. Notably, our work is the first to show a polynomial sample complexity for competing with the globally optimal policy in PSRs. Finally, we demonstrate how our general theorem can be directly used to derive sample complexity bounds for special models, including $m$-step weakly revealing and $m$-step decodable tabular POMDPs, POMDPs with low-rank latent transitions, and POMDPs with linear emissions and latent transitions.
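For context, the classical (linear) PSR state can be sketched as follows; this is a minimal, illustrative formulation in our own notation, and the paper's exact definition may differ. The predictive state at a history $\tau_h$ is the vector of conditional probabilities of a core set of tests:
$$ q(\tau_h) = \big[\, \mathbb{P}(t_1 \mid \tau_h),\, \dots,\, \mathbb{P}(t_d \mid \tau_h) \,\big]^{\top} \in \mathbb{R}^{d}, $$
where $\tau_h$ denotes the sequence of actions and observations up to step $h$, and $\{t_1, \dots, t_d\}$ is a core set of tests (future action-observation sequences) whose predictions determine those of every other test: for each test $t$ there exists a weight vector $w_t \in \mathbb{R}^{d}$ such that $\mathbb{P}(t \mid \tau_h) = w_t^{\top} q(\tau_h)$ for all histories $\tau_h$. Every quantity here is observable, which is the sense in which PSRs avoid positing a latent state.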