Paper Title

Kernel Taylor-Based Value Function Approximation for Continuous-State Markov Decision Processes

Paper Authors

Junhong Xu, Kai Yin, Lantao Liu

Paper Abstract

We propose a principled kernel-based policy iteration algorithm to solve continuous-state Markov Decision Processes (MDPs). In contrast to most decision-theoretic planning frameworks, which assume fully known state transition models, we design a method that eliminates this strong assumption; such models are oftentimes extremely difficult to engineer in reality. To achieve this, we first apply a second-order Taylor expansion to the value function. The Bellman optimality equation is then approximated by a partial differential equation that relies only on the first and second moments of the transition model. By combining this with a kernel representation of the value function, we design an efficient policy iteration algorithm whose policy evaluation step can be expressed as a linear system of equations characterized by a finite set of supporting states. We have validated the proposed method through extensive simulations in both simplified and realistic planning scenarios, and the experiments show that our approach delivers substantially better performance than several baseline methods.
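
To make the approximation concrete, the derivation the abstract outlines can be sketched as follows. This is the standard second-order Taylor argument consistent with the abstract's description; the paper's exact PDE form, boundary conditions, and treatment of discounting may differ.

```latex
% Bellman optimality equation for a continuous-state MDP with discount
% factor \gamma, reward R, and transition model T(s' \mid s, a):
V(s) = \max_{a} \Big[ R(s, a)
     + \gamma \, \mathbb{E}_{s' \sim T(\cdot \mid s, a)} \big[ V(s') \big] \Big]

% Second-order Taylor expansion of V around s, with \delta = s' - s:
V(s') \approx V(s) + \nabla V(s)^{\top} \delta
     + \tfrac{1}{2} \, \delta^{\top} \nabla^{2} V(s) \, \delta

% Taking the expectation leaves only the first moment
% \mu(s, a) = \mathbb{E}[\delta] and second moment
% \Sigma(s, a) = \mathbb{E}[\delta \delta^{\top}] of the transition:
\mathbb{E}\big[ V(s') \big] \approx V(s) + \mu(s, a)^{\top} \nabla V(s)
     + \tfrac{1}{2} \operatorname{tr}\!\big( \Sigma(s, a) \, \nabla^{2} V(s) \big)
```

Substituting this expectation back into the Bellman equation yields an equation in $V$, $\nabla V$, and $\nabla^2 V$ only, i.e. a partial differential equation that depends on the transition model solely through its first two moments, as the abstract states.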
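Below is a minimal, hypothetical NumPy sketch of the policy evaluation step the abstract describes: the value function is represented as a kernel expansion over a finite set of supporting states, the expected next value under the policy is approximated from the transition's first and second moments via a second-order Taylor expansion of the kernel, and the Bellman fixed point becomes a linear system in the expansion coefficients. The function names, the choice of an RBF kernel, and the regularization are our own assumptions, not the paper's actual construction.

```python
import numpy as np

def rbf_kernel(X, Y, lengthscale=0.5):
    """Gaussian RBF kernel matrix between rows of X and rows of Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale ** 2)

def policy_evaluation(S, mu, Sigma, r, gamma=0.95, lengthscale=0.5, reg=1e-6):
    """Sketch of kernelized policy evaluation on supporting states S.

    V is represented as V(s) = sum_j alpha_j * k(s, s_j). Under the current
    policy, mu[i] and Sigma[i] are the first and second moments of the state
    displacement delta = s' - s at supporting state s_i. E[V(s')] is
    approximated with a second-order Taylor expansion of each kernel basis
    function, turning the Bellman equation into a linear system in alpha.
    """
    n, d = S.shape
    K = rbf_kernel(S, S, lengthscale)

    # Operator E with E[i, j] ≈ E[k(s', s_j)] for s' reached from s_i,
    # using analytic gradient and Hessian of the RBF kernel.
    E = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            diff = S[i] - S[j]
            k = np.exp(-0.5 * diff @ diff / lengthscale ** 2)
            grad = -k * diff / lengthscale ** 2          # ∇_s k(s, s_j) at s_i
            hess = k * (np.outer(diff, diff) / lengthscale ** 4
                        - np.eye(d) / lengthscale ** 2)  # ∇²_s k(s, s_j) at s_i
            # E[k(s', s_j)] ≈ k + μᵀ∇k + ½ tr(Σ ∇²k)
            E[i, j] = k + mu[i] @ grad + 0.5 * np.trace(Sigma[i] @ hess)

    # Bellman fixed point at the supporting states: K α = r + γ E α.
    A = K - gamma * E + reg * np.eye(n)
    alpha = np.linalg.solve(A, r)
    return alpha, K

# Toy usage on a 1-D state space (hypothetical dynamics and reward):
S = np.linspace(0.0, 1.0, 20).reshape(-1, 1)   # supporting states
mu = np.full((20, 1), 0.05)                     # mean displacement per step
Sigma = np.full((20, 1, 1), 0.01)               # second moment of displacement
r = -np.abs(S[:, 0] - 0.8)                      # reward: approach s = 0.8
alpha, K = policy_evaluation(S, mu, Sigma, r)
V = K @ alpha                                   # value at the supporting states
```

The key point the sketch illustrates is the one the abstract emphasizes: only the moments `mu` and `Sigma` of the transition enter the computation, so no explicit transition density needs to be engineered, and the cost of policy evaluation is governed by the size of the supporting-state set.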
