论文标题

在局部过渡下的约束MDP和随机最短路径的完全多项式时间近似方案

A Fully Polynomial Time Approximation Scheme for Constrained MDPs and Stochastic Shortest Path under Local Transitions

论文作者

Khonji, Majid

论文摘要

固定性马匹限制的马尔可夫决策过程(C-MDP)是在操作约束下在随机环境中规划的众所周知的模型。偶然受限的MDP(CC-MDP)是一种允许限制违规概率的变体,这在许多安全至关重要的应用中是必需的。 CC-MDP还可以在死端下建模一类称为随机最短路径(SSP)的MDP,其中在目标与目标和目标成本之间存在权衡。这项工作研究了(c)C-MDP的结构,尤其是涉及局部过渡的重要变体。在这种变体中,状态达到一定程度的位置和与其余状态的独立性。更确切地说,在给定时间,共享一些可及未来状态的状态始终是恒定的。 (c)在本地过渡下的C-MDP即使对于两个规划范围,也是NP的HARD。在这项工作中,我们为(c)C-MDP提出了一个完全多项式的近似方案,该方案计算(接近)最佳的确定性策略。这种算法是可以在理论上达到的最佳近似算法之一,并且可以深入了解约束MDP及其变体的近似性。

The fixed-horizon constrained Markov Decision Process (C-MDP) is a well-known model for planning in stochastic environments under operating constraints. Chance-Constrained MDP (CC-MDP) is a variant that allows bounding the probability of constraint violation, which is desired in many safety-critical applications. CC-MDP can also model a class of MDPs, called Stochastic Shortest Path (SSP), under dead-ends, where there is a trade-off between the probability-to-goal and cost-to-goal. This work studies the structure of (C)C-MDP, particularly an important variant that involves local transition. In this variant, the state reachability exhibits a certain degree of locality and independence from the remaining states. More precisely, the number of states, at a given time, that share some reachable future states is always constant. (C)C-MDP under local transition is NP-Hard even for a planning horizon of two. In this work, we propose a fully polynomial-time approximation scheme for (C)C-MDP that computes (near) optimal deterministic policies. Such an algorithm is among the best approximation algorithm attainable in theory and gives insights into the approximability of constrained MDP and its variants.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源