Paper Title
The Smoothed Complexity of Policy Iteration for Markov Decision Processes
Paper Authors
Paper Abstract
We show subexponential lower bounds (i.e., $2^{\Omega(n^c)}$) on the smoothed complexity of the classical Howard's Policy Iteration algorithm for Markov Decision Processes. The bounds hold for the total reward and the average reward criteria. The constructions are robust in the sense that the subexponential bound holds not only on average for independent random perturbations of the MDP parameters (transition probabilities and rewards), but for all arbitrary perturbations within an inverse polynomial range. We also show an exponential lower bound on the worst-case complexity for the simple reachability objective.
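For reference, below is a minimal sketch of Howard's Policy Iteration, the algorithm whose iteration count the lower bounds concern. The paper studies the total and average reward criteria; this sketch uses the discounted criterion purely so that policy evaluation reduces to a single nonsingular linear solve, and the array shapes and function name are assumptions for illustration, not the paper's construction. The defining feature of Howard's variant is that the improvement step switches every state with an improving action simultaneously.

```python
import numpy as np

def howard_policy_iteration(P, R, gamma=0.95, max_iter=1000):
    """Howard's Policy Iteration on a finite MDP (illustrative sketch).

    P: array of shape (A, S, S); P[a, s, s'] = transition probability.
    R: array of shape (A, S); R[a, s] = immediate reward.
    gamma: discount factor (assumed here for a simple linear solve; the
           paper's bounds are for the total/average reward criteria).
    Returns (policy, value, number of improvement iterations).
    """
    A, S, _ = P.shape
    policy = np.zeros(S, dtype=int)           # start from an arbitrary policy
    for it in range(max_iter):
        # Policy evaluation: solve (I - gamma * P_pi) V = R_pi exactly.
        P_pi = P[policy, np.arange(S), :]     # (S, S): row s uses action policy[s]
        R_pi = R[policy, np.arange(S)]        # (S,)
        V = np.linalg.solve(np.eye(S) - gamma * P_pi, R_pi)
        # Policy improvement: greedily switch ALL states with an improving
        # action at once -- this all-states rule is Howard's variant.
        Q = R + gamma * P @ V                 # (A, S) action values
        new_policy = Q.argmax(axis=0)
        if np.array_equal(new_policy, policy):
            return policy, V, it + 1          # no improving switch: optimal
        policy = new_policy
    return policy, V, max_iter
```

The lower bounds in the paper exhibit MDP families on which the number of iterations of this improvement loop is subexponential ($2^{\Omega(n^c)}$) even after the transition probabilities and rewards are perturbed, i.e., the bad behavior is not an artifact of degenerate inputs.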