Paper Title
The Smoothed Complexity of Policy Iteration for Markov Decision Processes
Paper Authors
Paper Abstract
We show subexponential lower bounds (i.e., $2^{\Omega(n^c)}$) on the smoothed complexity of the classical Howard's Policy Iteration algorithm for Markov Decision Processes. The bounds hold for the total reward and the average reward criteria. The constructions are robust in the sense that the subexponential bound holds not only on average for independent random perturbations of the MDP parameters (transition probabilities and rewards), but for all arbitrary perturbations within an inverse polynomial range. We also show an exponential lower bound on the worst-case complexity for the simple reachability objective.
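For reference, below is a minimal sketch of Howard's Policy Iteration, the algorithm whose iteration count the lower bounds concern. The paper studies the total and average reward criteria; this sketch uses the discounted criterion purely so that policy evaluation reduces to a single nonsingular linear solve, and the array shapes and function name are assumptions for illustration, not the paper's construction. The defining feature of Howard's variant is that the improvement step switches every state with an improving action simultaneously.

```python
import numpy as np

def howard_policy_iteration(P, R, gamma=0.95, max_iter=1000):
    """Howard's Policy Iteration on a finite MDP (illustrative sketch).

    P: array of shape (A, S, S); P[a, s, s'] = transition probability.
    R: array of shape (A, S); R[a, s] = immediate reward.
    gamma: discount factor (assumed here for a simple linear solve; the
           paper's bounds are for the total/average reward criteria).
    Returns (policy, value, number of improvement iterations).
    """
    A, S, _ = P.shape
    policy = np.zeros(S, dtype=int)           # start from an arbitrary policy
    for it in range(max_iter):
        # Policy evaluation: solve (I - gamma * P_pi) V = R_pi exactly.
        P_pi = P[policy, np.arange(S), :]     # (S, S): row s uses action policy[s]
        R_pi = R[policy, np.arange(S)]        # (S,)
        V = np.linalg.solve(np.eye(S) - gamma * P_pi, R_pi)
        # Policy improvement: greedily switch ALL states with an improving
        # action at once -- this all-states rule is Howard's variant.
        Q = R + gamma * P @ V                 # (A, S) action values
        new_policy = Q.argmax(axis=0)
        if np.array_equal(new_policy, policy):
            return policy, V, it + 1          # no improving switch: optimal
        policy = new_policy
    return policy, V, max_iter
```

The lower bounds in the paper exhibit MDP families on which the number of iterations of this improvement loop is subexponential ($2^{\Omega(n^c)}$) even after the transition probabilities and rewards are perturbed, i.e., the bad behavior is not an artifact of degenerate inputs.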