论文标题

差异含量的随机路径问题

The variance-penalized stochastic shortest path problem

论文作者

Piribauer, Jakob, Sankur, Ocan, Baier, Christel

论文摘要

随机最短路径问题(SSPP)要求在马尔可夫决策过程(MDP)中解决非确定性选择,以便最大化预期的累积权重。本文介绍了累积权重的方差 - 占期望值(VPE)的优化,这是SSPP的变体,其中累积权重的倍数被视为惩罚。结果表明,可以在指数空间中计算具有非负权重的MDP中的最佳VPE以及具有最佳确定性有限记忆调度程序。最大VPE是否超过给定有理理性的阈值问题被证明是持续时间且位于nexptime中。此外,在途中获得的权利的结果是,可以在多项式时间内计算所有期望 - 最佳调度程序之间的差异最小调度程序。

The stochastic shortest path problem (SSPP) asks to resolve the non-deterministic choices in a Markov decision process (MDP) such that the expected accumulated weight before reaching a target state is maximized. This paper addresses the optimization of the variance-penalized expectation (VPE) of the accumulated weight, which is a variant of the SSPP in which a multiple of the variance of accumulated weights is incurred as a penalty. It is shown that the optimal VPE in MDPs with non-negative weights as well as an optimal deterministic finite-memory scheduler can be computed in exponential space. The threshold problem whether the maximal VPE exceeds a given rational is shown to be EXPTIME-hard and to lie in NEXPTIME. Furthermore, a result of interest in its own right obtained on the way is that a variance-minimal scheduler among all expectation-optimal schedulers can be computed in polynomial time.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源