Distributionally Robust Markov Decision Processes and their Connection to Risk Measures

Paper title

Distributionally Robust Markov Decision Processes and their Connection to Risk Measures

Authors

Nicole Bäuerle, Alexander Glauner

Abstract

We consider robust Markov Decision Processes with Borel state and action spaces, unbounded cost and finite time horizon. Our formulation leads to a Stackelberg game against nature. Under integrability, continuity and compactness assumptions we derive a robust cost iteration for a fixed policy of the decision maker and a value iteration for the robust optimization problem. Moreover, we show the existence of deterministic optimal policies for both players. This is in contrast to classical zero-sum games. In case the state space is the real line we show under some convexity assumptions that the interchange of supremum and infimum is possible with the help of Sion's minimax Theorem. Further, we consider the problem with special ambiguity sets. In particular we are able to derive some cases where the robust optimization problem coincides with the minimization of a coherent risk measure. In the final section we discuss two applications: A robust LQ problem and a robust problem for managing regenerative energy.
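The value iteration mentioned in the abstract can be illustrated on a toy model. The following is a minimal sketch only, not the authors' construction: it assumes finite state and action sets and a finite ambiguity set of transition kernels from which nature may choose separately for every state-action pair, whereas the paper works with Borel spaces and unbounded cost. The function name `robust_value_iteration` and all parameter names are hypothetical.

```python
import numpy as np


def robust_value_iteration(cost, terminal_cost, kernels, horizon):
    """Finite-horizon robust value iteration on a finite toy model.

    cost:          array of shape (S, A), one-stage cost c(x, a)
    terminal_cost: array of shape (S,), cost at the final stage
    kernels:       array of shape (K, S, A, S), K candidate transition
                   kernels (a finite ambiguity set; this is an assumption)
    horizon:       number of decision stages N
    """
    cost = np.asarray(cost, dtype=float)
    kernels = np.asarray(kernels, dtype=float)
    value = np.asarray(terminal_cost, dtype=float)
    policy = []
    for _ in range(horizon):
        # Expected continuation cost under each candidate kernel: (K, S, A).
        expected = kernels @ value
        # Nature observes (x, a) and picks the worst kernel: (S, A).
        worst = expected.max(axis=0)
        q = cost + worst
        # The decision maker minimises; argmin gives a deterministic rule.
        policy.append(q.argmin(axis=1))
        value = q.min(axis=1)
    policy.reverse()  # policy[n] is the decision rule used at stage n
    return value, policy
```

The order of the two optimisations mirrors the Stackelberg structure described above: the decision maker commits to an action first and nature reacts, so the maximum over kernels sits inside the minimum over actions.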
