Multi-objective Optimization of Long-run Average and Total Rewards

Paper Title

Multi-objective Optimization of Long-run Average and Total Rewards

Authors

Tim Quatmann, Joost-Pieter Katoen

Abstract

This paper presents an efficient procedure for multi-objective model checking of long-run average reward (also known as mean pay-off) and total reward objectives, as well as their combination. We consider this for Markov automata, a compositional model that captures both traditional Markov decision processes (MDPs) and a continuous-time variant thereof. The crux of our procedure is a generalization of Forejt et al.'s approach for total rewards on MDPs to arbitrary combinations of long-run and total reward objectives on Markov automata. Experiments with a prototypical implementation on top of the Storm model checker show encouraging results for both model types and indicate substantially improved performance over existing multi-objective long-run MDP model checking based on linear programming.

Paper Title