Title
Multi-objective Optimization of Long-run Average and Total Rewards
Authors
Abstract
This paper presents an efficient procedure for multi-objective model checking of long-run average reward (a.k.a. mean payoff) and total reward objectives, as well as their combination. We consider this for Markov automata, a compositional model that captures both traditional Markov decision processes (MDPs) and a continuous-time variant thereof. The crux of our procedure is a generalization of Forejt et al.'s approach for total rewards on MDPs to arbitrary combinations of long-run average and total reward objectives on Markov automata. Experiments with a prototypical implementation on top of the Storm model checker show encouraging results for both model types and indicate substantially improved performance over existing multi-objective long-run MDP model checking based on linear programming.
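To make "multi-objective long-run average reward" concrete, the sketch below is a minimal illustration, not the paper's procedure and not Storm's API. It assumes a tiny hypothetical MDP (`MDP`, with two reward objectives) in which every memoryless deterministic policy induces an irreducible Markov chain. Under that assumption, the long-run average reward vector of a policy equals the stationary distribution weighted by the per-state rewards. The helper names (`stationary`, `evaluate`, `weighted_optimum`) are hypothetical. The weighted-sum step reflects the general idea of optimizing a weight vector over the objectives to find points on the Pareto front. Here it is done by brute-force enumeration of policies, purely for illustration.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy MDP (illustration only): mdp[state][action] = (successors, rewards),
# where successors maps successor state -> probability and rewards holds one
# reward per objective. Every memoryless deterministic policy here induces an
# irreducible Markov chain, which the stationary-distribution approach assumes.
MDP = {
    0: {"a": ({0: Fraction(1, 2), 1: Fraction(1, 2)}, (Fraction(1), Fraction(0))),
        "b": ({1: Fraction(1)}, (Fraction(0), Fraction(2)))},
    1: {"c": ({0: Fraction(1)}, (Fraction(2), Fraction(0))),
        "d": ({0: Fraction(1, 2), 1: Fraction(1, 2)}, (Fraction(0), Fraction(1)))},
}


def stationary(P, n):
    """Solve pi * P = pi with sum(pi) = 1 exactly (Gauss-Jordan over Fractions)."""
    # Balance equations sum_i pi_i * (P[i][j] - [i == j]) = 0 for j < n - 1;
    # the redundant last balance equation is replaced by the normalisation.
    rows = [[P[i][j] - (1 if i == j else 0) for i in range(n)] + [Fraction(0)]
            for j in range(n - 1)]
    rows.append([Fraction(1)] * n + [Fraction(1)])
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        pv = rows[col][col]
        rows[col] = [x / pv for x in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                f = rows[r][col]
                rows[r] = [a - f * b for a, b in zip(rows[r], rows[col])]
    return [rows[i][n] for i in range(n)]


def evaluate(mdp, policy):
    """Long-run average reward vector of the chain induced by a memoryless policy."""
    states = sorted(mdp)
    n = len(states)
    P = [[mdp[s][policy[s]][0].get(t, Fraction(0)) for t in states] for s in states]
    pi = stationary(P, n)
    k = len(next(iter(mdp[states[0]].values()))[1])  # number of objectives
    return tuple(sum(pi[i] * mdp[s][policy[s]][1][obj] for i, s in enumerate(states))
                 for obj in range(k))


def weighted_optimum(mdp, weights):
    """Brute-force the memoryless deterministic policy maximising the weighted sum."""
    states = sorted(mdp)
    best = None
    for choice in product(*(sorted(mdp[s]) for s in states)):
        policy = dict(zip(states, choice))
        point = evaluate(mdp, policy)
        score = sum(w * x for w, x in zip(weights, point))
        if best is None or score > best[0]:
            best = (score, policy, point)
    return best


if __name__ == "__main__":
    for w in [(1, 0), (0, 1), (Fraction(1, 2), Fraction(1, 2))]:
        score, policy, point = weighted_optimum(MDP, w)
        print(w, policy, [str(x) for x in point])
```

In this toy instance the weights (1, 0), (0, 1) and (1/2, 1/2) select the achievable points (4/3, 0), (0, 4/3) and (1, 1), respectively, while the point (1/2, 1/2) of policy {0: 'a', 1: 'd'} is dominated. A weighted sum of long-run average objectives is itself a single long-run average objective with the weighted reward. This is why a memoryless deterministic maximizer exists for each weight vector. However, points between these vertices may require randomized schedulers, and realistic tools avoid enumerating policies altogether.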