Paper Title
Deciding What is Good-for-MDPs
Authors
Abstract
Nondeterministic Good-for-MDP (GFM) automata are for MDP model checking and reinforcement learning what good-for-games automata are for reactive synthesis: a more compact alternative to deterministic automata that displays nondeterminism, but only so much that it can be resolved locally, such that a syntactic product can be analysed. GFM has recently been introduced as a property for reinforcement learning, where the simpler Büchi acceptance conditions it allows one to use are key. However, while there are classic and novel techniques to obtain automata that are GFM, there has not been a decision procedure for checking whether or not an automaton is GFM. We show that GFM-ness is decidable and provide an EXPTIME decision procedure as well as a PSPACE-hardness proof.