Paper Title

When is Offline Two-Player Zero-Sum Markov Game Solvable?

Paper Authors

Qiwen Cui, Simon S. Du

Paper Abstract

We study what dataset assumption permits solving offline two-player zero-sum Markov games. In stark contrast to the offline single-agent Markov decision process, we show that the single strategy concentration assumption is insufficient for learning the Nash equilibrium (NE) strategy in offline two-player zero-sum Markov games. On the other hand, we propose a new assumption named unilateral concentration and design a pessimism-type algorithm that is provably efficient under this assumption. In addition, we show that the unilateral concentration assumption is necessary for learning an NE strategy. Furthermore, our algorithm can achieve minimax sample complexity without any modification for two widely studied settings: dataset with uniform concentration assumption and turn-based Markov games. Our work serves as an important initial step towards understanding offline multi-agent reinforcement learning.
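
To make the abstract's key notions concrete, the following is a brief LaTeX sketch of the standard definitions of the Nash gap and of the concentration conditions the abstract contrasts. The notation (occupancy measures $d$, dataset distribution $\rho$, step index $h$) is assumed here for illustration and may differ from the paper's exact formulation and constants.

% Hedged sketch (not the paper's verbatim definitions): standard notions of the
% Nash gap and of dataset concentration for a finite-horizon zero-sum Markov game.
\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}

Let $V^{\mu,\nu}$ denote the max-player's value under the strategy pair
$(\mu,\nu)$. The Nash gap of $(\mu,\nu)$ is
\[
  \mathrm{Gap}(\mu,\nu) \;=\; \max_{\mu'} V^{\mu',\nu} \;-\; \min_{\nu'} V^{\mu,\nu'},
\]
and $(\mu,\nu)$ is an $\varepsilon$-approximate Nash equilibrium if
$\mathrm{Gap}(\mu,\nu) \le \varepsilon$.

% Assumed notation: $d^{\mu,\nu}_h$ is the state-action occupancy measure at
% step $h$, $\rho_h$ is the dataset distribution, and $(\mu^*,\nu^*)$ is a
% Nash equilibrium.
Single strategy concentration asks only that the dataset cover the equilibrium
itself, i.e.\ $d^{\mu^*,\nu^*}_h(s,a,b)/\rho_h(s,a,b) \le C$, which the
abstract states is insufficient. Unilateral concentration asks for coverage of
all unilateral deviations from the equilibrium:
\[
  \max_{\mu,\nu,h,s,a,b}\;
  \max\!\left\{
    \frac{d^{\mu^*,\nu}_h(s,a,b)}{\rho_h(s,a,b)},\;
    \frac{d^{\mu,\nu^*}_h(s,a,b)}{\rho_h(s,a,b)}
  \right\} \;\le\; C \;<\; \infty .
\]
Uniform concentration, the stronger classical condition, bounds
$d^{\mu,\nu}_h(s,a,b)/\rho_h(s,a,b)$ for every strategy pair $(\mu,\nu)$.

\end{document}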
