Paper Title
Combining Deep Reinforcement Learning and Search for Imperfect-Information Games
Paper Authors
Paper Abstract
The combination of deep reinforcement learning and search at both training and test time is a powerful paradigm that has led to a number of successes in single-agent settings and perfect-information games, best exemplified by AlphaZero. However, prior algorithms of this form cannot cope with imperfect-information games. This paper presents ReBeL, a general framework for self-play reinforcement learning and search that provably converges to a Nash equilibrium in any two-player zero-sum game. In the simpler setting of perfect-information games, ReBeL reduces to an algorithm similar to AlphaZero. Results in two different imperfect-information games show ReBeL converges to an approximate Nash equilibrium. We also show ReBeL achieves superhuman performance in heads-up no-limit Texas hold'em poker, while using far less domain knowledge than any prior poker AI.
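The abstract gives no implementation detail, but its convergence claim rests on a classic result: in a two-player zero-sum game, if both players run no-regret learning in self-play, their average strategies converge to a Nash equilibrium. Below is a minimal, hypothetical sketch of that component using regret matching on rock-paper-scissors, whose unique equilibrium is uniform play. All names (`u0`, `self_play`, etc.) and the toy game are our own assumptions for illustration; this is not code from the paper, and it omits the deep value network and belief-state search that ReBeL adds on top.

```python
import random

NUM_ACTIONS = 3  # 0 = rock, 1 = paper, 2 = scissors

def u0(a0: int, a1: int) -> float:
    """Payoff to player 0; the game is zero-sum, so player 1 receives the negation."""
    if a0 == a1:
        return 0.0
    # paper beats rock, scissors beats paper, rock beats scissors
    return 1.0 if (a0 - a1) % 3 == 1 else -1.0

def regret_matching(regrets: list[float]) -> list[float]:
    """Play each action in proportion to its positive cumulative regret."""
    pos = [max(r, 0.0) for r in regrets]
    total = sum(pos)
    return [p / total for p in pos] if total > 0 else [1.0 / NUM_ACTIONS] * NUM_ACTIONS

def self_play(iterations: int = 200_000) -> list[list[float]]:
    regrets = [[0.0] * NUM_ACTIONS for _ in range(2)]
    strat_sums = [[0.0] * NUM_ACTIONS for _ in range(2)]
    for _ in range(iterations):
        strats = [regret_matching(regrets[p]) for p in range(2)]
        acts = [random.choices(range(NUM_ACTIONS), weights=strats[p])[0] for p in range(2)]
        for p in range(2):
            # Utility of each alternative action, holding the opponent's sampled action fixed.
            def util(a: int) -> float:
                return u0(a, acts[1]) if p == 0 else -u0(acts[0], a)
            realized = util(acts[p])
            for a in range(NUM_ACTIONS):
                regrets[p][a] += util(a) - realized
                strat_sums[p][a] += strats[p][a]
    # The *average* strategy, not the last iterate, converges to Nash equilibrium.
    return [[s / sum(strat_sums[p]) for s in strat_sums[p]] for p in range(2)]

if __name__ == "__main__":
    print("average strategies:", self_play())  # both approach [1/3, 1/3, 1/3]
```

Running this, both players' average strategies approach the uniform equilibrium [1/3, 1/3, 1/3]. Per the abstract, ReBeL's contribution is making this style of equilibrium-seeking self-play work with deep reinforcement learning and search in imperfect-information games at scale.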