Paper Title
Combining Deep Reinforcement Learning and Search for Imperfect-Information Games
Paper Authors
Paper Abstract
The combination of deep reinforcement learning and search at both training and test time is a powerful paradigm that has led to a number of successes in single-agent settings and perfect-information games, best exemplified by AlphaZero. However, prior algorithms of this form cannot cope with imperfect-information games. This paper presents ReBeL, a general framework for self-play reinforcement learning and search that provably converges to a Nash equilibrium in any two-player zero-sum game. In the simpler setting of perfect-information games, ReBeL reduces to an algorithm similar to AlphaZero. Results in two different imperfect-information games show ReBeL converges to an approximate Nash equilibrium. We also show ReBeL achieves superhuman performance in heads-up no-limit Texas hold'em poker, while using far less domain knowledge than any prior poker AI.
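The abstract gives no implementation detail, but its convergence claim rests on a classic result: in a two-player zero-sum game, if both players run no-regret learning in self-play, their average strategies converge to a Nash equilibrium. Below is a minimal, hypothetical sketch of that component using regret matching on rock-paper-scissors, whose unique equilibrium is uniform play. All names (`u0`, `self_play`, etc.) and the toy game are our own assumptions for illustration; this is not code from the paper, and it omits the deep value network and belief-state search that ReBeL adds on top.

```python
import random

NUM_ACTIONS = 3  # 0 = rock, 1 = paper, 2 = scissors

def u0(a0: int, a1: int) -> float:
    """Payoff to player 0; the game is zero-sum, so player 1 receives the negation."""
    if a0 == a1:
        return 0.0
    # paper beats rock, scissors beats paper, rock beats scissors
    return 1.0 if (a0 - a1) % 3 == 1 else -1.0

def regret_matching(regrets: list[float]) -> list[float]:
    """Play each action in proportion to its positive cumulative regret."""
    pos = [max(r, 0.0) for r in regrets]
    total = sum(pos)
    return [p / total for p in pos] if total > 0 else [1.0 / NUM_ACTIONS] * NUM_ACTIONS

def self_play(iterations: int = 200_000) -> list[list[float]]:
    regrets = [[0.0] * NUM_ACTIONS for _ in range(2)]
    strat_sums = [[0.0] * NUM_ACTIONS for _ in range(2)]
    for _ in range(iterations):
        strats = [regret_matching(regrets[p]) for p in range(2)]
        acts = [random.choices(range(NUM_ACTIONS), weights=strats[p])[0] for p in range(2)]
        for p in range(2):
            # Utility of each alternative action, holding the opponent's sampled action fixed.
            def util(a: int) -> float:
                return u0(a, acts[1]) if p == 0 else -u0(acts[0], a)
            realized = util(acts[p])
            for a in range(NUM_ACTIONS):
                regrets[p][a] += util(a) - realized
                strat_sums[p][a] += strats[p][a]
    # The *average* strategy, not the last iterate, converges to Nash equilibrium.
    return [[s / sum(strat_sums[p]) for s in strat_sums[p]] for p in range(2)]

if __name__ == "__main__":
    print("average strategies:", self_play())  # both approach [1/3, 1/3, 1/3]
```

Running this, both players' average strategies approach the uniform equilibrium [1/3, 1/3, 1/3]. Per the abstract, ReBeL's contribution is making this style of equilibrium-seeking self-play work with deep reinforcement learning and search in imperfect-information games at scale.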