Monte-Carlo Graph Search for AlphaZero

Paper title

Monte-Carlo Graph Search for AlphaZero

Authors

Johannes Czech, Patrick Korus, Kristian Kersting

Abstract

The AlphaZero algorithm has been successfully applied in a range of discrete domains, most notably board games. It uses a neural network that learns a value and policy function to guide exploration in a Monte-Carlo Tree Search. Although many search improvements have been proposed for Monte-Carlo Tree Search in the past, most of them refer to an older variant of the Upper Confidence Bounds for Trees algorithm that does not use a policy for planning. We introduce a new, improved search algorithm for AlphaZero which generalizes the search tree to a directed acyclic graph. This enables information flow across different subtrees and greatly reduces memory consumption. Along with Monte-Carlo Graph Search, we propose a number of further extensions, such as the inclusion of epsilon-greedy exploration, a revised terminal solver, and the integration of domain knowledge as constraints. In our evaluations, we use the CrazyAra engine on chess and crazyhouse as examples to show that these changes bring significant improvements to AlphaZero.
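
The memory argument in the abstract can be illustrated with a toy sketch. This is not the paper's algorithm or the CrazyAra implementation; the names `Node` and `expand`, and the toy domain (a state is the set of moves played so far, so different move orders transpose into the same state), are assumptions made purely for illustration. With a transposition table, states reached through different move orders share a single node, which turns the search tree into a directed acyclic graph and lets the shared node accumulate statistics from every path that reaches it.

```python
class Node:
    """One search node; visit statistics live here and are shared in graph mode."""

    def __init__(self, key):
        self.key = key          # hashable state identifier (hypothetical toy state)
        self.visits = 0
        self.value_sum = 0.0
        self.children = {}      # action -> Node


def expand(root_key, actions, depth, table=None):
    """Expand every move sequence up to `depth` and return how many nodes were created.

    With `table=None` each path gets its own node (plain tree).
    With a dict as `table`, equal states are looked up and reused (DAG).
    """
    created = 0

    def get_node(key):
        nonlocal created
        if table is not None and key in table:
            return table[key], False        # transposition: reuse existing node
        node = Node(key)
        created += 1
        if table is not None:
            table[key] = node
        return node, True

    root, _ = get_node(root_key)
    frontier = [(root, 0)]
    while frontier:
        node, d = frontier.pop()
        if d == depth:
            continue
        for action in actions:
            if action in node.key:          # toy rule: each move may be played once
                continue
            child, is_new = get_node(node.key | frozenset([action]))
            node.children[action] = child
            if is_new:                      # an already known node was expanded before
                frontier.append((child, d + 1))
    return created


tree_nodes = expand(frozenset(), "abc", 3)
graph_nodes = expand(frozenset(), "abc", 3, table={})
print(tree_nodes, graph_nodes)
```

In this toy domain the tree stores one node per move sequence, while the graph stores one node per distinct state, so the gap widens as the number of transpositions grows.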
