Epistemic Monte Carlo Tree Search

Paper Title

Epistemic Monte Carlo Tree Search

Authors

Oren, Yaniv; Vadocz, Villiam; Spaan, Matthijs T. J.; Böhmer, Wendelin

Abstract

The AlphaZero/MuZero (A/MZ) family of algorithms has achieved remarkable success across various challenging domains by integrating Monte Carlo Tree Search (MCTS) with learned models. Learned models introduce epistemic uncertainty, which is caused by learning from limited data and is useful for exploration in sparse-reward environments. However, MCTS does not account for the propagation of this uncertainty. To address this, we introduce Epistemic MCTS (EMCTS): a theoretically motivated approach to account for the epistemic uncertainty in search and to harness the search for deep exploration. In the challenging sparse-reward task of writing code in the assembly language SUBLEQ, AZ paired with our method achieves significantly higher sample efficiency than baseline AZ. Search with EMCTS solves variations of the commonly used hard-exploration benchmark Deep Sea (which baseline A/MZ are practically unable to solve) much faster than an otherwise equivalent method that does not use search for uncertainty estimation. This demonstrates significant benefits from using search for epistemic uncertainty estimation.
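As a rough illustration of what accounting for epistemic uncertainty in search can look like, the sketch below carries a variance estimate alongside the value estimate through MCTS backups and adds an optimism bonus during selection. It is not the EMCTS algorithm from the paper: the independence assumption behind the variance recursion, the averaging of variances across visits, the bonus `beta * sqrt(sigma2)`, and all names (`Node`, `score`, `backup`, `c_puct`, `beta`) are hypothetical. The leaf and reward variances are assumed to come from the learned model, for example from disagreement within an ensemble.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical per-node statistics: policy prior, visit count, and running
    # sums of backed-up returns and of their propagated epistemic variance.
    prior: float
    visits: int = 0
    value_sum: float = 0.0
    var_sum: float = 0.0
    children: dict = field(default_factory=dict)

    @property
    def q(self) -> float:
        # Mean backed-up return (0.0 for an unvisited node).
        return self.value_sum / self.visits if self.visits else 0.0

    @property
    def sigma2(self) -> float:
        # Running average of the propagated epistemic variance.
        return self.var_sum / self.visits if self.visits else 0.0


def score(parent: Node, child: Node, c_puct: float = 1.25, beta: float = 1.0) -> float:
    # PUCT-style exploration term in the spirit of AlphaZero ...
    explore = c_puct * child.prior * math.sqrt(parent.visits) / (1 + child.visits)
    # ... plus a hypothetical optimism bonus of `beta` epistemic standard deviations.
    return child.q + beta * math.sqrt(child.sigma2) + explore


def backup(path, rewards, reward_vars, leaf_value, leaf_var, gamma=0.99):
    """Propagate a leaf's value and epistemic variance from the leaf to the root.

    path[i] -> path[i + 1] is the edge with predicted reward rewards[i] and
    reward variance reward_vars[i]. Assumes independent errors, so
    Var[r + gamma * G] = Var[r] + gamma**2 * Var[G].
    """
    g, g_var = leaf_value, leaf_var
    for i in range(len(path) - 1, -1, -1):
        if i < len(path) - 1:
            # Discounted return and its variance as seen from path[i].
            g = rewards[i] + gamma * g
            g_var = reward_vars[i] + gamma ** 2 * g_var
        node = path[i]
        node.visits += 1
        node.value_sum += g
        node.var_sum += g_var
```

For example, backing up a leaf with value 1.0 and variance 0.4 through an edge with reward 0.0, reward variance 0.1 and `gamma=0.5` gives the root a mean of 0.5 and a variance of 0.1 + 0.25 · 0.4 = 0.2. Setting `beta=0` recovers a plain PUCT-style score; a larger `beta` steers the search toward subtrees the model is less certain about.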
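To see why Deep Sea counts as a hard-exploration benchmark, the following sketch models a simplified version of it. The rules are assumptions, not taken from the paper: an N×N grid, action 1 always means "right" (the standard benchmark can randomize this mapping per cell), the agent starts in column 0 and descends one row per step, moving right costs 0.01/N, and the +1 treasure is paid only for moving right while already in the last column, which can only happen on the final step. Under these rules only one of the 2^N action sequences finds the treasure, so a uniformly random policy succeeds with probability 2^-N. The function names are hypothetical.

```python
import itertools
from fractions import Fraction


def deep_sea_episode(actions, size, move_cost=0.01):
    """Return (total_reward, found_treasure) for one simplified Deep Sea episode."""
    assert len(actions) == size
    col, total, found = 0, 0.0, False
    for a in actions:
        if a == 1:  # "right"
            if col == size - 1:
                # Column <= step index, so this only happens on the final step.
                total += 1.0
                found = True
            col = min(col + 1, size - 1)
            total -= move_cost / size  # small cost makes moving right look bad locally
        else:  # "left"
            col = max(col - 1, 0)
    return total, found


def random_success_probability(size):
    # Under a uniform random policy every action sequence is equally likely,
    # so the success probability is the fraction of sequences finding the treasure.
    hits = sum(
        deep_sea_episode(seq, size)[1]
        for seq in itertools.product((0, 1), repeat=size)
    )
    return Fraction(hits, 2 ** size)
```

`random_success_probability(10)` returns `Fraction(1, 1024)`. Because each rightward step also carries a small cost, an agent that relies on undirected random exploration quickly learns to avoid moving right and rarely discovers the treasure as N grows.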
