Paper Title

Brick Tic-Tac-Toe: Exploring the Generalizability of AlphaZero to Novel Test Environments

Paper Authors

Min, John Tan Chong; Motani, Mehul

Paper Abstract

Traditional reinforcement learning (RL) environments typically are the same for both the training and testing phases. Hence, current RL methods are largely not generalizable to a test environment which is conceptually similar but different from what the method has been trained on, which we term the novel test environment. As an effort to push RL research towards algorithms which can generalize to novel test environments, we introduce the Brick Tic-Tac-Toe (BTTT) test bed, where the brick position in the test environment is different from that in the training environment. Using a round-robin tournament on the BTTT environment, we show that traditional RL state-search approaches such as Monte Carlo Tree Search (MCTS) and Minimax are more generalizable to novel test environments than AlphaZero is. This is surprising because AlphaZero has been shown to achieve superhuman performance in environments such as Go, Chess and Shogi, which may lead one to think that it performs well in novel test environments. Our results show that BTTT, though simple, is rich enough to explore the generalizability of AlphaZero. We find that merely increasing MCTS lookahead iterations was insufficient for AlphaZero to generalize to some novel test environments. Rather, increasing the variety of training environments helps to progressively improve generalizability across all possible starting brick configurations.
