Reinforcement Learning with Dual-Observation for General Video Game Playing

Paper title

Reinforcement Learning with Dual-Observation for General Video Game Playing

Authors

Hu, Chengpeng; Wang, Ziqi; Shu, Tianye; Tong, Hao; Togelius, Julian; Yao, Xin; Liu, Jialin

Abstract

Reinforcement learning algorithms have performed well in playing challenging board and video games. More and more studies focus on improving the generalisation ability of reinforcement learning algorithms. The General Video Game AI Learning Competition aims to develop agents capable of learning to play game levels that were unseen during training. This paper summarises five years of General Video Game AI Learning Competition editions. In each edition, three new games were designed. In the first three editions, the training and test levels were designed separately. Since 2020, the three test levels of each game have been generated by perturbing or combining two training levels. We then present a novel reinforcement learning technique with dual-observation for general video game playing, based on the assumption that similar local information is more likely to be observed across different levels than similar global information. Instead of directly inputting a single, raw pixel-based screenshot of the current game screen, our proposed general technique takes the encoded, transformed global and local observations of the game screen as two simultaneous inputs, aiming at learning local information for playing new levels. Our proposed technique is implemented with three state-of-the-art reinforcement learning algorithms and tested on the game set of the 2020 General Video Game AI Learning Competition. Ablation studies show the outstanding performance of using encoded, transformed global and local observations as input.
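
The dual-observation idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the screen is encoded or transformed, so everything here is an assumption for illustration only: the level is taken to be a grid of integer tile codes, the local observation is taken to be a fixed-size window centred on the avatar, and the names `PAD`, `find_avatar`, `local_observation` and `dual_observation` are hypothetical rather than taken from the paper.

```python
from typing import List, Tuple

Grid = List[List[int]]

PAD = -1  # hypothetical code for cells that fall outside the level


def find_avatar(grid: Grid, avatar_code: int) -> Tuple[int, int]:
    """Return (row, col) of the first cell that holds the avatar code."""
    for r, row in enumerate(grid):
        for c, cell in enumerate(row):
            if cell == avatar_code:
                return r, c
    raise ValueError("avatar not found in grid")


def local_observation(grid: Grid, center: Tuple[int, int], radius: int) -> Grid:
    """Crop a (2*radius+1)-sided square window around `center`.

    Cells outside the level are filled with PAD so the window always
    has the same shape, whatever the level size.
    """
    rows, cols = len(grid), len(grid[0])
    cr, cc = center
    window = []
    for r in range(cr - radius, cr + radius + 1):
        line = []
        for c in range(cc - radius, cc + radius + 1):
            inside = 0 <= r < rows and 0 <= c < cols
            line.append(grid[r][c] if inside else PAD)
        window.append(line)
    return window


def dual_observation(grid: Grid, avatar_code: int, radius: int) -> Tuple[Grid, Grid]:
    """Return (global view, local view) as two simultaneous inputs."""
    center = find_avatar(grid, avatar_code)
    return grid, local_observation(grid, center, radius)


if __name__ == "__main__":
    # Assumed tile codes: 0 = floor, 1 = wall, 2 = avatar.
    level = [
        [1, 1, 1, 1],
        [1, 0, 2, 1],
        [1, 0, 0, 1],
    ]
    global_view, local_view = dual_observation(level, avatar_code=2, radius=1)
    for line in local_view:
        print(line)
```

In this sketch the local window keeps the same shape on every level, which is one plausible reason a policy fed with it could transfer to unseen levels; how the two inputs are actually combined inside the network is not described in the abstract and is left out here.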
