Paper Title
Word Play for Playing Othello (Reverses)
Paper Authors
Paper Abstract
Language models like OpenAI's Generative Pre-trained Transformers (GPT-2/3) capture the long-range correlations needed to generate text in a variety of domains (such as language translation) and, more recently, in gameplay (chess, Go, and checkers). The present research applies both the larger (GPT-3) and smaller (GPT-2) language models to explore the complex strategies of the game of Othello (or Reversi). Given game rules that allow rapid reversals of fortune, a language model not only serves as a candidate predictor of the next move based on the previous game moves but also sidesteps the sparse-reward problem in gameplay. The language model automatically captures or emulates championship-level strategies. The fine-tuned GPT-2 model generates Othello games ranging from 13% to 71% completion, while the larger GPT-3 model reaches 41% of a complete game. Like previous work with chess and Go, these language models offer a novel way to generate plausible game archives, particularly for comparing opening moves across a larger sample than is humanly possible to explore. A primary contribution of these models is a two-fold expansion of the previously recorded player archives (120,000 human games over the 45 years from 1977 to 2022), supplying the research community with more diverse and original strategies for sampling with other reinforcement learning techniques.
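Fine-tuning a text model on game play implies serializing each game as a flat token sequence. The abstract does not give the exact encoding, so the sketch below is a minimal, hypothetical illustration: moves as standard algebraic board coordinates ("a1" through "h8"), one whitespace-separated line per game. The function names are illustrative, not from the paper.

```python
# Hypothetical sketch: serializing Othello games as plain text so a
# language model (e.g. GPT-2) can be fine-tuned to predict the next move.
# Board squares are 0-indexed (row, col) pairs on an 8x8 grid.

def move_to_token(row: int, col: int) -> str:
    """Encode a 0-indexed (row, col) square as algebraic notation, e.g. 'd3'."""
    assert 0 <= row < 8 and 0 <= col < 8
    return f"{chr(ord('a') + col)}{row + 1}"

def token_to_move(token: str) -> tuple[int, int]:
    """Decode an algebraic token like 'd3' back to a 0-indexed (row, col)."""
    col = ord(token[0]) - ord('a')
    row = int(token[1]) - 1
    return row, col

def game_to_text(moves: list[tuple[int, int]]) -> str:
    """One full game becomes a single whitespace-separated training line."""
    return " ".join(move_to_token(r, c) for r, c in moves)

def text_to_game(line: str) -> list[tuple[int, int]]:
    """Inverse mapping: parse a generated line back into board moves."""
    return [token_to_move(tok) for tok in line.split()]

# Illustrative opening sequence:
opening = [(2, 3), (2, 2), (3, 2), (4, 2)]
line = game_to_text(opening)  # "d3 c3 c4 c5"
```

Under this kind of encoding, "percent of a complete game" as reported in the abstract would correspond to how many of a generated line's tokens form legal moves before the first illegal one.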