Paper Title
On Transforming Reinforcement Learning by Transformer: The Development Trajectory
Paper Authors
Paper Abstract
The transformer, originally devised for natural language processing, has also achieved significant success in computer vision. Thanks to its superior expressive power, researchers are investigating ways to deploy transformers in reinforcement learning (RL), and transformer-based models have demonstrated their potential on representative RL benchmarks. In this paper, we collect and dissect recent advances in transforming RL by transformer (transformer-based RL, or TRL) in order to explore its development trajectory and future trends. We group existing developments into two categories, architecture enhancement and trajectory optimization, and examine the main applications of TRL in robotic manipulation, text-based games, navigation, and autonomous driving. Architecture-enhancement methods apply the powerful transformer architecture to RL problems within the traditional RL framework, modeling agents and environments more precisely than standard deep RL methods, but they remain limited by the inherent defects of traditional RL algorithms, such as bootstrapping and the "deadly triad". Trajectory-optimization methods instead treat RL as sequence modeling and train a joint state-action model over entire trajectories under the behavior cloning framework; they can extract policies from static datasets and fully exploit the transformer's long-sequence modeling capability. Given these advances, we review extensions and challenges in TRL and discuss proposals for future directions. We hope this survey provides a detailed introduction to TRL and motivates future research in this rapidly developing field.
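To make the trajectory-optimization view concrete, the sketch below illustrates the general "RL as sequence modeling" idea described in the abstract: a causal transformer is trained, behavior-cloning style, to predict actions from interleaved return-to-go/state/action tokens of offline trajectories (in the spirit of Decision Transformer). It is a minimal, hypothetical example; the model dimensions, token layout, and the random "offline dataset" are placeholders and do not reproduce any specific method surveyed in the paper.

```python
# Minimal sketch of RL-as-sequence-modeling with a causal transformer (assumed setup).
import torch
import torch.nn as nn

class TrajectorySequenceModel(nn.Module):
    def __init__(self, state_dim, act_dim, embed_dim=64, n_layers=2, n_heads=4, max_len=20):
        super().__init__()
        # Separate embeddings for return-to-go, state, and action tokens.
        self.embed_rtg = nn.Linear(1, embed_dim)
        self.embed_state = nn.Linear(state_dim, embed_dim)
        self.embed_action = nn.Linear(act_dim, embed_dim)
        self.pos_embed = nn.Embedding(3 * max_len, embed_dim)
        layer = nn.TransformerEncoderLayer(embed_dim, n_heads,
                                           dim_feedforward=4 * embed_dim,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.predict_action = nn.Linear(embed_dim, act_dim)

    def forward(self, rtg, states, actions):
        B, T, _ = states.shape
        # Interleave tokens per timestep as (R_t, s_t, a_t).
        tokens = torch.stack(
            (self.embed_rtg(rtg), self.embed_state(states), self.embed_action(actions)),
            dim=2,
        ).reshape(B, 3 * T, -1)
        tokens = tokens + self.pos_embed(torch.arange(3 * T, device=tokens.device))
        # Causal mask: each token attends only to earlier tokens in the trajectory.
        mask = torch.triu(torch.full((3 * T, 3 * T), float("-inf")), diagonal=1)
        h = self.encoder(tokens, mask=mask)
        # Predict each action from the hidden state of the corresponding state token.
        return self.predict_action(h[:, 1::3])

# Behavior-cloning-style training on a placeholder (random) offline dataset.
state_dim, act_dim, T, B = 8, 2, 10, 32
model = TrajectorySequenceModel(state_dim, act_dim, max_len=T)
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
rtg = torch.randn(B, T, 1)            # return-to-go conditioning
states = torch.randn(B, T, state_dim)
actions = torch.randn(B, T, act_dim)  # logged actions to imitate
for _ in range(5):
    pred = model(rtg, states, actions)
    loss = nn.functional.mse_loss(pred, actions)  # supervised loss on logged actions
    opt.zero_grad()
    loss.backward()
    opt.step()
```

At deployment, such a model is typically conditioned on a desired return-to-go and rolled out autoregressively, which is what distinguishes this sequence-modeling formulation from value-based methods that rely on bootstrapped temporal-difference targets.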