Paper Title
Transformer in Transformer as Backbone for Deep Reinforcement Learning
Paper Authors
Paper Abstract
Designing better deep networks and designing better reinforcement learning (RL) algorithms are both important for deep RL. This work focuses on the former. Previous methods build the network from several modules such as CNN, LSTM, and Attention. Recent methods combine the Transformer with these modules for better performance. However, training a network composed of mixed modules requires tedious optimization tricks, making these methods inconvenient to use in practice. In this paper, we propose to design \emph{pure Transformer-based networks} for deep RL, aiming to provide off-the-shelf backbones for both the online and offline settings. Specifically, we propose the Transformer in Transformer (TIT) backbone, which cascades two Transformers in a very natural way: the inner one processes a single observation, while the outer one processes the observation history; combining both is expected to extract spatial-temporal representations for good decision-making. Experiments show that TIT consistently achieves satisfactory performance across different settings.
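The cascaded inner/outer design described in the abstract can be illustrated in a few lines of PyTorch. The following is a minimal sketch only, assuming image observations that are split into patch tokens; the class name, arguments, pooling choice, and hyperparameters (e.g. TITBackbone, max_history) are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn


class TITBackbone(nn.Module):
    """Illustrative sketch of a TIT-style backbone (assumptions, not the paper's exact design):
    an inner Transformer encodes the patch tokens of a single observation, and an outer
    Transformer encodes the resulting per-observation embeddings over the history."""

    def __init__(self, patch_dim, embed_dim=128, n_heads=4,
                 inner_layers=2, outer_layers=2, max_history=16):
        super().__init__()
        self.patch_proj = nn.Linear(patch_dim, embed_dim)
        self.pos_emb = nn.Parameter(torch.zeros(1, max_history, embed_dim))
        inner = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=n_heads,
                                           batch_first=True)
        outer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=n_heads,
                                           batch_first=True)
        self.inner = nn.TransformerEncoder(inner, num_layers=inner_layers)
        self.outer = nn.TransformerEncoder(outer, num_layers=outer_layers)

    def forward(self, obs_patches):
        # obs_patches: (batch, history_len, n_patches, patch_dim)
        b, t, p, _ = obs_patches.shape
        x = self.patch_proj(obs_patches).reshape(b * t, p, -1)
        x = self.inner(x)                              # spatial encoding of each single observation
        obs_tokens = x.mean(dim=1).reshape(b, t, -1)   # pool patches -> one token per observation
        obs_tokens = obs_tokens + self.pos_emb[:, :t]
        # causal mask: each timestep attends only to current and past observations
        mask = nn.Transformer.generate_square_subsequent_mask(t).to(obs_tokens.device)
        h = self.outer(obs_tokens, mask=mask)          # temporal encoding over the observation history
        return h                                       # (batch, history_len, embed_dim)

In use, the last-step output h[:, -1] would typically feed a policy or value head; for an offline setting such as return-conditioned sequence modeling, the outer sequence could instead interleave return/action tokens, but that is beyond this sketch.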