Disentangling Transfer in Continual Reinforcement Learning

Paper Title

Disentangling Transfer in Continual Reinforcement Learning

Authors

Maciej Wołczyk, Michał Zając, Razvan Pascanu, Łukasz Kuciński, Piotr Miłoś

Abstract

The ability of continual learning systems to transfer knowledge from previously seen tasks in order to maximize performance on new tasks is a significant challenge for the field, limiting the applicability of continual learning solutions to realistic scenarios. Consequently, this study aims to broaden our understanding of transfer and its driving forces in the specific case of continual reinforcement learning. We adopt SAC as the underlying RL algorithm and Continual World as a suite of continuous control tasks. We systematically study how different components of SAC (the actor and the critic, exploration, and data) affect transfer efficacy, and we provide recommendations regarding various modeling options. The best set of choices, dubbed ClonEx-SAC, is evaluated on the recent Continual World benchmark. ClonEx-SAC achieves 87% final success rate compared to 80% of PackNet, the best method in the benchmark. Moreover, the transfer grows from 0.18 to 0.54 according to the metric provided by Continual World.
