Paper Title

VRL3: A Data-Driven Framework for Visual Deep Reinforcement Learning

Authors

Che Wang, Xufang Luo, Keith Ross, Dongsheng Li

Abstract

We propose VRL3, a powerful data-driven framework with a simple design for solving challenging visual deep reinforcement learning (DRL) tasks. We analyze a number of major obstacles in taking a data-driven approach, and present a suite of design principles, novel findings, and critical insights about data-driven visual DRL. Our framework has three stages: in stage 1, we leverage non-RL datasets (e.g. ImageNet) to learn task-agnostic visual representations; in stage 2, we use offline RL data (e.g. a limited number of expert demonstrations) to convert the task-agnostic representations into more powerful task-specific representations; in stage 3, we fine-tune the agent with online RL. On a set of challenging hand manipulation tasks with sparse reward and realistic visual inputs, compared to the previous SOTA, VRL3 achieves an average of 780% better sample efficiency. And on the hardest task, VRL3 is 1220% more sample efficient (2440% when using a wider encoder) and solves the task with only 10% of the computation. These significant results clearly demonstrate the great potential of data-driven deep reinforcement learning.
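
The abstract describes a three-stage pipeline in which a single visual encoder is first pretrained on non-RL data, then refined with offline RL data, and finally fine-tuned with online RL. The sketch below illustrates that staging in PyTorch under several assumptions not stated in the abstract: 84x84 RGB inputs, a supervised classification objective as a stand-in for stage 1 pretraining, and hypothetical demo_buffer, agent, and env interfaces for stages 2 and 3. It is not the authors' implementation, only a schematic of the data flow.

    import torch
    import torch.nn as nn

    class Encoder(nn.Module):
        # Small convolutional encoder for 84x84 RGB frames (input size is an
        # assumption for illustration, not taken from the paper).
        def __init__(self, out_dim=256):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(3, 32, 3, stride=2), nn.ReLU(),
                nn.Conv2d(32, 32, 3, stride=2), nn.ReLU(),
                nn.Flatten(),
                nn.Linear(32 * 20 * 20, out_dim),
            )

        def forward(self, x):
            return self.net(x)

    def stage1_pretrain(encoder, imagenet_loader, epochs=1):
        # Stage 1: learn task-agnostic features from a non-RL dataset (e.g. ImageNet).
        # Supervised classification is used here as a stand-in objective (assumption).
        head = nn.Linear(256, 1000)
        opt = torch.optim.Adam(list(encoder.parameters()) + list(head.parameters()), lr=1e-4)
        for _ in range(epochs):
            for images, labels in imagenet_loader:
                loss = nn.functional.cross_entropy(head(encoder(images)), labels)
                opt.zero_grad()
                loss.backward()
                opt.step()

    def stage2_offline(encoder, agent, demo_buffer, steps=10_000):
        # Stage 2: reuse the pretrained encoder and train on offline RL data
        # (e.g. a limited number of expert demonstrations) to obtain task-specific features.
        # demo_buffer.sample() and agent.update_offline(...) are hypothetical interfaces.
        for _ in range(steps):
            batch = demo_buffer.sample()
            agent.update_offline(encoder, batch)

    def stage3_online(encoder, agent, env, steps=100_000):
        # Stage 3: fine-tune the same encoder and agent with online RL interaction.
        # env, agent.act(...) and agent.update_online(...) are hypothetical interfaces.
        obs = env.reset()
        for _ in range(steps):
            action = agent.act(encoder(obs))
            next_obs, reward, done, info = env.step(action)
            agent.update_online(encoder, obs, action, reward, next_obs, done)
            obs = env.reset() if done else next_obs

The point the sketch tries to make explicit is that the same encoder object is passed through all three stages, so the representations learned from non-RL and offline RL data are carried into, and further fine-tuned during, online RL.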
