Paper Title


OPAL: Offline Primitive Discovery for Accelerating Offline Reinforcement Learning

Authors

Anurag Ajay, Aviral Kumar, Pulkit Agrawal, Sergey Levine, Ofir Nachum

Abstract


Reinforcement learning (RL) has achieved impressive performance in a variety of online settings in which an agent's ability to query the environment for transitions and rewards is effectively unlimited. However, in many practical applications, the situation is reversed: an agent may have access to large amounts of undirected offline experience data, while access to the online environment is severely limited. In this work, we focus on this offline setting. Our main insight is that, when presented with offline data composed of a variety of behaviors, an effective way to leverage this data is to extract a continuous space of recurring and temporally extended primitive behaviors before using these primitives for downstream task learning. Primitives extracted in this way serve two purposes: they delineate the behaviors that are supported by the data from those that are not, making them useful for avoiding distributional shift in offline RL; and they provide a degree of temporal abstraction, which reduces the effective horizon yielding better learning in theory, and improved offline RL in practice. In addition to benefiting offline policy optimization, we show that performing offline primitive learning in this way can also be leveraged for improving few-shot imitation learning as well as exploration and transfer in online RL on a variety of benchmark domains. Visualizations are available at https://sites.google.com/view/opal-iclr
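The abstract's core pipeline — segment undirected offline trajectories into temporally extended sub-trajectories, embed them in a continuous latent primitive space, then do downstream learning over that space at a reduced effective horizon — can be sketched as follows. This is a hedged illustration only: it uses random toy trajectories in place of real offline data and PCA (via SVD) as a stand-in for the paper's learned primitive encoder; the function names (`segment`, `fit_latent_space`, `encode`) and the window length `c` are assumptions for this sketch, not OPAL's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def segment(trajectories, c):
    """Chop each trajectory of per-step (state, action) vectors into
    non-overlapping length-c sub-trajectories (candidate primitives)."""
    windows = []
    for traj in trajectories:
        for t in range(0, len(traj) - c + 1, c):
            windows.append(np.concatenate(traj[t:t + c]))
    return np.stack(windows)

def fit_latent_space(windows, d):
    """Fit a d-dimensional continuous latent space over sub-trajectories.
    PCA here is a stand-in for OPAL's learned encoder."""
    mean = windows.mean(axis=0)
    _, _, vt = np.linalg.svd(windows - mean, full_matrices=False)
    return mean, vt[:d]

def encode(windows, mean, basis):
    """Map each sub-trajectory to its latent primitive code z."""
    return (windows - mean) @ basis.T

# Toy offline data: 20 trajectories, horizon 40, state+action dim 6.
trajs = [[rng.normal(size=6) for _ in range(40)] for _ in range(20)]
c, d = 10, 4                  # primitive length, latent dimension
W = segment(trajs, c)         # (num_windows, c * 6)
mean, basis = fit_latent_space(W, d)
Z = encode(W, mean, basis)    # one z per sub-trajectory

# Downstream (offline) RL would now choose a z every c steps, so the
# effective decision horizon shrinks from 40 to 40 // c = 4.
print(W.shape, Z.shape)       # (80, 60) (80, 4)
```

Because the latent codes are fit only to behaviors present in the data, a downstream policy restricted to this z-space stays within the data's support — the distributional-shift benefit the abstract describes.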
