Paper Title

The Challenges of Exploration for Offline Reinforcement Learning

Paper Authors

Nathan Lambert, Markus Wulfmeier, William Whitney, Arunkumar Byravan, Michael Bloesch, Vibhavari Dasagi, Tim Hertweck, Martin Riedmiller

Paper Abstract

Offline Reinforcement Learning (ORL) enables us to separately study the two interlinked processes of reinforcement learning: collecting informative experience and inferring optimal behaviour. The second step has been widely studied in the offline setting, but just as critical to data-efficient RL is the collection of informative data. The task-agnostic setting for data collection, where the task is not known a priori, is of particular interest due to the possibility of collecting a single dataset and using it to solve several downstream tasks as they arise. We investigate this setting via curiosity-based intrinsic motivation, a family of exploration methods which encourage the agent to explore those states or transitions it has not yet learned to model. With Explore2Offline, we propose to evaluate the quality of collected data by transferring the collected data and inferring policies with reward relabelling and standard offline RL algorithms. We evaluate a wide variety of data collection strategies, including a new exploration agent, Intrinsic Model Predictive Control (IMPC), using this scheme and demonstrate their performance on various tasks. We use this decoupled framework to strengthen intuitions about exploration and the data prerequisites for effective offline RL.
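
The reward-relabelling step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Transition` container, the `relabel` helper, and the `reach_origin` reward function are hypothetical placeholders showing how transitions gathered by a task-agnostic exploration agent (such as one driven by curiosity or IMPC) could be re-scored with a downstream task's reward before being handed to a standard offline RL algorithm.

```python
# Minimal sketch (illustrative only, not the authors' code) of reward
# relabelling: task-agnostic exploration data is re-scored with a
# downstream task's reward, then used as an offline RL dataset.
from dataclasses import dataclass
from typing import Callable, List

import numpy as np


@dataclass
class Transition:
    obs: np.ndarray
    action: np.ndarray
    next_obs: np.ndarray
    reward: float  # unused during task-agnostic collection


def relabel(dataset: List[Transition],
            task_reward: Callable[[np.ndarray, np.ndarray, np.ndarray], float]
            ) -> List[Transition]:
    """Replace stored rewards with the downstream task's reward signal."""
    return [
        Transition(t.obs, t.action, t.next_obs,
                   task_reward(t.obs, t.action, t.next_obs))
        for t in dataset
    ]


# Hypothetical downstream task: drive the observation toward the origin.
reach_origin = lambda obs, act, next_obs: -float(np.linalg.norm(next_obs))

# In Explore2Offline the dataset would come from a curiosity-driven agent;
# here two random transitions stand in purely for illustration.
rng = np.random.default_rng(0)
exploration_dataset = [
    Transition(rng.normal(size=3), rng.normal(size=1), rng.normal(size=3), 0.0)
    for _ in range(2)
]

offline_dataset = relabel(exploration_dataset, reach_origin)
# offline_dataset can now be passed to any standard offline RL algorithm.
```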
