Paper Title
Flow to Control: Offline Reinforcement Learning with Lossless Primitive Discovery
Paper Authors
Paper Abstract
Offline reinforcement learning (RL) enables an agent to learn effectively from logged data, which significantly extends the applicability of RL algorithms to real-world scenarios where exploration can be expensive or unsafe. Previous works have shown that extracting primitive skills from the recurring and temporally extended structures in the logged data yields better learning. However, these methods suffer greatly when the primitives lack the representational capacity to recover the original policy space, especially in offline settings. In this paper, we give a quantitative characterization of the performance of offline hierarchical learning and highlight the importance of learning lossless primitives. To this end, we propose to use a \emph{flow}-based structure as the representation for low-level policies. This allows us to represent the behaviors in the dataset faithfully while retaining the expressive power to recover the whole policy space. We show that such lossless primitives can drastically improve the performance of hierarchical policies. Experimental results and extensive ablation studies on the standard D4RL benchmark show that our method has strong representational ability for policies and achieves superior performance on most tasks.
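As a rough illustration of the flow-based low-level policy described in the abstract, the sketch below implements a state-conditioned affine coupling flow in PyTorch that invertibly maps a latent code to an action, so that any dataset action can in principle be recovered exactly. The class names (`FlowPolicy`, `ConditionalCoupling`), layer sizes, and the specific coupling-layer design are our own assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class ConditionalCoupling(nn.Module):
    """One affine coupling layer: half of the dimensions are transformed with a
    scale/shift predicted from the other half and the state (illustrative, not
    the paper's exact design)."""
    def __init__(self, action_dim, state_dim, hidden_dim=64, flip=False):
        super().__init__()
        self.flip = flip
        self.d = action_dim // 2
        self.net = nn.Sequential(
            nn.Linear(self.d + state_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 2 * (action_dim - self.d)),
        )

    def forward(self, z, state):
        # Latent -> action direction only; the inverse pass is omitted for brevity.
        if self.flip:                       # alternate which half is transformed
            z = z.flip(-1)
        z1, z2 = z[..., :self.d], z[..., self.d:]
        scale, shift = self.net(torch.cat([z1, state], dim=-1)).chunk(2, dim=-1)
        z2 = z2 * torch.exp(torch.tanh(scale)) + shift   # invertible affine map
        out = torch.cat([z1, z2], dim=-1)
        return out.flip(-1) if self.flip else out

class FlowPolicy(nn.Module):
    """Low-level policy: an invertible, state-conditioned map from a latent code
    (e.g. emitted by a high-level policy) to an action."""
    def __init__(self, action_dim, state_dim, num_layers=4):
        super().__init__()
        self.layers = nn.ModuleList(
            ConditionalCoupling(action_dim, state_dim, flip=(i % 2 == 1))
            for i in range(num_layers)
        )

    def forward(self, z, state):
        for layer in self.layers:
            z = layer(z, state)
        return torch.tanh(z)                # squash to a bounded action range

# Usage sketch: a high-level policy would produce z; here z is random.
state = torch.randn(8, 17)                  # batch of states (dimensions assumed)
z = torch.randn(8, 6)                       # latent codes in the action dimension
action = FlowPolicy(action_dim=6, state_dim=17)(z, state)
print(action.shape)                         # torch.Size([8, 6])
```

Because each coupling layer is invertible, the flow can assign every state-action pair in the dataset a unique latent code, which is one way to realize the "lossless primitive" property emphasized above.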