Paper Title

RORL: Robust Offline Reinforcement Learning via Conservative Smoothing

Authors

Rui Yang, Chenjia Bai, Xiaoteng Ma, Zhaoran Wang, Chongjie Zhang, Lei Han

Abstract

Offline reinforcement learning (RL) provides a promising direction to exploit massive amounts of offline data for complex decision-making tasks. Due to the distribution shift issue, current offline RL algorithms are generally designed to be conservative in value estimation and action selection. However, such conservatism can impair the robustness of learned policies when encountering observation deviation under realistic conditions, such as sensor errors and adversarial attacks. To trade off robustness and conservatism, we propose Robust Offline Reinforcement Learning (RORL) with a novel conservative smoothing technique. In RORL, we explicitly introduce regularization on the policy and the value function for states near the dataset, as well as additional conservative value estimation on these states. Theoretically, we show RORL enjoys a tighter suboptimality bound than recent theoretical results in linear MDPs. We demonstrate that RORL can achieve state-of-the-art performance on the general offline RL benchmark and is considerably robust to adversarial observation perturbations.
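
To make the abstract's description more concrete, below is a minimal sketch of what a conservative smoothing objective could look like in PyTorch. This is an assumption-laden illustration, not the paper's implementation: the network architecture, the L-infinity perturbation sampling, the squared-difference smoothing terms, the deterministic policy, and the hyperparameter names (`epsilon`, `beta_smooth`, `beta_ood`) are all chosen here for clarity, and the paper's exact losses and out-of-distribution (OOD) value penalty may differ.

```python
# Illustrative sketch (not the official RORL implementation) of conservative
# smoothing: states near the dataset are perturbed within a small ball, the
# Q-function and the policy are regularized to be smooth on those perturbed
# states, and the Q-values of the perturbed (OOD) states receive an extra
# conservative penalty. All names and hyperparameters are assumptions.
import torch
import torch.nn as nn


class QNetwork(nn.Module):
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action], dim=-1))


def conservative_smoothing_loss(
    q_net: QNetwork,
    policy: nn.Module,          # maps state -> action (deterministic, for simplicity)
    states: torch.Tensor,       # batch of dataset states
    actions: torch.Tensor,      # corresponding dataset actions
    epsilon: float = 0.01,      # perturbation-ball radius (assumed value)
    beta_smooth: float = 1.0,   # weight of the smoothing terms (assumed value)
    beta_ood: float = 0.1,      # weight of the conservative OOD penalty (assumed value)
) -> torch.Tensor:
    # Sample perturbed states uniformly in an L-infinity ball around dataset states.
    noise = (torch.rand_like(states) * 2 - 1) * epsilon
    perturbed_states = states + noise

    # Value smoothing: Q should change little under small state perturbations.
    q_clean = q_net(states, actions)
    q_perturbed = q_net(perturbed_states, actions)
    value_smooth = ((q_perturbed - q_clean) ** 2).mean()

    # Policy smoothing: actions on perturbed states should stay close to
    # actions on the original states.
    pi_clean = policy(states)
    pi_perturbed = policy(perturbed_states)
    policy_smooth = ((pi_perturbed - pi_clean) ** 2).mean()

    # Conservative penalty: push down Q-values on perturbed (OOD) states so
    # smoothing does not inflate value estimates outside the dataset support.
    ood_penalty = q_perturbed.mean()

    return beta_smooth * (value_smooth + policy_smooth) + beta_ood * ood_penalty


# Usage (illustrative): add this term to the standard Bellman loss, e.g.
# loss = bellman_loss + conservative_smoothing_loss(q_net, policy, s_batch, a_batch)
```

The sketch captures the trade-off named in the abstract: smoothing alone would propagate value estimates onto perturbed states outside the dataset, so an additional penalty keeps those out-of-distribution estimates conservative.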
