Paper Title


Robust Policy Learning over Multiple Uncertainty Sets

Authors

Annie Xie, Shagun Sodhani, Chelsea Finn, Joelle Pineau, Amy Zhang

Abstract


Reinforcement learning (RL) agents need to be robust to variations in safety-critical environments. While system identification methods provide a way to infer the variation from online experience, they can fail in settings where fast identification is not possible. Another dominant approach is robust RL which produces a policy that can handle worst-case scenarios, but these methods are generally designed to achieve robustness to a single uncertainty set that must be specified at train time. Towards a more general solution, we formulate the multi-set robustness problem to learn a policy robust to different perturbation sets. We then design an algorithm that enjoys the benefits of both system identification and robust RL: it reduces uncertainty where possible given a few interactions, but can still act robustly with respect to the remaining uncertainty. On a diverse set of control tasks, our approach demonstrates improved worst-case performance on new environments compared to prior methods based on system identification and on robust RL alone.
