Paper Title
Cautious Bayesian Optimization for Efficient and Scalable Policy Search
Paper Authors
Paper Abstract
Sample efficiency is one of the key factors when applying policy search to real-world problems. In recent years, Bayesian Optimization (BO) has become prominent in the field of robotics due to its sample efficiency and the little prior knowledge it requires. However, one drawback of BO is its poor performance on high-dimensional search spaces, as it focuses on global search. In the policy search setting, local optimization is typically sufficient because initial policies are often available, e.g., via meta-learning, kinesthetic demonstrations, or sim-to-real approaches. In this paper, we propose to constrain the policy search space to a sublevel set of the Bayesian surrogate model's predictive uncertainty. This simple yet effective way of constraining the policy update enables BO to scale to high-dimensional spaces (>100 dimensions) and reduces the risk of damaging the system. We demonstrate the effectiveness of our approach on a wide range of problems, including a motor-skills task, adapting deep RL agents to new reward signals, and a sim-to-real task for an inverted pendulum system.
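To make the core idea concrete, the sketch below shows a local BO loop in which the acquisition function is maximized only over candidate policies lying in the sublevel set {theta : sigma(theta) <= gamma} of the GP surrogate's predictive standard deviation. This is not the authors' implementation: the RBF kernel, the UCB acquisition, the random candidate sampling around the incumbent, the threshold `gamma`, and the toy `episode_return` objective are all illustrative assumptions.

```python
# Minimal sketch (assumptions, not the paper's implementation): local BO where the
# acquisition is maximized only over candidates whose GP predictive standard deviation
# is below a threshold `gamma`, i.e. a sublevel set of the surrogate's uncertainty.
import numpy as np

def rbf_kernel(A, B, lengthscale=0.5, variance=1.0):
    """Squared-exponential kernel between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

def gp_posterior(X, y, Xstar, noise=1e-4):
    """GP posterior mean and standard deviation at Xstar, given data (X, y)."""
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    Ks = rbf_kernel(X, Xstar)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    v = np.linalg.solve(L, Ks)
    var = rbf_kernel(Xstar, Xstar).diagonal() - (v ** 2).sum(0)
    return Ks.T @ alpha, np.sqrt(np.maximum(var, 1e-12))

def episode_return(theta):
    """Toy stand-in for the return of a policy rollout with parameters theta."""
    return -np.sum((theta - 0.6) ** 2, axis=-1)

rng = np.random.default_rng(0)
dim, gamma, beta = 5, 0.5, 2.0                 # uncertainty threshold and UCB weight (assumed)
theta0 = 0.4 * np.ones(dim)                    # initial policy, e.g. from a demonstration
X = theta0[None, :] + 0.05 * rng.standard_normal((5, dim))
y = episode_return(X)

for _ in range(20):
    incumbent = X[np.argmax(y)]
    # Local candidates around the incumbent; keep only those inside the sublevel set
    # {theta : sigma(theta) <= gamma} of the surrogate's predictive uncertainty.
    cand = incumbent[None, :] + 0.1 * rng.standard_normal((512, dim))
    mu, sigma = gp_posterior(X, y, cand)
    feasible = sigma <= gamma
    if not feasible.any():                     # fall back to the least uncertain candidate
        feasible = sigma <= sigma.min() + 1e-9
    ucb = mu + beta * sigma
    ucb[~feasible] = -np.inf
    theta_next = cand[np.argmax(ucb)]
    X = np.vstack([X, theta_next])
    y = np.append(y, episode_return(theta_next))

print("best return found:", y.max())
```

The uncertainty threshold keeps each proposed policy close to regions already covered by data, which is the mechanism the abstract credits for both scaling to high-dimensional spaces and lowering the risk of evaluating a damaging policy on the real system.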