Paper Title
APPLR: Adaptive Planner Parameter Learning from Reinforcement
Paper Authors
Paper Abstract
Classical navigation systems typically operate using a fixed set of hand-picked parameters (e.g. maximum speed, sampling rate, inflation radius, etc.) and require heavy expert re-tuning in order to work in new environments. To mitigate this requirement, it has been proposed to learn parameters for different contexts in a new environment using human demonstrations collected via teleoperation. However, learning from human demonstration limits deployment to the training environment, and limits overall performance to that of a potentially-suboptimal demonstrator. In this paper, we introduce APPLR, Adaptive Planner Parameter Learning from Reinforcement, which allows existing navigation systems to adapt to new scenarios by using a parameter selection scheme discovered via reinforcement learning (RL) in a wide variety of simulation environments. We evaluate APPLR on a robot in both simulated and physical experiments, and show that it can outperform both a fixed set of hand-tuned parameters and also a dynamic parameter tuning scheme learned from human demonstration.
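To make the core idea concrete, the following is a minimal sketch of the parameter-selection step described above: at each decision point, an RL policy's continuous action is mapped onto a set of classical planner parameters. The parameter names, ranges, and the affine mapping are illustrative assumptions for this sketch, not the paper's actual configuration.

```python
# Hypothetical sketch of APPLR-style parameter selection: a normalized
# RL action in [-1, 1]^n is mapped to planner parameter values.
# Names and ranges below are assumptions for illustration only.

PARAM_RANGES = {
    "max_vel_x": (0.2, 2.0),        # maximum speed (m/s), assumed range
    "vx_samples": (4, 12),          # sampling rate (integer count)
    "inflation_radius": (0.1, 0.6)  # obstacle inflation radius (m)
}

def action_to_params(action):
    """Map a normalized action in [-1, 1]^n to a planner parameter dict."""
    params = {}
    for a, (name, (lo, hi)) in zip(action, PARAM_RANGES.items()):
        a = max(-1.0, min(1.0, a))               # clip to the valid action box
        val = lo + (a + 1.0) / 2.0 * (hi - lo)   # affine map [-1, 1] -> [lo, hi]
        if name == "vx_samples":
            val = int(round(val))                # integer-valued parameter
        params[name] = val
    return params

# Example: mid-range speed, max sampling, min inflation
print(action_to_params([0.0, 1.0, -1.0]))
```

In the full system, the selected parameter dictionary would be handed to the underlying navigation stack at each decision step (e.g., via a dynamic reconfiguration interface), so the classical planner keeps generating the actual motion commands while the learned policy only adapts its parameters.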