Learning hierarchical behavior and motion planning

Paper title

Learning hierarchical behavior and motion planning for autonomous driving

Authors

Jingke Wang, Yue Wang, Dongkun Zhang, Yezhou Yang, Rong Xiong

Abstract

Learning-based driving solutions, a new branch of autonomous driving, are expected to simplify the modeling of driving by learning the underlying mechanisms from data. To improve tactical decision-making in learning-based driving solutions, we introduce hierarchical behavior and motion planning (HBMP) to explicitly model behavior in learning-based solutions. Because the action space of behavior and motion is coupled, solving the HBMP problem with reinforcement learning (RL) is challenging for long-horizon driving tasks. We transform the HBMP problem by integrating a classical sampling-based motion planner, whose optimal cost is used as the reward for high-level behavior learning. As a result, this formulation reduces the action space and diversifies the rewards without losing the optimality of HBMP. In addition, we propose a sharable representation for input sensory data across simulation platforms and real-world environments, so that models trained in a fast event-based simulator, SUMO, can be used to initialize and accelerate RL training in a dynamics-based simulator, CARLA. Experimental results demonstrate the effectiveness of the method. Moreover, the model is successfully transferred to the real world, validating its generalization capability.
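
The reformulation described in the abstract, in which the optimal cost of a sampling-based motion planner serves as the reward for the high-level behavior, can be illustrated with a minimal toy sketch. This is not the authors' implementation: the behavior set, the cost terms, the numeric weights, and every function name below are hypothetical and chosen only to show the idea that the behavior learner sees one reward per behavior, namely the negated best cost among sampled trajectories.

```python
import random

# Hypothetical discrete behaviors, each mapped to a target lateral offset (metres).
BEHAVIORS = {"keep_lane": 0.0, "change_left": 3.5, "change_right": -3.5}


def trajectory_cost(offset, speed, target_offset, obstacle_offset):
    """Toy cost: tracking error + slow-progress penalty + collision penalty."""
    tracking = (offset - target_offset) ** 2
    progress = 0.05 * (15.0 - speed) ** 2
    # Assumed safety margin of 1.5 m around a single static obstacle.
    collision = 100.0 if abs(offset - obstacle_offset) < 1.5 else 0.0
    return tracking + progress + collision


def plan_motion(behavior, obstacle_offset, n_samples=200, seed=0):
    """Sample candidate (offset, speed) pairs near the behavior's target
    and return the lowest-cost candidate as (cost, offset, speed)."""
    rng = random.Random(seed)
    target = BEHAVIORS[behavior]
    best = None
    for _ in range(n_samples):
        offset = rng.uniform(target - 1.0, target + 1.0)
        speed = rng.uniform(5.0, 15.0)
        cost = trajectory_cost(offset, speed, target, obstacle_offset)
        if best is None or cost < best[0]:
            best = (cost, offset, speed)
    return best


def behavior_reward(behavior, obstacle_offset):
    """Reward for the high-level behavior: negated optimal planner cost."""
    cost, _, _ = plan_motion(behavior, obstacle_offset)
    return -cost


if __name__ == "__main__":
    # An obstacle sits in the ego lane (offset 0.0), so changing lanes
    # should earn a higher reward than keeping the lane.
    for name in BEHAVIORS:
        print(name, behavior_reward(name, obstacle_offset=0.0))
```

In this sketch the high-level learner only chooses among three behaviors, while the continuous motion parameters are resolved inside the planner, which is one way to picture how such a formulation shrinks the action space that RL has to explore.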
