Paper Title

Conditional Predictive Behavior Planning with Inverse Reinforcement Learning for Human-like Autonomous Driving

Authors

Zhiyu Huang, Haochen Liu, Jingda Wu, Chen Lv

Abstract

Making safe and human-like decisions is an essential capability of autonomous driving systems, and learning-based behavior planning presents a promising pathway toward achieving this objective. Distinguished from existing learning-based methods that directly output decisions, this work introduces a predictive behavior planning framework that learns to predict and evaluate from human driving data. This framework consists of three components: a behavior generation module that produces a diverse set of candidate behaviors in the form of trajectory proposals, a conditional motion prediction network that predicts future trajectories of other agents based on each proposal, and a scoring module that evaluates the candidate plans using maximum entropy inverse reinforcement learning (IRL). We validate the proposed framework on a large-scale real-world urban driving dataset through comprehensive experiments. The results show that the conditional prediction model can predict distinct and reasonable future trajectories given different trajectory proposals and the IRL-based scoring module can select plans that are close to human driving. The proposed framework outperforms other baseline methods in terms of similarity to human driving trajectories. Additionally, we find that the conditional prediction model improves both prediction and planning performance compared to the non-conditional model. Lastly, we note that learning the scoring module is crucial for aligning the evaluations with human drivers.
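The scoring stage described in the abstract evaluates candidate trajectory proposals with maximum entropy inverse reinforcement learning. A minimal sketch of that idea is shown below: each candidate plan is summarized by a feature vector, a learned linear reward induces a softmax (maximum-entropy) distribution over candidates, and the IRL gradient moves the reward weights so that the human-demonstrated plan becomes the most probable one. The feature design, reward parameterization, and training details here are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

# Each candidate plan (trajectory proposal) is summarized by a feature
# vector, e.g. [speed deviation, acceleration, proximity to agents, ...].
# These features are hypothetical placeholders for illustration.

def plan_probabilities(features, weights):
    """Maximum-entropy distribution over candidate plans:
    P(plan_i) ∝ exp(w · f_i)."""
    rewards = features @ weights
    rewards -= rewards.max()          # subtract max for numerical stability
    exp_r = np.exp(rewards)
    return exp_r / exp_r.sum()

def irl_gradient(features, weights, human_idx):
    """Max-entropy IRL log-likelihood gradient: features of the
    human-chosen plan minus the expected features under the model."""
    probs = plan_probabilities(features, weights)
    expected = probs @ features
    return features[human_idx] - expected

# Toy setup: 4 candidate plans, 3 features each.
rng = np.random.default_rng(0)
features = rng.normal(size=(4, 3))
weights = np.zeros(3)

# Gradient ascent on the log-likelihood of the human plan (index 0 here),
# so the learned reward ranks it highest among the candidates.
for _ in range(200):
    weights += 0.5 * irl_gradient(features, weights, human_idx=0)

best = int(np.argmax(plan_probabilities(features, weights)))
```

After training, `best` equals the index of the human-demonstrated plan, which mirrors how the framework's scoring module selects the candidate closest to human driving.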
