ViP3D: End-to-end Visual Trajectory Prediction via 3D Agent Queries

Paper Title

ViP3D: End-to-end Visual Trajectory Prediction via 3D Agent Queries

Authors

Junru Gu, Chenxu Hu, Tianyuan Zhang, Xuanyao Chen, Yilun Wang, Yue Wang, Hang Zhao

Abstract

Perception and prediction are two separate modules in existing autonomous driving systems. They interact with each other via hand-picked features such as agent bounding boxes and trajectories. Due to this separation, prediction, as a downstream module, receives only limited information from the perception module. To make matters worse, errors from the perception module can propagate and accumulate, adversely affecting the prediction results. In this work, we propose ViP3D, a query-based visual trajectory prediction pipeline that exploits rich information from raw videos to directly predict future trajectories of agents in a scene. ViP3D employs sparse agent queries to detect, track, and predict throughout the pipeline, making it the first fully differentiable vision-based trajectory prediction approach. Instead of using historical feature maps and trajectories, useful information from previous timestamps is encoded in agent queries, which makes ViP3D a concise streaming prediction method. Furthermore, extensive experimental results on the nuScenes dataset show the strong vision-based prediction performance of ViP3D over traditional pipelines and previous end-to-end models.
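
The abstract's central idea, agent queries that persist across frames and carry history in place of stored feature maps or trajectories, can be illustrated with a toy streaming loop. This is a minimal sketch under loud assumptions, not ViP3D's actual architecture: the function names (`update_queries`, `step`, `predict`), the dimensions, the single dot-product cross-attention, the sigmoid keep-threshold, and the random stand-in weights and frame features are all hypothetical, and nothing is trained.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16        # query embedding size (assumed)
HORIZON = 6   # number of future (x, y) steps to predict (assumed)

# Hypothetical "learned" heads; random here purely for illustration.
W_score = rng.normal(size=D)
W_traj = rng.normal(size=(D, HORIZON * 2)) * 0.1


def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def update_queries(queries, frame_feats):
    """Each agent query (N, D) attends to the current frame's features (M, D)
    and receives a residual update; this is where new evidence enters."""
    attn = softmax(queries @ frame_feats.T / np.sqrt(D))  # (N, M)
    return queries + attn @ frame_feats


def step(queries, ids, frame_feats, next_id, num_new=2, keep_thresh=0.5):
    """One streaming step: add fresh queries for possible new agents, update
    every query with the current frame, then drop low-confidence queries.
    Surviving queries keep their ids, which is what makes this tracking."""
    new = rng.normal(size=(num_new, D)) * 0.1
    queries = np.concatenate([queries, new])
    ids = np.concatenate([ids, np.arange(next_id, next_id + num_new)])
    queries = update_queries(queries, frame_feats)
    keep = sigmoid(queries @ W_score) > keep_thresh
    return queries[keep], ids[keep], next_id + num_new


def predict(queries):
    """Decode future trajectories directly from the agent queries."""
    return (queries @ W_traj).reshape(queries.shape[0], HORIZON, 2)


# Streaming loop: only the queries (and their ids) are carried between frames.
queries, ids, next_id = np.empty((0, D)), np.empty(0, dtype=int), 0
for t in range(4):
    frame_feats = rng.normal(size=(10, D))  # stand-in for frame t's image features
    queries, ids, next_id = step(queries, ids, frame_feats, next_id)

trajectories = predict(queries)
print(ids, trajectories.shape)
```

Because a surviving query keeps its `id` from frame to frame, the same vector serves detection (its score), tracking (its persistent identity), and prediction (the trajectory head reads it directly). In the real model these stages are learned jointly and end to end, which this untrained toy does not attempt.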
