Paper Title
Multi-modal Streaming 3D Object Detection
Paper Authors
Paper Abstract
Modern autonomous vehicles rely heavily on mechanical LiDARs for perception. Current perception methods generally require 360° point clouds, collected sequentially as the LiDAR scans the azimuth and acquires consecutive wedge-shaped slices. The acquisition latency of a full scan (~100 ms) may lead to outdated perception, which is detrimental to safe operation. Recent streaming perception works directly process LiDAR slices and compensate for the narrow field of view (FOV) of a slice by reusing features from preceding slices. These works, however, are all based on a single modality and require past information, which may be outdated. Meanwhile, images from high-frequency cameras can support streaming models, as they provide a larger FOV than a LiDAR slice. However, this difference in FOV complicates sensor fusion. To address this research gap, we propose an innovative camera-LiDAR streaming 3D object detection framework that uses camera images instead of past LiDAR slices to provide an up-to-date, dense, and wide context for streaming perception. The proposed method outperforms prior streaming models on the challenging nuScenes benchmark. It also outperforms powerful full-scan detectors while being much faster. Our method is shown to be robust to missing camera images, narrow LiDAR slices, and small camera-LiDAR miscalibration.
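To make the streaming setup concrete, here is a minimal sketch (not the paper's code) of how a full 360° sweep can be partitioned into wedge-shaped azimuth slices to simulate streaming LiDAR input; the function name, the `num_slices` parameter, and the `detect_objects` call are all hypothetical assumptions for illustration.

```python
# Illustrative sketch: simulate streaming LiDAR input by partitioning a
# full 360° sweep into wedge-shaped azimuth slices. All names and
# parameters here (e.g. num_slices, detect_objects) are hypothetical.
import numpy as np

def split_into_azimuth_slices(points: np.ndarray, num_slices: int = 10):
    """Partition an (N, 3+) point cloud into num_slices wedge-shaped slices.

    A mechanical LiDAR acquires these wedges sequentially, so each slice is
    available after roughly (scan_period / num_slices), e.g. ~10 ms instead
    of ~100 ms for a full scan with num_slices = 10.
    """
    # Azimuth angle of each point in [-pi, pi].
    azimuth = np.arctan2(points[:, 1], points[:, 0])
    # Map azimuth to a slice index in [0, num_slices).
    slice_idx = ((azimuth + np.pi) / (2 * np.pi) * num_slices).astype(int)
    slice_idx = np.clip(slice_idx, 0, num_slices - 1)  # fold azimuth == pi
    return [points[slice_idx == k] for k in range(num_slices)]

# A streaming detector would process each wedge as it arrives, fusing it
# with the most recent (wider-FOV) camera image for context.
full_scan = np.random.randn(1000, 4).astype(np.float32)  # x, y, z, intensity
for wedge in split_into_azimuth_slices(full_scan, num_slices=10):
    pass  # detect_objects(wedge, latest_camera_image)  # hypothetical call
```

Under this setup, each wedge covers only a narrow FOV, which is why the paper turns to camera images, rather than stale features from preceding wedges, for wider context.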