Paper Title
Multi-modal Streaming 3D Object Detection
Paper Authors
Paper Abstract
Modern autonomous vehicles rely heavily on mechanical LiDARs for perception. Current perception methods generally require 360° point clouds, collected sequentially as the LiDAR scans the azimuth and acquires consecutive wedge-shaped slices. The acquisition latency of a full scan (~100 ms) may lead to outdated perception, which is detrimental to safe operation. Recent streaming perception works directly process LiDAR slices and compensate for the narrow field of view (FOV) of a slice by reusing features from preceding slices. These works, however, are all based on a single modality and require past information, which may be outdated. Meanwhile, images from high-frequency cameras can support streaming models, as they provide a larger FOV than a LiDAR slice. However, this difference in FOV complicates sensor fusion. To address this research gap, we propose an innovative camera-LiDAR streaming 3D object detection framework that uses camera images instead of past LiDAR slices to provide an up-to-date, dense, and wide context for streaming perception. The proposed method outperforms prior streaming models on the challenging nuScenes benchmark. It also outperforms powerful full-scan detectors while being much faster. Our method is shown to be robust to missing camera images, narrow LiDAR slices, and small camera-LiDAR miscalibration.
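To make the streaming setup concrete, here is a minimal sketch (not the paper's code) of how a full 360° sweep can be partitioned into wedge-shaped azimuth slices to simulate streaming LiDAR input; the function name, the `num_slices` parameter, and the `detect_objects` call are all hypothetical assumptions for illustration.

```python
# Illustrative sketch: simulate streaming LiDAR input by partitioning a
# full 360° sweep into wedge-shaped azimuth slices. All names and
# parameters here (e.g. num_slices, detect_objects) are hypothetical.
import numpy as np

def split_into_azimuth_slices(points: np.ndarray, num_slices: int = 10):
    """Partition an (N, 3+) point cloud into num_slices wedge-shaped slices.

    A mechanical LiDAR acquires these wedges sequentially, so each slice is
    available after roughly (scan_period / num_slices), e.g. ~10 ms instead
    of ~100 ms for a full scan with num_slices = 10.
    """
    # Azimuth angle of each point in [-pi, pi].
    azimuth = np.arctan2(points[:, 1], points[:, 0])
    # Map azimuth to a slice index in [0, num_slices).
    slice_idx = ((azimuth + np.pi) / (2 * np.pi) * num_slices).astype(int)
    slice_idx = np.clip(slice_idx, 0, num_slices - 1)  # fold azimuth == pi
    return [points[slice_idx == k] for k in range(num_slices)]

# A streaming detector would process each wedge as it arrives, fusing it
# with the most recent (wider-FOV) camera image for context.
full_scan = np.random.randn(1000, 4).astype(np.float32)  # x, y, z, intensity
for wedge in split_into_azimuth_slices(full_scan, num_slices=10):
    pass  # detect_objects(wedge, latest_camera_image)  # hypothetical call
```

Under this setup, each wedge covers only a narrow FOV, which is why the paper turns to camera images, rather than stale features from preceding wedges, for wider context.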