Paper Title

Dense Voxel Fusion for 3D Object Detection

Paper Authors

Mahmoud, Anas, Hu, Jordan S. K., Waslander, Steven L.

Paper Abstract

Camera and LiDAR sensor modalities provide complementary appearance and geometric information useful for detecting 3D objects for autonomous vehicle applications. However, current end-to-end fusion methods are challenging to train and underperform state-of-the-art LiDAR-only detectors. Sequential fusion methods suffer from a limited number of pixel and point correspondences due to point cloud sparsity, or their performance is strictly capped by the detections of one of the modalities. Our proposed solution, Dense Voxel Fusion (DVF), is a sequential fusion method that generates multi-scale dense voxel feature representations, improving expressiveness in low point density regions. To enhance multi-modal learning, we train directly with projected ground truth 3D bounding box labels, avoiding noisy, detector-specific 2D predictions. Both DVF and the multi-modal training approach can be applied to any voxel-based LiDAR backbone. DVF ranks 3rd among published fusion methods on the KITTI 3D car detection benchmark without introducing additional trainable parameters, nor requiring stereo images or dense depth labels. In addition, DVF significantly improves 3D vehicle detection performance of voxel-based methods on the Waymo Open Dataset.
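The abstract describes training with projected ground-truth 3D bounding-box labels instead of detector-specific 2D predictions. A minimal sketch of that projection step, assuming a pinhole camera with an illustrative 3x4 projection matrix (the matrix values and the helper name `project_3d_box_to_2d` are hypothetical, not taken from the paper):

```python
import numpy as np

def project_3d_box_to_2d(corners_3d, P):
    """Project 8 3D box corners (camera coordinates, shape (8, 3))
    into an axis-aligned 2D box [x1, y1, x2, y2] via a 3x4 matrix P."""
    hom = np.hstack([corners_3d, np.ones((8, 1))])  # homogeneous coords, (8, 4)
    proj = hom @ P.T                                # image-plane points, (8, 3)
    pix = proj[:, :2] / proj[:, 2:3]                # perspective divide
    x1, y1 = pix.min(axis=0)
    x2, y2 = pix.max(axis=0)
    return np.array([x1, y1, x2, y2])

# Example: a 2 m cube centered 10 m in front of the camera.
corners = np.array([[x, y, z]
                    for x in (-1.0, 1.0)
                    for y in (-1.0, 1.0)
                    for z in (9.0, 11.0)])

# Illustrative pinhole intrinsics (fx = fy = 700, cx = 620, cy = 190),
# identity rotation, zero translation.
P = np.array([[700.0,   0.0, 620.0, 0.0],
              [  0.0, 700.0, 190.0, 0.0],
              [  0.0,   0.0,   1.0, 0.0]])

box_2d = project_3d_box_to_2d(corners, P)
print(box_2d)
```

The extremes of the 2D box come from the nearest face of the cube (z = 9 m), since closer points project farther from the principal point. In a KITTI-style pipeline, such projected label boxes would mark foreground pixels for each annotated object.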
