BEVDistill: Cross-Modal BEV Distillation for Multi-View 3D Object Detection

Paper Title

BEVDistill: Cross-Modal BEV Distillation for Multi-View 3D Object Detection

Authors

Chen, Zehui, Li, Zhenyu, Zhang, Shiquan, Fang, Liangji, Jiang, Qinhong, Zhao, Feng

Abstract

3D object detection from multiple image views is a fundamental and challenging task for visual scene understanding. Owing to its low cost and high efficiency, multi-view 3D object detection has demonstrated promising application prospects. However, accurately detecting objects through perspective views is extremely difficult due to the lack of depth information. Current approaches tend to adopt heavy backbones for image encoders, making them inapplicable for real-world deployment. Unlike images, LiDAR points are superior in providing spatial cues, resulting in highly precise localization. In this paper, we explore the incorporation of LiDAR-based detectors for multi-view 3D object detection. Instead of directly training a depth prediction network, we unify the image and LiDAR features in the Bird's-Eye-View (BEV) space and adaptively transfer knowledge across non-homogeneous representations in a teacher-student paradigm. To this end, we propose BEVDistill, a cross-modal BEV knowledge distillation (KD) framework for multi-view 3D object detection. Extensive experiments demonstrate that the proposed method outperforms current KD approaches on a highly competitive baseline, BEVFormer, without introducing any extra cost in the inference phase. Notably, our best model achieves 59.4 NDS on the nuScenes test leaderboard, setting a new state of the art among image-based detectors. Code will be available at https://github.com/zehuichen123/BEVDistill.
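To make the teacher-student idea in the abstract concrete, here is a minimal sketch of a generic feature-mimicking loss between two BEV feature maps. It is not the loss used in BEVDistill, whose details the abstract does not give. The function name `bev_feature_distill_loss`, the `(C, H, W)` layout, and the optional `fg_mask` weighting are all assumptions made for illustration.

```python
import numpy as np


def bev_feature_distill_loss(student_bev, teacher_bev, fg_mask=None):
    """Mean squared error between student and teacher BEV feature maps.

    Hypothetical helper for illustration only.
    student_bev, teacher_bev: arrays of shape (C, H, W) on the same BEV grid.
    fg_mask: optional (H, W) array of non-negative weights (assumed, e.g. 1
             for cells near objects and 0 elsewhere).
    """
    if student_bev.shape != teacher_bev.shape:
        raise ValueError("student and teacher BEV maps must share a shape")

    # Squared error averaged over channels gives one value per BEV cell.
    per_cell = ((student_bev - teacher_bev) ** 2).mean(axis=0)

    if fg_mask is None:
        return float(per_cell.mean())

    weight_sum = float(fg_mask.sum())
    if weight_sum == 0.0:
        # No weighted cells: nothing to imitate.
        return 0.0
    # Weighted average over the cells selected by the mask.
    return float((per_cell * fg_mask).sum() / weight_sum)


# Usage: a camera-based student imitates a (frozen) LiDAR-based teacher.
rng = np.random.default_rng(0)
teacher = rng.normal(size=(8, 4, 4))
student = teacher + 0.1 * rng.normal(size=(8, 4, 4))
loss = bev_feature_distill_loss(student, teacher)
```

In a training loop such a term would be added to the student's detection loss. The teacher is used only during training, which is consistent with the abstract's claim of no extra cost at inference.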
