Paper Title


MV-FCOS3D++: Multi-View Camera-Only 4D Object Detection with Pretrained Monocular Backbones

Paper Authors

Tai Wang, Qing Lian, Chenming Zhu, Xinge Zhu, Wenwei Zhang

Paper Abstract


In this technical report, we present our solution, dubbed MV-FCOS3D++, for the Camera-Only 3D Detection track in the Waymo Open Dataset Challenge 2022. For multi-view camera-only 3D detection, methods based on bird's-eye-view or 3D geometric representations can leverage stereo cues from overlapping regions between adjacent views and directly perform 3D detection without hand-crafted post-processing. However, these approaches lack direct semantic supervision for the 2D backbones, which can be complemented by pretraining simple monocular detectors. Following this paradigm, our solution is a multi-view framework for 4D detection. It is built upon the simple monocular detector FCOS3D++, pretrained only with object annotations from Waymo, and converts multi-view features into a 3D grid space to detect 3D objects thereon. A dual-path neck for single-frame understanding and temporal stereo matching is devised to incorporate multi-frame information. Our method finally achieves 49.75% mAPL with a single model and wins 2nd place in the WOD challenge, without any LiDAR-based depth supervision during training. The code will be released at https://github.com/Tai-Wang/Depth-from-Motion.
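The abstract mentions converting multi-view features into a 3D grid space for detection. As a minimal illustrative sketch (not the authors' implementation — all function and variable names here are assumptions), one common way to perform this 2D-to-3D lifting is to project each voxel center into every camera, sample the per-view feature map at the projected pixel, and average across the views that see the point:

```python
import numpy as np

def lift_to_voxel_grid(feats, intrinsics, extrinsics, grid_pts):
    """Lift multi-view 2D features to 3D voxel centers (illustrative sketch).

    feats:      (V, H, W, C) per-view 2D feature maps
    intrinsics: (V, 3, 3) camera intrinsic matrices
    extrinsics: (V, 4, 4) world-to-camera rigid transforms
    grid_pts:   (N, 3) voxel-center coordinates in the world frame
    Returns (N, C): features averaged over visible views (zeros if unseen).
    """
    V, H, W, C = feats.shape
    N = grid_pts.shape[0]
    acc = np.zeros((N, C))
    cnt = np.zeros((N, 1))
    homo = np.concatenate([grid_pts, np.ones((N, 1))], axis=1)  # (N, 4)
    for v in range(V):
        cam = (extrinsics[v] @ homo.T).T[:, :3]   # points in camera frame
        in_front = cam[:, 2] > 1e-3               # discard points behind camera
        uvz = (intrinsics[v] @ cam.T).T
        z = np.maximum(uvz[:, 2], 1e-3)
        u = np.round(uvz[:, 0] / z).astype(int)   # pixel column
        r = np.round(uvz[:, 1] / z).astype(int)   # pixel row
        valid = in_front & (u >= 0) & (u < W) & (r >= 0) & (r < H)
        acc[valid] += feats[v, r[valid], u[valid]]  # nearest-neighbor sampling
        cnt[valid] += 1
    return acc / np.maximum(cnt, 1)
```

In practice such lifting is done with differentiable (bilinear) sampling inside the network; nearest-neighbor lookup is used here only to keep the sketch short.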
