Paper Title


MV-FCOS3D++: Multi-View Camera-Only 4D Object Detection with Pretrained Monocular Backbones

Paper Authors

Tai Wang, Qing Lian, Chenming Zhu, Xinge Zhu, Wenwei Zhang

Paper Abstract


In this technical report, we present our solution, dubbed MV-FCOS3D++, for the Camera-Only 3D Detection track in the Waymo Open Dataset Challenge 2022. For multi-view camera-only 3D detection, methods based on bird's-eye-view or 3D geometric representations can leverage stereo cues from overlapping regions between adjacent views and directly perform 3D detection without hand-crafted post-processing. However, these approaches lack direct semantic supervision for the 2D backbones, which can be complemented by pretraining simple monocular detectors. Following this paradigm, our solution is a multi-view framework for 4D detection. It is built upon the simple monocular detector FCOS3D++, pretrained only with object annotations from Waymo, and converts multi-view features into a 3D grid space to detect 3D objects thereon. A dual-path neck for single-frame understanding and temporal stereo matching is devised to incorporate multi-frame information. Our method finally achieves 49.75% mAPL with a single model and wins 2nd place in the WOD challenge, without any LiDAR-based depth supervision during training. The code will be released at https://github.com/Tai-Wang/Depth-from-Motion.
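The abstract mentions converting multi-view features into a 3D grid space for detection. As a minimal illustrative sketch (not the authors' implementation — all function and variable names here are assumptions), one common way to perform this 2D-to-3D lifting is to project each voxel center into every camera, sample the per-view feature map at the projected pixel, and average across the views that see the point:

```python
import numpy as np

def lift_to_voxel_grid(feats, intrinsics, extrinsics, grid_pts):
    """Lift multi-view 2D features to 3D voxel centers (illustrative sketch).

    feats:      (V, H, W, C) per-view 2D feature maps
    intrinsics: (V, 3, 3) camera intrinsic matrices
    extrinsics: (V, 4, 4) world-to-camera rigid transforms
    grid_pts:   (N, 3) voxel-center coordinates in the world frame
    Returns (N, C): features averaged over visible views (zeros if unseen).
    """
    V, H, W, C = feats.shape
    N = grid_pts.shape[0]
    acc = np.zeros((N, C))
    cnt = np.zeros((N, 1))
    homo = np.concatenate([grid_pts, np.ones((N, 1))], axis=1)  # (N, 4)
    for v in range(V):
        cam = (extrinsics[v] @ homo.T).T[:, :3]   # points in camera frame
        in_front = cam[:, 2] > 1e-3               # discard points behind camera
        uvz = (intrinsics[v] @ cam.T).T
        z = np.maximum(uvz[:, 2], 1e-3)
        u = np.round(uvz[:, 0] / z).astype(int)   # pixel column
        r = np.round(uvz[:, 1] / z).astype(int)   # pixel row
        valid = in_front & (u >= 0) & (u < W) & (r >= 0) & (r < H)
        acc[valid] += feats[v, r[valid], u[valid]]  # nearest-neighbor sampling
        cnt[valid] += 1
    return acc / np.maximum(cnt, 1)
```

In practice such lifting is done with differentiable (bilinear) sampling inside the network; nearest-neighbor lookup is used here only to keep the sketch short.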
