Paper Title

MonoDTR: Monocular 3D Object Detection with Depth-Aware Transformer

Authors

Kuan-Chih Huang, Tsung-Han Wu, Hung-Ting Su, Winston H. Hsu

Abstract

Monocular 3D object detection is an important yet challenging task in autonomous driving. Some existing methods leverage depth information from an off-the-shelf depth estimator to assist 3D detection, but suffer from the additional computational burden and achieve limited performance caused by inaccurate depth priors. To alleviate this, we propose MonoDTR, a novel end-to-end depth-aware transformer network for monocular 3D object detection. It mainly consists of two components: (1) the Depth-Aware Feature Enhancement (DFE) module that implicitly learns depth-aware features with auxiliary supervision without requiring extra computation, and (2) the Depth-Aware Transformer (DTR) module that globally integrates context- and depth-aware features. Moreover, different from conventional pixel-wise positional encodings, we introduce a novel depth positional encoding (DPE) to inject depth positional hints into transformers. Our proposed depth-aware modules can be easily plugged into existing image-only monocular 3D object detectors to improve the performance. Extensive experiments on the KITTI dataset demonstrate that our approach outperforms previous state-of-the-art monocular-based methods and achieves real-time detection. Code is available at https://github.com/kuanchihhuang/MonoDTR
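The depth positional encoding (DPE) idea described above can be sketched roughly as follows: quantize each pixel's estimated depth into discrete bins and look up a learned embedding per bin, so that transformer tokens at similar depths receive similar positional signals. This is a minimal illustration under assumed details, not the authors' implementation; the depth range, bin count, embedding width, and the function name `depth_positional_encoding` are all illustrative.

```python
import numpy as np

def depth_positional_encoding(depth_map, embed_table, d_min=1.0, d_max=80.0):
    """Map per-pixel depth to bin indices, then to per-bin embeddings.

    depth_map:   (H, W) array of estimated depths in meters
    embed_table: (num_bins, C) learned embedding per depth bin
    Returns:     (H, W, C) depth positional encodings
    """
    num_bins = embed_table.shape[0]
    # Linearly discretize [d_min, d_max] into num_bins intervals
    bins = ((depth_map - d_min) / (d_max - d_min) * num_bins).astype(int)
    bins = np.clip(bins, 0, num_bins - 1)
    # Advanced indexing: each pixel picks up its bin's embedding vector
    return embed_table[bins]

# Toy usage: a 4x4 depth map, 8 depth bins, 16-dim embeddings
rng = np.random.default_rng(0)
table = rng.standard_normal((8, 16))
depth = rng.uniform(1.0, 80.0, size=(4, 4))
enc = depth_positional_encoding(depth, table)
print(enc.shape)  # (4, 4, 16)
```

In a transformer, such an encoding would be added to (or concatenated with) the token features before attention, analogous to standard pixel-wise positional encodings but indexed by depth rather than 2D position.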
