Paper Title

MonoDTR: Monocular 3D Object Detection with Depth-Aware Transformer

Authors

Kuan-Chih Huang, Tsung-Han Wu, Hung-Ting Su, Winston H. Hsu

Abstract

Monocular 3D object detection is an important yet challenging task in autonomous driving. Some existing methods leverage depth information from an off-the-shelf depth estimator to assist 3D detection, but suffer from the additional computational burden and achieve limited performance caused by inaccurate depth priors. To alleviate this, we propose MonoDTR, a novel end-to-end depth-aware transformer network for monocular 3D object detection. It mainly consists of two components: (1) the Depth-Aware Feature Enhancement (DFE) module that implicitly learns depth-aware features with auxiliary supervision without requiring extra computation, and (2) the Depth-Aware Transformer (DTR) module that globally integrates context- and depth-aware features. Moreover, different from conventional pixel-wise positional encodings, we introduce a novel depth positional encoding (DPE) to inject depth positional hints into transformers. Our proposed depth-aware modules can be easily plugged into existing image-only monocular 3D object detectors to improve the performance. Extensive experiments on the KITTI dataset demonstrate that our approach outperforms previous state-of-the-art monocular-based methods and achieves real-time detection. Code is available at https://github.com/kuanchihhuang/MonoDTR
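The depth positional encoding (DPE) idea described above can be sketched roughly as follows: quantize each pixel's estimated depth into discrete bins and look up a learned embedding per bin, so that transformer tokens at similar depths receive similar positional signals. This is a minimal illustration under assumed details, not the authors' implementation; the depth range, bin count, embedding width, and the function name `depth_positional_encoding` are all illustrative.

```python
import numpy as np

def depth_positional_encoding(depth_map, embed_table, d_min=1.0, d_max=80.0):
    """Map per-pixel depth to bin indices, then to per-bin embeddings.

    depth_map:   (H, W) array of estimated depths in meters
    embed_table: (num_bins, C) learned embedding per depth bin
    Returns:     (H, W, C) depth positional encodings
    """
    num_bins = embed_table.shape[0]
    # Linearly discretize [d_min, d_max] into num_bins intervals
    bins = ((depth_map - d_min) / (d_max - d_min) * num_bins).astype(int)
    bins = np.clip(bins, 0, num_bins - 1)
    # Advanced indexing: each pixel picks up its bin's embedding vector
    return embed_table[bins]

# Toy usage: a 4x4 depth map, 8 depth bins, 16-dim embeddings
rng = np.random.default_rng(0)
table = rng.standard_normal((8, 16))
depth = rng.uniform(1.0, 80.0, size=(4, 4))
enc = depth_positional_encoding(depth, table)
print(enc.shape)  # (4, 4, 16)
```

In a transformer, such an encoding would be added to (or concatenated with) the token features before attention, analogous to standard pixel-wise positional encodings but indexed by depth rather than 2D position.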
