Monocular 3D Object Detection with Decoupled Structured Polygon Estimation and Height-Guided Depth Estimation

Paper Title


Monocular 3D Object Detection with Decoupled Structured Polygon Estimation and Height-Guided Depth Estimation

Authors

Yingjie Cai, Buyu Li, Zeyu Jiao, Hongsheng Li, Xingyu Zeng, Xiaogang Wang

Abstract

Monocular 3D object detection aims to predict the 3D bounding boxes of objects from a single monocular RGB image. Because recovering an object's location in 3D space is difficult in the absence of depth information, this paper proposes a novel unified framework that decomposes the detection problem into a structured polygon prediction task and a depth recovery task. Unlike the widely studied 2D bounding box, the proposed structured polygon in the 2D image consists of several projected surfaces of the target object, and it is shown to be a better representation for 3D detection than the widely used 3D bounding box proposals. To inversely project the predicted 2D structured polygon to a cuboid in the 3D physical world, the subsequent depth recovery task uses the object height prior to complete the inverse projection transformation with the given camera projection matrix. Moreover, a fine-grained 3D box refinement scheme is proposed to further rectify the 3D detection results. Experiments on the challenging KITTI benchmark show that our method achieves state-of-the-art detection accuracy.
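The height-guided depth recovery and inverse projection described above can be illustrated with a minimal sketch. It assumes a plain pinhole camera with a KITTI-style 3x4 projection matrix whose rows are [fx, 0, cx, tx], [0, fy, cy, ty], [0, 0, 1, tz], with the camera y axis pointing down, and it relies on the fact that a vertical edge of physical height H at projective depth d spans fy * H / d pixels. The helper names `project`, `depth_from_height` and `back_project` and the matrix values in `P_DEMO` are hypothetical and illustrative only. This is not the authors' implementation: it omits the learned structured polygon prediction and the refinement stage.

```python
"""Hypothetical sketch of height-guided depth recovery and inverse projection.

Assumes a zero-skew pinhole camera with a KITTI-style 3x4 projection matrix
P = [[fx, 0, cx, tx], [0, fy, cy, ty], [0, 0, 1, tz]] and camera axes
x right, y down, z forward. Not the paper's implementation.
"""
from typing import List, Tuple

Matrix3x4 = List[List[float]]
Point3 = Tuple[float, float, float]


def project(P: Matrix3x4, point: Point3) -> Tuple[float, float]:
    """Project a 3D camera-frame point to pixel coordinates (u, v)."""
    hom = (point[0], point[1], point[2], 1.0)  # homogeneous coordinates
    u_num = sum(P[0][i] * hom[i] for i in range(4))
    v_num = sum(P[1][i] * hom[i] for i in range(4))
    denom = sum(P[2][i] * hom[i] for i in range(4))
    return u_num / denom, v_num / denom


def depth_from_height(P: Matrix3x4, height_m: float, height_px: float) -> float:
    """Estimate the camera-frame depth Z of a vertical edge.

    A vertical edge of length H at projective depth d = Z + tz spans
    fy * H / d pixels, so d = fy * H / h and Z = d - tz.
    """
    fy, tz = P[1][1], P[2][3]
    return fy * height_m / height_px - tz


def back_project(P: Matrix3x4, u: float, v: float, z: float) -> Point3:
    """Invert the projection of pixel (u, v) given its camera-frame depth Z."""
    fx, cx, tx = P[0][0], P[0][2], P[0][3]
    fy, cy, ty = P[1][1], P[1][2], P[1][3]
    d = z + P[2][3]  # projective denominator of the projection
    x = (u * d - cx * z - tx) / fx
    y = (v * d - cy * z - ty) / fy
    return x, y, z


# Illustrative values only (not real calibration data).
P_DEMO: Matrix3x4 = [[720.0, 0.0, 610.0, 45.0],
                     [0.0, 720.0, 175.0, 0.2],
                     [0.0, 0.0, 1.0, 0.003]]

if __name__ == "__main__":
    height_prior = 1.5          # assumed object height prior, metres
    bottom = (2.0, 1.6, 15.0)   # bottom vertex of a vertical cuboid edge
    top = (bottom[0], bottom[1] - height_prior, bottom[2])  # y axis points down
    u_b, v_b = project(P_DEMO, bottom)
    _, v_t = project(P_DEMO, top)
    z = depth_from_height(P_DEMO, height_prior, v_b - v_t)
    # Recovers the original bottom vertex up to floating-point error.
    print(back_project(P_DEMO, u_b, v_b, z))
```

In this simplified setting, the pixel height of one vertical edge plus a height prior is enough to fix depth, after which every vertex of a 2D polygon at that depth can be lifted back to 3D with the same projection matrix.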
