Paper Title


DID-M3D: Decoupling Instance Depth for Monocular 3D Object Detection

Paper Authors

Liang Peng, Xiaopei Wu, Zheng Yang, Haifeng Liu, Deng Cai

Abstract


Monocular 3D detection has drawn much attention from the community due to its low cost and setup simplicity. It takes an RGB image as input and predicts 3D boxes in 3D space. The most challenging sub-task lies in instance depth estimation. Previous works usually use a direct estimation method. However, in this paper we point out that the instance depth on the RGB image is non-intuitive: it is a coupling of visual depth clues and instance attribute clues, making it hard to learn directly in the network. Therefore, we propose to reformulate the instance depth as the combination of the instance visual surface depth (visual depth) and the instance attribute depth (attribute depth). The visual depth is related to objects' appearances and positions on the image. By contrast, the attribute depth relies on objects' inherent attributes, which are invariant to affine transformations of the object on the image. Correspondingly, we decouple the 3D location uncertainty into visual depth uncertainty and attribute depth uncertainty. By combining the different types of depths and their associated uncertainties, we can obtain the final instance depth. Furthermore, data augmentation in monocular 3D detection is usually limited due to physical constraints, hindering performance gains. Based on the proposed instance depth disentanglement strategy, we can alleviate this problem. Evaluated on KITTI, our method achieves new state-of-the-art results, and extensive ablation studies validate the effectiveness of each component in our method. The code is released at https://github.com/SPengLiang/DID-M3D.
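The combination step the abstract describes ("combining the different types of depths and their associated uncertainties") can be sketched as an uncertainty-weighted aggregation. The following NumPy snippet is a minimal illustration only: the function name, the independence assumption behind the uncertainty combination, and the `exp(-u)` confidence weighting are assumptions for exposition, not the paper's exact formulation.

```python
import numpy as np

def combine_instance_depth(d_vis, u_vis, d_att, u_att):
    """Combine per-location visual and attribute depths into one instance depth.

    All inputs are 1-D arrays over sampled locations on the object:
      d_vis, d_att : visual / attribute depth predictions (meters)
      u_vis, u_att : associated uncertainties (standard deviations)

    Hypothetical sketch; DID-M3D's actual aggregation may differ.
    """
    # Each location's instance depth is visual depth plus attribute depth.
    d_ins = d_vis + d_att
    # Treat the two uncertainty sources as independent and combine them.
    u_ins = np.sqrt(u_vis ** 2 + u_att ** 2)
    # Map uncertainty to confidence: lower uncertainty -> higher weight.
    w = np.exp(-u_ins)
    w = w / w.sum()
    # Final instance depth: confidence-weighted average over locations.
    return float((w * d_ins).sum())

# Example with three sampled locations on one object (illustrative values).
d_vis = np.array([20.1, 19.8, 20.4])
u_vis = np.array([0.3, 0.5, 0.2])
d_att = np.array([1.0, 1.1, 0.9])
u_att = np.array([0.1, 0.4, 0.1])
depth = combine_instance_depth(d_vis, u_vis, d_att, u_att)
```

Because the weights are normalized, the result always lies within the range of the per-location instance depths, and locations with lower combined uncertainty dominate the final estimate.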
