Paper Title
DEVIANT: Depth EquiVarIAnt NeTwork for Monocular 3D Object Detection
Paper Authors
Paper Abstract
Modern neural networks use building blocks, such as convolutions, that are equivariant to arbitrary 2D translations. However, these vanilla blocks are not equivariant to arbitrary 3D translations in the projective manifold. Even so, all monocular 3D detectors use vanilla blocks to obtain 3D coordinates, a task for which these blocks are not designed. This paper takes the first step towards convolutions that are equivariant to arbitrary 3D translations in the projective manifold. Since depth is the hardest quantity to estimate in monocular detection, this paper proposes the Depth EquiVarIAnt NeTwork (DEVIANT), built with existing scale-equivariant steerable blocks. As a result, DEVIANT is equivariant to depth translations in the projective manifold, whereas vanilla networks are not. The additional depth equivariance forces DEVIANT to learn consistent depth estimates; therefore, DEVIANT achieves state-of-the-art monocular 3D detection results on the KITTI and Waymo datasets in the image-only category and performs competitively with methods that use extra information. Moreover, DEVIANT works better than vanilla networks in cross-dataset evaluation. Code and models are available at https://github.com/abhi1kumar/DEVIANT
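To make the core intuition concrete, the sketch below is a minimal, hypothetical illustration, not the paper's actual scale-equivariant steerable blocks: it shares one 2D convolution kernel across a few rescaled copies of the input, producing a feature map with an explicit scale axis. Because a depth translation in the projective manifold changes an object's apparent image scale, the response then shifts along that scale axis instead of changing arbitrarily, which is the kind of structure the abstract refers to. The class name `ToyScaleEquivariantConv` and the chosen scale set are assumptions introduced only for this example.

```python
# Illustrative toy sketch (assumed, not DEVIANT's implementation) of a
# scale-equivariant convolution: one shared kernel applied at several
# input scales, stacked along an extra scale axis.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyScaleEquivariantConv(nn.Module):
    """Shares a single 2D kernel across a small set of scales (toy example)."""

    def __init__(self, in_ch, out_ch, kernel_size=3, scales=(1.0, 1.41, 2.0)):
        super().__init__()
        self.scales = scales
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        # x: (B, C, H, W) -> output: (B, S, out_ch, H, W), S = number of scales
        outs = []
        h, w = x.shape[-2:]
        for s in self.scales:
            # Downscale the input, convolve with the shared kernel, upscale back.
            xs = F.interpolate(x, scale_factor=1.0 / s, mode="bilinear",
                               align_corners=False)
            ys = self.conv(xs)
            ys = F.interpolate(ys, size=(h, w), mode="bilinear",
                               align_corners=False)
            outs.append(ys)
        return torch.stack(outs, dim=1)


if __name__ == "__main__":
    block = ToyScaleEquivariantConv(in_ch=3, out_ch=8)
    feat = block(torch.randn(2, 3, 64, 64))
    print(feat.shape)  # torch.Size([2, 3, 8, 64, 64])
```

In this toy version, equivariance comes from explicit multi-scale resampling; the actual DEVIANT blocks described in the paper instead use existing scale-equivariant steerable convolutions, which achieve the same property with steerable filter bases rather than image resizing.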