Deep Camera Pose Regression Using Pseudo-LiDAR

Paper Title

Deep Camera Pose Regression Using Pseudo-LiDAR

Authors

Raza, Ali; Lolic, Lazar; Akhter, Shahmir; Cruz, Alfonso Dela; Liut, Michael

Abstract

An accurate and robust large-scale localization system is an integral component of active research areas such as autonomous vehicles and augmented reality. To this end, many learning algorithms have been proposed that predict 6DOF camera pose from RGB or RGB-D images. However, previous methods that incorporate depth typically treat the data the same way as RGB images, often adding depth maps as additional channels to RGB images and passing them through convolutional neural networks (CNNs). In this paper, we show that converting depth maps into pseudo-LiDAR signals, previously shown to be useful for 3D object detection, gives a better representation for camera localization tasks: the depth maps are projected into point clouds from which the 6DOF camera pose can be accurately determined. We demonstrate this by first comparing the localization accuracy of a network operating exclusively on pseudo-LiDAR representations with that of networks operating exclusively on depth maps. We then propose FusionLoc, a novel architecture that uses pseudo-LiDAR to regress a 6DOF camera pose. FusionLoc is a dual-stream neural network that aims to remedy common issues with typical 2D CNNs operating on RGB-D images. The results from this architecture are compared against various other state-of-the-art deep pose regression implementations using the 7 Scenes dataset. FusionLoc performs better than a number of other camera localization methods; notably, it is on average 0.33 m and 4.35° more accurate than RGB-D PoseNet. The demonstrated advantage of pseudo-LiDAR signals over depth maps for localization raises new considerations when implementing large-scale localization systems.
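
To make the depth-map-to-pseudo-LiDAR conversion concrete, the following is a minimal sketch of the standard pinhole back-projection that turns a depth map into a point cloud. The abstract does not spell out the paper's exact procedure, so this is an illustration only: the function name `depth_to_pseudo_lidar` and the intrinsics `fx`, `fy`, `cx`, `cy` are assumptions, and the values in the usage lines are toy numbers, not the calibration of any real camera.

```python
import numpy as np

def depth_to_pseudo_lidar(depth, fx, fy, cx, cy):
    """Back-project an H x W depth map (metres) into an N x 3 point cloud
    in the camera frame, assuming a pinhole camera model.

    fx, fy, cx, cy are hypothetical intrinsics (focal lengths and
    principal point, in pixels); they are not taken from the paper.
    """
    h, w = depth.shape
    # u holds the column index and v the row index of every pixel (each H x W).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.astype(np.float64)
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    # Pixels with zero depth carry no measurement, so drop them.
    return points[points[:, 2] > 0]

# Toy usage: a flat 4 x 4 depth map at 2 m with one missing pixel.
depth = np.full((4, 4), 2.0)
depth[0, 0] = 0.0
cloud = depth_to_pseudo_lidar(depth, fx=2.0, fy=2.0, cx=1.5, cy=1.5)
print(cloud.shape)
```

The result is an unordered set of 3D points rather than an image grid, which is the kind of input a point-cloud network can consume instead of treating depth as an extra image channel.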
