Unsupervised Learning of 3D Scene Flow from Monocular Camera

Paper Title

Unsupervised Learning of 3D Scene Flow from Monocular Camera

Authors

Guangming Wang, Xiaoyu Tian, Ruiqi Ding, Hesheng Wang

Abstract

Scene flow represents the motion of points in 3D space and is the counterpart of optical flow, which represents the motion of pixels in the 2D image. However, it is difficult to obtain ground-truth scene flow in real scenes, and recent studies rely on synthetic data for training. Training a scene flow network on real-world data with unsupervised methods is therefore of crucial significance. This paper proposes a novel unsupervised learning method for scene flow that uses the images of two consecutive frames taken by a monocular camera and requires no ground-truth scene flow for training. Our method achieves the goal of training a scene flow network with real-world data, which bridges the gap between training data and test data and broadens the scope of data available for training. The unsupervised learning of scene flow in this paper consists mainly of two parts: (i) depth estimation and camera pose estimation, and (ii) scene flow estimation based on four different loss functions. Depth estimation and camera pose estimation yield the depth maps and the camera pose between two consecutive frames, which provide further information for the subsequent scene flow estimation. We then use a depth consistency loss, a dynamic-static consistency loss, a Chamfer loss, and a Laplacian regularization loss to train the scene flow network without supervision. To our knowledge, this is the first paper to realize unsupervised learning of 3D scene flow from a monocular camera. Experimental results on KITTI show that our method for unsupervised learning of scene flow achieves strong performance compared with the traditional methods Iterative Closest Point (ICP) and Fast Global Registration (FGR). The source code is available at: https://github.com/IRMVLab/3DUnMonoFlow.
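
To make one of the four loss terms concrete, the following is a minimal NumPy sketch of a symmetric Chamfer loss between a first-frame point cloud warped by a predicted scene flow and the second-frame point cloud. It is an illustration only and is not taken from the authors' repository: the function name `chamfer_distance`, the use of squared Euclidean distances, the averaging over points, and the toy data are all assumptions.

```python
import numpy as np


def chamfer_distance(p, q):
    """Symmetric Chamfer distance between point sets p (N, 3) and q (M, 3).

    Assumption: squared Euclidean distances, averaged over points in each
    direction. Other formulations (unsquared, summed) are also common.
    """
    # Pairwise squared distances via broadcasting, shape (N, M).
    d2 = np.sum((p[:, None, :] - q[None, :, :]) ** 2, axis=-1)
    # For each point, the distance to its nearest neighbour in the other set.
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()


# Hypothetical toy data: two frame-1 points, a per-point flow, frame-2 points.
pc1 = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 2.0]])
flow = np.array([[0.5, 0.0, 0.0], [0.5, 0.0, 0.0]])
pc2 = pc1 + flow

loss_good = chamfer_distance(pc1 + flow, pc2)  # warped cloud matches frame 2
loss_zero_flow = chamfer_distance(pc1, pc2)    # no flow applied
```

In this toy case the correct flow drives the loss to zero, while a zero flow leaves a positive residual, which is the signal an unsupervised scene flow network could be trained on. The pairwise distance matrix costs O(NM) memory, so real point clouds would need a nearest-neighbour structure or batching.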
