Empty Cities: a Dynamic-Object-Invariant Space for Visual SLAM

Paper title

Empty Cities: a Dynamic-Object-Invariant Space for Visual SLAM

Authors

Berta Bescos, Cesar Cadena, Jose Neira

Abstract

In this paper we present a data-driven approach to obtain the static image of a scene, eliminating dynamic objects that might have been present when the scene was traversed with a camera. The general objective is to improve vision-based localization and mapping tasks in dynamic environments, where the presence (or absence) of different dynamic objects at different moments makes these tasks less robust. We introduce an end-to-end deep learning framework to turn images of an urban environment that include dynamic content, such as vehicles or pedestrians, into realistic static frames suitable for localization and mapping. This objective faces two main challenges: detecting the dynamic objects, and inpainting the occluded static background. The first challenge is addressed with a convolutional network that learns a multi-class semantic segmentation of the image. The second challenge is addressed with a generative adversarial model that takes as input the original dynamic image and the computed dynamic/static binary mask and generates the final static image. The framework uses two new losses: one based on image steganalysis techniques, which improves the inpainting quality, and another based on ORB features, designed to enhance feature matching between real and hallucinated image regions. To validate our approach, we perform an extensive evaluation with the hallucinated images on different tasks that are affected by dynamic entities, i.e., visual odometry, place recognition and multi-view stereo. Code is available at https://github.com/bertabescos/EmptyCities_SLAM.
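The hand-off between the two stages described in the abstract can be sketched as follows. This is only a minimal illustration, not the authors' implementation: the class ids in `DYNAMIC_CLASS_IDS`, the function names, and the choice to stack the mask as a fourth channel are all assumptions made for the example.

```python
import numpy as np

# Hypothetical ids of the classes treated as dynamic (e.g. pedestrians, vehicles).
# The real label set depends on the segmentation network and is assumed here.
DYNAMIC_CLASS_IDS = (11, 12, 13)


def dynamic_mask(labels: np.ndarray, dynamic_ids=DYNAMIC_CLASS_IDS) -> np.ndarray:
    """Collapse a per-pixel class map (H, W) into a binary mask: 1 = dynamic, 0 = static."""
    return np.isin(labels, dynamic_ids).astype(np.float32)


def generator_input(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Stack an (H, W, 3) image with its (H, W) mask into one (H, W, 4) array."""
    return np.concatenate([image, mask[..., None]], axis=-1)


# Toy 2x2 example: two pixels carry a dynamic class id.
labels = np.array([[0, 11], [13, 2]])
image = np.zeros((2, 2, 3), dtype=np.float32)
mask = dynamic_mask(labels)
x = generator_input(image, mask)
```

Here `mask` marks the two dynamic pixels, and `x` is the combined array that an inpainting generator could consume under the assumptions above.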

Paper title