DXSLAM: A Robust and Efficient Visual SLAM System with Deep Features

Paper Title

DXSLAM: A Robust and Efficient Visual SLAM System with Deep Features

Authors

Li, Dongjiang; Shi, Xuesong; Long, Qiwei; Liu, Shenghui; Yang, Wei; Wang, Fangshi; Wei, Qi; Qiao, Fei

Abstract

A robust and efficient Simultaneous Localization and Mapping (SLAM) system is essential for robot autonomy. For visual SLAM algorithms, although the theoretical framework is well established in most respects, feature extraction and association are still designed empirically in most cases and can be vulnerable in complex environments. This paper shows that feature extraction with deep convolutional neural networks (CNNs) can be seamlessly incorporated into a modern SLAM framework. The proposed SLAM system uses a state-of-the-art CNN to detect keypoints in each image frame and to produce not only keypoint descriptors but also a global descriptor of the whole image. These local and global features are then used by different SLAM modules, yielding much greater robustness to environmental and viewpoint changes than hand-crafted features. We also train a visual vocabulary of local features with a Bag of Words (BoW) method. Based on the local features, the global features, and the vocabulary, a highly reliable loop closure detection method is built. Experimental results show that all the proposed modules significantly outperform the baseline, and the full system achieves much lower trajectory errors and much higher correct rates on all evaluated data. Furthermore, by optimizing the CNN with the Intel OpenVINO toolkit and using the Fast BoW library, the system benefits greatly from the SIMD (single-instruction, multiple-data) capabilities of modern CPUs. The full system runs in real time without any GPU or other accelerator. The code is public at https://github.com/ivipsourcecode/dxslam.
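The abstract describes loop closure detection that combines a CNN global image descriptor with a BoW vocabulary over local features. The Python sketch below illustrates only that general idea, under stated assumptions. It first gates candidate keyframes by the cosine similarity of their global descriptors, then scores the survivors with a bag-of-words comparison over quantized local descriptors. It is not DXSLAM's implementation. All function names, the thresholds, the tiny 2-D descriptors, and the flat brute-force vocabulary are hypothetical; practical BoW libraries use a hierarchical vocabulary tree instead. The BoW score is the L1 similarity used by DBoW-style libraries, simplified here to plain term frequencies without tf-idf weighting.

```python
import math
from collections import Counter

# Minimal sketch of two-stage loop-candidate detection (hypothetical helper names;
# not DXSLAM's actual code). Descriptors are plain lists of floats.

def cosine(a, b):
    """Cosine similarity between two global descriptors."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def quantize(descriptor, vocabulary):
    """Index of the nearest visual word (brute-force search over a flat vocabulary)."""
    return min(
        range(len(vocabulary)),
        key=lambda i: sum((d - w) ** 2 for d, w in zip(descriptor, vocabulary[i])),
    )

def bow_vector(local_descriptors, vocabulary):
    """L1-normalized term-frequency histogram over visual words (no tf-idf weighting)."""
    counts = Counter(quantize(d, vocabulary) for d in local_descriptors)
    total = sum(counts.values())
    return {word: c / total for word, c in counts.items()}

def bow_similarity(v1, v2):
    """DBoW-style L1 score: 1 - 0.5 * ||v1 - v2||_1, in [0, 1] for L1-normalized inputs."""
    words = set(v1) | set(v2)
    return 1.0 - 0.5 * sum(abs(v1.get(w, 0.0) - v2.get(w, 0.0)) for w in words)

def loop_candidates(query, keyframes, vocabulary, global_thresh=0.8, bow_thresh=0.3):
    """Return (keyframe_id, global_score, bow_score) tuples, best global score first.

    query and each keyframe are dicts with a 'global' descriptor and a list of
    'local' descriptors. Both thresholds are illustrative assumptions.
    """
    q_bow = bow_vector(query["local"], vocabulary)
    results = []
    for kf_id, kf in keyframes.items():
        g = cosine(query["global"], kf["global"])
        if g < global_thresh:
            continue  # cheap global-descriptor gate before the BoW comparison
        b = bow_similarity(q_bow, bow_vector(kf["local"], vocabulary))
        if b >= bow_thresh:
            results.append((kf_id, g, b))
    return sorted(results, key=lambda r: r[1], reverse=True)

# Toy usage with 2-D descriptors and a 3-word vocabulary.
vocab = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
query = {"global": [1.0, 0.0], "local": [[0.9, 0.1], [0.1, 0.9]]}
keyframes = {
    "a": {"global": [1.0, 0.1], "local": [[1.0, 0.0], [0.0, 1.0]]},
    "b": {"global": [0.0, 1.0], "local": [[1.0, 0.0], [0.0, 1.0]]},
}
print([kf_id for kf_id, _, _ in loop_candidates(query, keyframes, vocab)])  # ['a']
```

With this toy data, keyframe "b" fails the global-descriptor gate, and keyframe "a" passes both checks. In a full SLAM system, surviving candidates would typically still go through geometric verification before a loop is accepted.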