Learning to Switch CNNs with Model Agnostic Meta Learning for Fine Precision Visual Servoing

Paper Title

Learning to Switch CNNs with Model Agnostic Meta Learning for Fine Precision Visual Servoing

Authors

Raj, Prem; Namboodiri, Vinay P.; Behera, L.

Abstract

Convolutional Neural Networks (CNNs) have been successfully applied to relative camera pose estimation from labeled image-pair data, without requiring any hand-engineered features, camera intrinsic parameters, or depth information. The trained CNN can be used to perform pose-based visual servo control (PBVS). One way to improve the quality of the visual servo output is to improve the accuracy of the CNN that estimates the relative pose. Given a state-of-the-art CNN for relative pose regression, how can we achieve better visual servo control performance? In this paper, we explore switching between CNNs to improve the precision of visual servo control. The idea is motivated by the fact that a dataset for training a relative camera pose regressor for visual servo control must contain variations in relative pose ranging from a very small scale to a much larger scale. We found that training two different instances of the CNN, one for large-scale displacements (LSD) and another for small-scale displacements (SSD), and switching between them during visual servo execution yields better results than training a single CNN on the combined LSD+SSD data. However, this incurs extra storage overhead, and the switching decision is made by a manually set threshold that may not be optimal for all scenes. To eliminate these drawbacks, we propose an efficient switching strategy based on the model agnostic meta learning (MAML) algorithm. Here, a single model is trained to learn parameters that are simultaneously good for multiple tasks, namely a binary classification for the switching decision, a 6DOF pose regression for LSD data, and a 6DOF pose regression for SSD data. The proposed approach performs far better than the naive approach, while the storage and run-time overheads are almost negligible.
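
The threshold-based switching that the abstract calls the naive approach could look roughly like the sketch below. This is only an illustration under stated assumptions: the abstract does not say what quantity the threshold is applied to, so the sketch assumes it is the norm of the large-scale model's 6DOF estimate. The names `select_pose_estimate`, `lsd_model`, `ssd_model`, and `threshold` are hypothetical and not taken from the paper.

```python
import numpy as np


def select_pose_estimate(image_pair, lsd_model, ssd_model, threshold):
    """Pick the 6-DOF relative pose estimate from one of two regressors.

    lsd_model / ssd_model are hypothetical callables mapping an image pair
    to a length-6 vector (translation + rotation). The switching criterion
    (norm of the LSD estimate) is an assumption, not taken from the paper.
    """
    coarse = np.asarray(lsd_model(image_pair), dtype=float)
    if np.linalg.norm(coarse) > threshold:
        # Large displacement: trust the large-scale regressor.
        return coarse, "LSD"
    # Small displacement: hand over to the small-scale regressor.
    fine = np.asarray(ssd_model(image_pair), dtype=float)
    return fine, "SSD"


if __name__ == "__main__":
    # Stand-in regressors returning fixed poses, for illustration only.
    far = lambda pair: [0.5, 0.0, 0.2, 0.0, 0.1, 0.0]
    near = lambda pair: [0.01, 0.0, 0.0, 0.0, 0.0, 0.0]
    fine = lambda pair: [0.008, 0.001, 0.0, 0.0, 0.0, 0.0]
    print(select_pose_estimate(None, far, fine, threshold=0.05)[1])
    print(select_pose_estimate(None, near, fine, threshold=0.05)[1])
```

This sketch keeps two separate models and a hand-set threshold, which are the storage overhead and the tuning problem the abstract identifies. The MAML-based strategy replaces both with a single model that also learns the switching decision. It is not shown here.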
