Paper Title
DFBVS: Deep Feature-Based Visual Servo
Paper Authors
Paper Abstract
Classical Visual Servoing (VS) relies on handcrafted visual features, which limits its generalizability. Recently, a number of approaches, some based on Deep Neural Networks, have been proposed to overcome this limitation by directly comparing the entire target and current camera images. However, by dispensing with visual features altogether, those approaches require the target and current images to be essentially similar, which precludes generalization to unknown, cluttered scenes. Here we propose to perform VS based on visual features, as in classical VS approaches, but, contrary to the latter, we leverage recent breakthroughs in Deep Learning to automatically extract and match the visual features. By doing so, our approach enjoys the advantages of both worlds: (i) because it is based on visual features, it can steer the robot towards the object of interest even in the presence of significant distractors in the background; (ii) because the features are extracted and matched automatically, it generalizes easily and automatically to unseen objects and scenes. In addition, we propose to use a rendering engine to synthesize the target image, which offers a further level of generalization. We demonstrate these advantages in a robotic grasping task, where the robot steers, with high accuracy, towards the object to grasp, based simply on an image of the object rendered from the camera view corresponding to the desired grasping pose.
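The abstract describes the pipeline only at a high level. Below is a minimal, hedged sketch of how matched features could drive a classical image-based visual servoing (IBVS) control law, v = -λ L⁺ e, which is the standard formulation from the VS literature rather than the paper's actual implementation. The gain value and the synthetic matched points are assumptions for illustration; in the paper's pipeline the matched points would come from a learned feature extractor and matcher applied to the rendered target image and the live camera image.

```python
import numpy as np

LAMBDA = 0.5  # hypothetical servo gain, not taken from the paper


def ibvs_velocity(curr_pts, target_pts, depths, gain=LAMBDA):
    """Classical IBVS law: v = -gain * pinv(L) @ e.

    curr_pts, target_pts: (N, 2) matched point features in normalized
    image coordinates; depths: (N,) estimated depth Z of each current
    feature. Returns a 6-DoF camera velocity (vx, vy, vz, wx, wy, wz).
    """
    # Feature error: current minus desired, stacked into one vector.
    e = (np.asarray(curr_pts) - np.asarray(target_pts)).reshape(-1)
    rows = []
    for (x, y), Z in zip(curr_pts, depths):
        # Standard 2x6 interaction-matrix block for a point feature.
        rows.append([-1.0 / Z, 0.0, x / Z, x * y, -(1.0 + x * x), y])
        rows.append([0.0, -1.0 / Z, y / Z, 1.0 + y * y, -x * y, -x])
    L = np.asarray(rows)
    return -gain * np.linalg.pinv(L) @ e


# Toy check with synthetic matched features, standing in for the deep
# features the paper extracts and matches automatically.
rng = np.random.default_rng(0)
target = rng.uniform(-0.2, 0.2, size=(8, 2))
curr = target + 0.05          # current view slightly offset from target
depths = np.full(8, 1.0)
print(ibvs_velocity(curr, target, depths))
```

The key property the paper exploits is that this control law only needs matched point correspondences, so swapping handcrafted features for automatically extracted and matched deep features leaves the servoing machinery unchanged while making it robust to background clutter and unseen objects.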