Paper Title
DeepRM: Deep Recurrent Matching for 6D Pose Refinement
Paper Authors
Paper Abstract
Precise 6D pose estimation of rigid objects from RGB images is a critical but challenging task in robotics, augmented reality and human-computer interaction. To address this problem, we propose DeepRM, a novel recurrent network architecture for 6D pose refinement. DeepRM leverages initial coarse pose estimates to render synthetic images of target objects. The rendered images are then matched with the observed images to predict a rigid transform for updating the previous pose estimate. This process is repeated to incrementally refine the estimate at each iteration. The DeepRM architecture incorporates LSTM units to propagate information through each refinement step, significantly improving overall performance. In contrast to current 2-stage Perspective-n-Point based solutions, DeepRM is trained end-to-end, and uses a scalable backbone that can be tuned via a single parameter for accuracy and efficiency. During training, a multi-scale optical flow head is added to predict the optical flow between the observed and synthetic images. Optical flow prediction stabilizes the training process, and enforces the learning of features that are relevant to the task of pose estimation. Our results demonstrate that DeepRM achieves state-of-the-art performance on two widely accepted challenging datasets.
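To illustrate the render-and-compare refinement loop described in the abstract, below is a minimal sketch in PyTorch. It is not the authors' implementation: the network layers, the 6-DoF pose vector parameterization, the additive pose update, and the names `MatcherLSTM`, `refine_pose`, and `render_fn` are all illustrative assumptions. The renderer, backbone, and optical flow head from the paper are omitted or stubbed out.

```python
# A minimal sketch of DeepRM-style iterative render-and-compare refinement,
# as described in the abstract. All module names, shapes, and the pose
# parameterization below are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn


class MatcherLSTM(nn.Module):
    """Toy matching network: compares the observed and rendered images and
    predicts a pose update, carrying hidden state across refinement steps
    (a stand-in for the paper's LSTM-based recurrent matching)."""

    def __init__(self, feat_dim=128):
        super().__init__()
        # Shared CNN encoder applied to the concatenated image pair (6 channels).
        self.encoder = nn.Sequential(
            nn.Conv2d(6, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.lstm = nn.LSTMCell(64, feat_dim)
        # Pose update: 3 translation + 3 axis-angle rotation parameters.
        self.head = nn.Linear(feat_dim, 6)

    def forward(self, observed, rendered, state=None):
        x = self.encoder(torch.cat([observed, rendered], dim=1)).flatten(1)
        h, c = self.lstm(x, state)
        return self.head(h), (h, c)


def refine_pose(observed, init_pose, render_fn, matcher, num_iters=4):
    """Iteratively refine `init_pose` (B, 6) by rendering, matching, and
    applying the predicted delta. `render_fn(pose) -> (B, 3, H, W)` is an
    assumed renderer supplied by the caller."""
    pose, state = init_pose, None
    for _ in range(num_iters):
        rendered = render_fn(pose)
        delta, state = matcher(observed, rendered, state)
        pose = pose + delta  # additive update on the toy 6-DoF vector
    return pose


if __name__ == "__main__":
    matcher = MatcherLSTM()
    obs = torch.rand(2, 3, 64, 64)
    coarse = torch.zeros(2, 6)
    # Dummy renderer for illustration only.
    dummy_render = lambda pose: torch.rand(2, 3, 64, 64)
    refined = refine_pose(obs, coarse, dummy_render, matcher)
    print(refined.shape)  # torch.Size([2, 6])
```

The key design point carried over from the abstract is that the LSTM state is threaded through every refinement iteration, so later pose updates can condition on what earlier matching steps observed, rather than treating each iteration independently.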