Paper Title
Multi-View Object Pose Refinement With Differentiable Renderer
Paper Authors
Paper Abstract
This paper introduces a novel multi-view 6 DoF object pose refinement approach focused on improving methods trained on synthetic data. It is based on the DPOD detector, which produces dense 2D-3D correspondences between model vertices and image pixels in each frame. We opt for using multiple frames with known relative camera transformations, as this allows geometric constraints to be introduced via an interpretable, ICP-like loss function. The loss function is implemented with a differentiable renderer and optimized iteratively. We also demonstrate that the full detection and refinement pipeline, trained solely on synthetic data, can be used to auto-label real data. We perform quantitative evaluation on the LineMOD, Occlusion, Homebrewed and YCB-V datasets and report excellent performance in comparison to state-of-the-art methods trained on synthetic and real data. We demonstrate empirically that our approach requires only a few frames and is robust to closely positioned cameras and to noise in the extrinsic camera calibration, which makes it easier and more practical to use.
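The core idea stated in the abstract, optimizing a single object pose jointly against all views through an iterative, ICP-like loss, can be illustrated without the differentiable renderer. The following is a minimal PyTorch sketch, not the authors' code: it consumes the detector's dense 2D-3D correspondences directly instead of re-rendering them, and minimizes a multi-view reprojection residual by gradient descent. The function and field names (refine_pose, pts_3d, T_cam) and the axis-angle parameterization are illustrative assumptions.

```python
# Hypothetical sketch of multi-view, ICP-like pose refinement.
# One pose (R, t) in a reference camera frame is refined so that predicted 3D
# model coordinates, transformed into every camera via known relative
# transforms and projected, land on the pixels they were predicted at.
import torch


def skew(k):
    """Build the 3x3 skew-symmetric matrix of a 3-vector (differentiably)."""
    zero = torch.zeros((), dtype=k.dtype)
    return torch.stack([
        torch.stack([zero, -k[2], k[1]]),
        torch.stack([k[2], zero, -k[0]]),
        torch.stack([-k[1], k[0], zero]),
    ])


def axis_angle_to_matrix(r):
    """Rodrigues' formula: axis-angle vector (3,) -> rotation matrix (3, 3)."""
    theta = torch.linalg.norm(r) + 1e-8
    K = skew(r / theta)
    eye = torch.eye(3, dtype=r.dtype)
    return eye + torch.sin(theta) * K + (1.0 - torch.cos(theta)) * (K @ K)


def refine_pose(R_init, t_init, views, intrinsics, steps=200, lr=1e-2):
    """Refine one object pose jointly over all views (hypothetical interface).

    views: list of dicts with
      'pts_3d': (N, 3) model coordinates predicted for N pixels,
      'pix':    (N, 2) pixel locations of those predictions,
      'T_cam':  (4, 4) known transform from the reference camera to this camera.
    intrinsics: (3, 3) pinhole camera matrix shared by all views.
    """
    r = torch.zeros(3, requires_grad=True)              # incremental rotation (axis-angle)
    t = t_init.clone().detach().requires_grad_(True)    # translation in reference camera
    optimizer = torch.optim.Adam([r, t], lr=lr)

    for _ in range(steps):
        optimizer.zero_grad()
        R = axis_angle_to_matrix(r) @ R_init             # current rotation hypothesis
        loss = torch.zeros((), dtype=t.dtype)
        for v in views:
            p_ref = v['pts_3d'] @ R.T + t                             # model -> reference camera
            p_cam = p_ref @ v['T_cam'][:3, :3].T + v['T_cam'][:3, 3]  # -> this view's camera
            proj = p_cam @ intrinsics.T                               # pinhole projection
            proj = proj[:, :2] / proj[:, 2:3]
            # ICP-like residual: reprojected model points should land on the
            # pixels whose dense correspondences predicted them.
            loss = loss + torch.mean(torch.sum((proj - v['pix']) ** 2, dim=1))
        loss.backward()
        optimizer.step()

    return axis_angle_to_matrix(r.detach()) @ R_init, t.detach()
```

Because every view contributes a residual to the same pose parameters, frames with known relative transforms act as geometric constraints on each other, which is what lets the refinement work from only a few frames; the paper's differentiable renderer additionally makes occlusion and visibility part of the loss, which this simplified sketch omits.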