PIZZA: A Powerful Image-only Zero-Shot Zero-CAD Approach to 6 DoF Tracking

Paper Title

PIZZA: A Powerful Image-only Zero-Shot Zero-CAD Approach to 6 DoF Tracking

Authors

Van Nguyen Nguyen, Yuming Du, Yang Xiao, Michael Ramamonjisoa, Vincent Lepetit

Abstract

Estimating the relative pose of a new object without prior knowledge is a hard problem, yet it is an ability very much needed in robotics and Augmented Reality. We present a method for tracking the 6D motion of objects in RGB video sequences when neither the training images nor the 3D geometry of the objects are available. In contrast to previous works, our method can therefore consider unknown objects in the open world instantly, without requiring any prior information or a specific training phase. We consider two architectures, one based on two frames, and the other relying on a Transformer Encoder, which can exploit an arbitrary number of past frames. We train our architectures using only synthetic renderings with domain randomization. Our results on challenging datasets are on par with previous works that require much more information (training images of the target objects, 3D models, and/or depth data). Our source code is available at https://github.com/nv-nguyen/pizza.
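
To make the notion of tracking 6D motion from relative poses concrete, the sketch below chains frame-to-frame rigid motions into a pose relative to the first frame. It is only an illustration under stated assumptions, not the authors' implementation: the helper names (rot_z, make_pose, compose, accumulate) are hypothetical, the relative motions are supplied by hand rather than predicted by a network, and the convention that each motion maps frame k-1 to frame k is assumed.

```python
import math

def rot_z(theta):
    """Return a 3x3 rotation about the z-axis by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def make_pose(R, t):
    """Pack a 3x3 rotation and a 3-vector translation into a 4x4 rigid transform."""
    return [R[i] + [t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def compose(A, B):
    """Return the matrix product A @ B of two 4x4 transforms."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def accumulate(relative_poses):
    """Chain frame-to-frame motions into poses relative to the first frame.

    Assumed convention: each delta maps the object pose at frame k-1
    to the object pose at frame k, so T_k = delta_k @ T_{k-1}.
    """
    pose = make_pose(rot_z(0.0), [0.0, 0.0, 0.0])  # identity at frame 0
    track = [pose]
    for delta in relative_poses:
        pose = compose(delta, pose)
        track.append(pose)
    return track

# Hypothetical input: three identical steps of 30 degrees about z plus a small shift.
delta = make_pose(rot_z(math.pi / 6), [0.1, 0.0, 0.0])
track = accumulate([delta] * 3)
```

In a tracker of this kind, each delta would come from a learned estimator rather than being fixed by hand. Because the estimates are multiplied together, per-frame errors accumulate along the sequence.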

Paper Title