Paper title

CRT-6D: Fast 6D Object Pose Estimation with Cascaded Refinement Transformers

Authors

Castro, Pedro; Kim, Tae-Kyun

Abstract

Learning-based 6D object pose estimation methods rely on computing large intermediate pose representations and/or iteratively refining an initial estimate with a slow render-and-compare pipeline. This paper introduces a novel method we call Cascaded Pose Refinement Transformers, or CRT-6D. We replace the commonly used dense intermediate representation with a sparse set of features sampled from the feature pyramid, which we call OSKFs (Object Surface Keypoint Features), where each element corresponds to an object keypoint. We employ lightweight deformable transformers and chain them together to iteratively refine proposed poses over the sampled OSKFs. We achieve inference runtimes 2x faster than the closest real-time state-of-the-art methods while supporting up to 21 objects on a single model. We demonstrate the effectiveness of CRT-6D by performing extensive experiments on the LM-O and YCB-V datasets. Compared to real-time methods, we achieve state-of-the-art results on LM-O and YCB-V, falling slightly behind methods with inference runtimes one order of magnitude higher. The source code is available at: https://github.com/PedroCastro/CRT-6D
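
The idea of sampling one feature per object keypoint can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a pinhole camera, a single-channel feature map at a single scale, and plain bilinear interpolation, whereas the abstract describes sampling from a feature pyramid and refining with deformable transformers. All function names here (`project_keypoints`, `bilinear_sample`, `sample_keypoint_features`) are hypothetical and chosen only for illustration.

```python
def project_keypoints(keypoints, rotation, translation, fx, fy, cx, cy):
    """Project 3D object keypoints to pixels under pose (R, t), pinhole model."""
    pixels = []
    for p in keypoints:
        # Camera-frame point: X_c = R @ p + t
        xc, yc, zc = (
            sum(rotation[i][j] * p[j] for j in range(3)) + translation[i]
            for i in range(3)
        )
        pixels.append((fx * xc / zc + cx, fy * yc / zc + cy))
    return pixels


def bilinear_sample(feature_map, u, v):
    """Bilinearly interpolate an H x W map at pixel (u, v), clamped to the border."""
    h, w = len(feature_map), len(feature_map[0])
    u = min(max(u, 0.0), w - 1.0)
    v = min(max(v, 0.0), h - 1.0)
    x0, y0 = int(u), int(v)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = u - x0, v - y0
    top = (1 - ax) * feature_map[y0][x0] + ax * feature_map[y0][x1]
    bottom = (1 - ax) * feature_map[y1][x0] + ax * feature_map[y1][x1]
    return (1 - ay) * top + ay * bottom


def sample_keypoint_features(feature_map, keypoints, rotation, translation,
                             fx, fy, cx, cy):
    """Return one sampled feature per keypoint for the current pose estimate."""
    pixels = project_keypoints(keypoints, rotation, translation, fx, fy, cx, cy)
    return [bilinear_sample(feature_map, u, v) for u, v in pixels]


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    fmap = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]]
    kps = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    print(sample_keypoint_features(fmap, kps, identity, (0.0, 0.0, 2.0),
                                   2.0, 2.0, 1.0, 1.0))
```

In a cascaded scheme of the kind the abstract describes, a step like this would be repeated: each refinement stage updates the pose, and the keypoints are re-projected and re-sampled under the new estimate. The cost scales with the number of keypoints rather than with the image resolution, which is the appeal of a sparse representation over a dense one.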
