Paper Title
OVE6D: Object Viewpoint Encoding for Depth-based 6D Object Pose Estimation
Paper Authors
Paper Abstract
This paper proposes a universal framework, called OVE6D, for model-based 6D object pose estimation from a single depth image and a target object mask. Our model is trained using purely synthetic data rendered from ShapeNet and, unlike most existing methods, it generalizes well to new real-world objects without any fine-tuning. We achieve this by decomposing the 6D pose into viewpoint, in-plane rotation around the camera optical axis, and translation, and by introducing novel lightweight modules for estimating each component in a cascaded manner. The resulting network contains fewer than 4M parameters while demonstrating excellent performance on the challenging T-LESS and Occluded LINEMOD datasets without any dataset-specific training. We show that OVE6D outperforms some contemporary deep learning-based pose estimation methods specifically trained for individual objects or datasets with real-world training data. The implementation and the pre-trained model will be made publicly available.
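To make the decomposition concrete, below is a minimal NumPy sketch of how the three cascaded components (a viewpoint rotation, an in-plane rotation about the camera optical axis, and a 3D translation) could be composed into a full 6D pose. The function names and the convention that the in-plane rotation acts about the camera z-axis are illustrative assumptions based on the abstract, not the authors' implementation.

```python
import numpy as np

def inplane_rotation(theta: float) -> np.ndarray:
    """Rotation about the camera optical axis (z-axis) by angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def compose_pose(R_viewpoint: np.ndarray, theta: float, t: np.ndarray):
    """Compose a full 6D pose from the three cascaded components:
    a viewpoint rotation (e.g., retrieved by matching the encoded depth
    observation against rendered object viewpoints), an in-plane rotation
    about the optical axis, and a 3D translation.
    """
    R = inplane_rotation(theta) @ R_viewpoint  # full object orientation
    return R, t  # pose as rotation matrix and translation vector

# Example: identity viewpoint, 30-degree in-plane rotation, object 0.5 m deep.
R, t = compose_pose(np.eye(3), np.deg2rad(30.0), np.array([0.0, 0.0, 0.5]))
```

In this factorization, the viewpoint component only needs to cover the sphere of camera directions around the object, which keeps the rotation search space small and is consistent with the cascaded, lightweight design described in the abstract.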