Paper Title
OVE6D: Object Viewpoint Encoding for Depth-based 6D Object Pose Estimation
Paper Authors
Paper Abstract
This paper proposes a universal framework, called OVE6D, for model-based 6D object pose estimation from a single depth image and a target object mask. Our model is trained using purely synthetic data rendered from ShapeNet and, unlike most existing methods, it generalizes well to new real-world objects without any fine-tuning. We achieve this by decomposing the 6D pose into viewpoint, in-plane rotation around the camera optical axis, and translation, and by introducing novel lightweight modules for estimating each component in a cascaded manner. The resulting network contains fewer than 4M parameters while demonstrating excellent performance on the challenging T-LESS and Occluded LINEMOD datasets without any dataset-specific training. We show that OVE6D outperforms some contemporary deep learning-based pose estimation methods specifically trained for individual objects or datasets with real-world training data. The implementation and the pre-trained model will be made publicly available.
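To make the decomposition concrete, below is a minimal NumPy sketch of how the three cascaded components (a viewpoint rotation, an in-plane rotation about the camera optical axis, and a 3D translation) could be composed into a full 6D pose. The function names and the convention that the in-plane rotation acts about the camera z-axis are illustrative assumptions based on the abstract, not the authors' implementation.

```python
import numpy as np

def inplane_rotation(theta: float) -> np.ndarray:
    """Rotation about the camera optical axis (z-axis) by angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def compose_pose(R_viewpoint: np.ndarray, theta: float, t: np.ndarray):
    """Compose a full 6D pose from the three cascaded components:
    a viewpoint rotation (e.g., retrieved by matching the encoded depth
    observation against rendered object viewpoints), an in-plane rotation
    about the optical axis, and a 3D translation.
    """
    R = inplane_rotation(theta) @ R_viewpoint  # full object orientation
    return R, t  # pose as rotation matrix and translation vector

# Example: identity viewpoint, 30-degree in-plane rotation, object 0.5 m deep.
R, t = compose_pose(np.eye(3), np.deg2rad(30.0), np.array([0.0, 0.0, 0.5]))
```

In this factorization, the viewpoint component only needs to cover the sphere of camera directions around the object, which keeps the rotation search space small and is consistent with the cascaded, lightweight design described in the abstract.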