Category-Level Pose Retrieval with Contrastive Features Learnt with Occlusion Augmentation

Paper title

Category-Level Pose Retrieval with Contrastive Features Learnt with Occlusion Augmentation

Authors

Georgios Kouros, Shubham Shrivastava, Cédric Picron, Sushruth Nagesh, Punarjay Chakravarty, Tinne Tuytelaars

Abstract

Pose estimation is usually tackled as either a bin classification or a regression problem. In both cases, the idea is to directly predict the pose of an object. This is a non-trivial task due to appearance variations between similar poses and similarities between dissimilar poses. Instead, we follow the key idea that comparing two poses is easier than directly predicting one. Render-and-compare approaches have been employed to that end; however, they tend to be unstable, computationally expensive, and slow for real-time applications. We propose doing category-level pose estimation by learning an alignment metric in an embedding space using a contrastive loss with a dynamic margin and a continuous pose-label space. For efficient inference, we use a simple real-time image retrieval scheme with a pre-rendered and pre-embedded reference set of renderings. To achieve robustness to real-world conditions, we employ synthetic occlusions, bounding box perturbations, and appearance augmentations. Our approach achieves state-of-the-art performance on PASCAL3D and OccludedPASCAL3D and surpasses the competing methods on KITTI3D in a cross-dataset evaluation setting. The code is currently available at https://github.com/gkouros/contrastive-pose-retrieval.
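
The two ideas in the abstract, a margin that depends on pose difference and retrieval against a pre-embedded reference set, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the azimuth-only pose label, the cosine distance, and the specific pull/push form of the loss are all assumptions made for illustration. The repository linked above is the reference for the actual method.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def angular_distance(az_a, az_b):
    """Smallest absolute difference between two azimuths, in degrees (0 to 180)."""
    diff = abs(az_a - az_b) % 360.0
    return min(diff, 360.0 - diff)


def dynamic_margin_loss(emb_a, emb_b, az_a, az_b, max_margin=1.0):
    """Hypothetical pairwise loss with a pose-dependent margin.

    Assumption: the margin grows linearly with the pose difference, so pairs
    with very different poses are pushed further apart in embedding space,
    while pairs with similar poses are pulled together.
    """
    w = angular_distance(az_a, az_b) / 180.0      # 0 = same pose, 1 = opposite pose
    dist = 1.0 - cosine_similarity(emb_a, emb_b)  # cosine distance
    pull = (1.0 - w) * dist ** 2                  # dominates for similar poses
    push = max(0.0, max_margin * w - dist) ** 2   # hinge with a dynamic margin
    return pull + push


def retrieve_pose(query_emb, reference_embs, reference_poses):
    """Return the pose label of the reference embedding closest to the query."""
    scores = [cosine_similarity(query_emb, ref) for ref in reference_embs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return reference_poses[best], scores[best]


# Toy reference set: three 2-D embeddings labelled with azimuths in degrees.
reference_embs = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
reference_poses = [0.0, 90.0, 180.0]
pose, score = retrieve_pose([0.1, 0.9], reference_embs, reference_poses)
```

In a real system the reference embeddings would be computed once from the pre-rendered images, so inference reduces to one forward pass for the query plus a nearest-neighbour search. The brute-force search here is only suitable for small reference sets.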

Paper title