Paper Title

Exploring intermediate representation for monocular vehicle pose estimation

Paper Authors

Shichao Li, Zengqiang Yan, Hongyang Li, Kwang-Ting Cheng

Paper Abstract

We present a new learning-based framework to recover vehicle pose in SO(3) from a single RGB image. In contrast to previous works that map from local appearance to observation angles, we explore a progressive approach by extracting meaningful Intermediate Geometrical Representations (IGRs) to estimate egocentric vehicle orientation. This approach features a deep model that transforms perceived intensities to IGRs, which are mapped to a 3D representation encoding object orientation in the camera coordinate system. Core problems are what IGRs to use and how to learn them more effectively. We answer the former question by designing IGRs based on an interpolated cuboid that derives from primitive 3D annotation readily. The latter question motivates us to incorporate geometry knowledge with a new loss function based on a projective invariant. This loss function allows unlabeled data to be used in the training stage to improve representation learning. Without additional labels, our system outperforms previous monocular RGB-based methods for joint vehicle detection and pose estimation on the KITTI benchmark, achieving performance even comparable to stereo methods. Code and pre-trained models are available at this https URL.
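The abstract names the key ingredient, a loss built on a projective invariant over points interpolated along a cuboid, without spelling the invariant out. The cross-ratio of four collinear points is the classic projective invariant that fits this interpolated-cuboid design, so the PyTorch sketch below illustrates how such a consistency loss could look: the cross-ratio of the interpolated 3D points is fixed by the interpolation fractions, survives camera projection, and can therefore be checked on predicted 2D keypoints without 2D ground truth. The function names, tensor shapes, and sampling fractions here are illustrative assumptions, not the authors' released implementation.

```python
import torch


def interpolate_edge(p_start, p_end, fractions=(0.0, 1.0 / 3, 2.0 / 3, 1.0)):
    """Sample collinear points along one cuboid edge at fixed fractions.

    p_start, p_end: tensors of shape (..., D) holding the edge endpoints
    (D = 3 for the annotated box, D = 2 after projection).
    Returns a tensor of shape (..., K, D) with K interpolated points.
    """
    t = torch.tensor(fractions, dtype=p_start.dtype, device=p_start.device)
    return p_start.unsqueeze(-2) + t.view(-1, 1) * (p_end - p_start).unsqueeze(-2)


def cross_ratio(points, eps=1e-8):
    """Cross-ratio of four (approximately) collinear points, shape (..., 4, D).

    CR(A, B; C, D) = (|AC| / |BC|) / (|AD| / |BD|), computed from pairwise
    distances along the line. This quantity is invariant under projection.
    """
    a, b, c, d = points.unbind(dim=-2)
    ac = (c - a).norm(dim=-1)
    bc = (c - b).norm(dim=-1)
    ad = (d - a).norm(dim=-1)
    bd = (d - b).norm(dim=-1)
    return (ac / bc.clamp(min=eps)) / (ad / bd.clamp(min=eps)).clamp(min=eps)


def cross_ratio_loss(pred_keypoints_2d, target_cr):
    """Penalize deviation of the predicted 2D keypoints' cross-ratio from the
    value fixed by the 3D interpolation scheme. Because the target is known
    a priori, this term needs no 2D annotation, which is what would let
    unlabeled images contribute to representation learning.

    pred_keypoints_2d: (..., 4, 2) predicted collinear keypoints.
    """
    return (cross_ratio(pred_keypoints_2d) - target_cr).abs().mean()
```

For points interpolated at fractions 0, 1/3, 2/3, and 1, the cross-ratio is (2/3 ÷ 1/3) / (1 ÷ 2/3) = 4/3, so `target_cr = 4.0 / 3` for every edge and every camera.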
