Paper Title

Disentangling 3D Prototypical Networks For Few-Shot Concept Learning

Authors

Prabhudesai, Mihir, Lal, Shamit, Patil, Darshan, Tung, Hsiao-Yu, Harley, Adam W, Fragkiadaki, Katerina

Abstract

We present neural architectures that disentangle RGB-D images into objects' shapes and styles and a map of the background scene, and explore their applications for few-shot 3D object detection and few-shot concept classification. Our networks incorporate architectural biases that reflect the image formation process, 3D geometry of the world scene, and shape-style interplay. They are trained end-to-end self-supervised by predicting views in static scenes, alongside a small number of 3D object boxes. Objects and scenes are represented in terms of 3D feature grids in the bottleneck of the network. We show that the proposed 3D neural representations are compositional: they can generate novel 3D scene feature maps by mixing object shapes and styles, resizing and adding the resulting object 3D feature maps over background scene feature maps. We show that classifiers for object categories, color, materials, and spatial relationships trained over the disentangled 3D feature sub-spaces generalize better with dramatically fewer examples than the current state-of-the-art, and enable a visual question answering system that uses them as its modules to generalize one-shot to novel objects in the scene.
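The compositional operation the abstract describes (mixing an object's shape and style, resizing the resulting object 3D feature map, and adding it onto a background scene feature map) can be sketched roughly as follows. This is a minimal illustrative sketch, not the paper's implementation: the grid layout `(C, D, H, W)`, the element-wise shape-style combination, the nearest-neighbor resize, and the function names are all assumptions.

```python
import numpy as np

def resize_grid(grid, out_shape):
    """Nearest-neighbor resize of a (C, D, H, W) feature grid (illustrative)."""
    C, D, H, W = grid.shape
    d_idx = np.arange(out_shape[0]) * D // out_shape[0]
    h_idx = np.arange(out_shape[1]) * H // out_shape[1]
    w_idx = np.arange(out_shape[2]) * W // out_shape[2]
    return grid[:, d_idx][:, :, h_idx][:, :, :, w_idx]

def compose_scene(bg, shape_grid, style_code, box):
    """Add a (shape, style) object into a background 3D scene feature map.

    bg:         (C, D, H, W) background scene feature grid
    shape_grid: (C, d, h, w) object shape feature grid
    style_code: (C,) style vector, broadcast over the shape grid (assumed mixing rule)
    box:        (z0, y0, x0, z1, y1, x1) target placement in the scene grid
    """
    z0, y0, x0, z1, y1, x1 = box
    obj = shape_grid * style_code[:, None, None, None]   # mix shape with style
    obj = resize_grid(obj, (z1 - z0, y1 - y0, x1 - x0))  # resize to the target box
    out = bg.copy()
    out[:, z0:z1, y0:y1, x0:x1] += obj                   # additive composition
    return out
```

Swapping `style_code` between objects while keeping `shape_grid` fixed is one way to picture the shape/style disentanglement the paper evaluates; the actual networks learn these sub-spaces end-to-end.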
