Recursive Neural Programs: Variational Learning of Image Grammars and Part-Whole Hierarchies

Paper Title

Recursive Neural Programs: Variational Learning of Image Grammars and Part-Whole Hierarchies

Authors

Fisher, Ares; Rao, Rajesh P. N.

Abstract

Human vision involves parsing and representing objects and scenes using structured representations based on part-whole hierarchies. Computer vision and machine learning researchers have recently sought to emulate this capability using capsule networks, reference frames and active predictive coding, but a generative model formulation has been lacking. We introduce Recursive Neural Programs (RNPs), which, to our knowledge, is the first neural generative model to address the part-whole hierarchy learning problem. RNPs model images as hierarchical trees of probabilistic sensory-motor programs that recursively reuse learned sensory-motor primitives to model an image within different reference frames, forming recursive image grammars. We express RNPs as structured variational autoencoders (sVAEs) for inference and sampling, and demonstrate parts-based parsing, sampling and one-shot transfer learning for MNIST, Omniglot and Fashion-MNIST datasets, demonstrating the model's expressive power. Our results show that RNPs provide an intuitive and explainable way of composing objects and scenes, allowing rich compositionality and intuitive interpretations of objects in terms of part-whole hierarchies.
