Unsupervised Image Representation Learning with Deep Latent Particles

Paper Title

Unsupervised Image Representation Learning with Deep Latent Particles

Authors

Tal Daniel, Aviv Tamar

Abstract

We propose a new representation of visual data that disentangles object position from appearance. Our method, termed Deep Latent Particles (DLP), decomposes the visual input into low-dimensional latent "particles", where each particle is described by its spatial location and features of its surrounding region. To drive learning of such representations, we follow a VAE-based approach and introduce a prior for particle positions based on a spatial-softmax architecture, and a modification of the evidence lower bound loss inspired by the Chamfer distance between particles. We demonstrate that our DLP representations are useful for downstream tasks such as unsupervised keypoint (KP) detection, image manipulation, and video prediction for scenes composed of multiple dynamic objects. In addition, we show that our probabilistic interpretation of the problem naturally provides uncertainty estimates for particle locations, which can be used for model selection, among other tasks. Videos and code are available: https://taldatech.github.io/deep-latent-particles-web/
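
The abstract names two building blocks, a spatial-softmax layer and the Chamfer distance, without defining them. The sketch below illustrates the textbook versions of both in NumPy. It is not the authors' implementation: the function names `spatial_softmax_keypoints` and `chamfer_distance`, the [-1, 1] coordinate convention, and the use of squared Euclidean distance are assumptions made for illustration, and the paper's actual loss is a modification of the evidence lower bound that is only inspired by the Chamfer distance.

```python
import numpy as np


def spatial_softmax_keypoints(heatmaps):
    """Map K heatmaps of shape (K, H, W) to K expected (x, y) locations in [-1, 1].

    Illustrative only: the coordinate range is an assumed convention.
    """
    k, h, w = heatmaps.shape
    flat = heatmaps.reshape(k, -1)
    flat = flat - flat.max(axis=1, keepdims=True)  # subtract max for numerical stability
    probs = np.exp(flat)
    probs /= probs.sum(axis=1, keepdims=True)      # softmax over all H*W pixels
    probs = probs.reshape(k, h, w)

    xs = np.linspace(-1.0, 1.0, w)                 # column coordinates
    ys = np.linspace(-1.0, 1.0, h)                 # row coordinates
    exp_x = (probs.sum(axis=1) * xs).sum(axis=1)   # marginal over rows, shape (K, W)
    exp_y = (probs.sum(axis=2) * ys).sum(axis=1)   # marginal over columns, shape (K, H)
    return np.stack([exp_x, exp_y], axis=1)        # shape (K, 2)


def chamfer_distance(a, b):
    """Symmetric Chamfer distance between point sets a (N, D) and b (M, D).

    Uses squared Euclidean distance, which is one common convention.
    """
    d = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)  # pairwise, shape (N, M)
    return d.min(axis=1).mean() + d.min(axis=0).mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    maps = rng.normal(size=(3, 16, 16))            # three random heatmaps
    keypoints = spatial_softmax_keypoints(maps)
    print(keypoints.shape)
    print(chamfer_distance(keypoints, keypoints + 0.1))
```

Because each location is an expectation under a softmax distribution, it is differentiable with respect to the heatmap, which is what makes this kind of layer usable for proposing keypoint positions inside a trained network. The Chamfer distance matches each point to its nearest neighbour in the other set, so it compares two sets of particles without requiring a fixed ordering.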
