Paper Title
Towards Nonlinear Disentanglement in Natural Data with Temporal Sparse Coding
Paper Authors
Paper Abstract
We construct an unsupervised learning model that achieves nonlinear disentanglement of underlying factors of variation in naturalistic videos. Previous work suggests that representations can be disentangled if all but a few factors in the environment stay constant at any point in time. As a result, algorithms proposed for this problem have only been tested on carefully constructed datasets with this exact property, leaving it unclear whether they will transfer to natural scenes. Here we provide evidence that objects in segmented natural movies undergo transitions that are typically small in magnitude with occasional large jumps, which is characteristic of a temporally sparse distribution. We leverage this finding and present SlowVAE, a model for unsupervised representation learning that uses a sparse prior on temporally adjacent observations to disentangle generative factors without any assumptions on the number of changing factors. We provide a proof of identifiability and show that the model reliably learns disentangled representations on several established benchmark datasets, often surpassing the current state-of-the-art. We additionally demonstrate transferability towards video datasets with natural dynamics, Natural Sprites and KITTI Masks, which we contribute as benchmarks for guiding disentanglement research towards more natural data domains.
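The abstract's key modeling idea is a sparse prior on temporally adjacent observations: latent transitions are typically small with occasional large jumps. A minimal sketch of such a penalty is below, assuming the sparse prior acts as an L1 (Laplace-like) term on differences between consecutive latent codes; the function name and the `rate` scale parameter are illustrative, not taken from the paper.

```python
import numpy as np

def temporal_sparsity_penalty(z_t, z_next, rate=1.0):
    """Illustrative L1 penalty on latent transitions.

    An L1 (Laplace-like) term on z_next - z_t favors transitions that
    are mostly near zero with occasional large jumps, matching the
    temporally sparse statistics described in the abstract.
    `rate` is a hypothetical scale hyperparameter.
    """
    # Sum absolute change over latent dimensions, average over the batch.
    return rate * np.abs(z_next - z_t).sum(axis=-1).mean()

# Toy usage with a batch of 2 samples and 3 latent dimensions:
z_t = np.zeros((2, 3))
z_next = np.ones((2, 3))
penalty = temporal_sparsity_penalty(z_t, z_next)  # 3.0 per sample on average
```

In a training loop, a term like this (or the corresponding Laplace negative log-likelihood) would be added to the usual VAE objective; the actual SlowVAE objective may differ in form and weighting.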