Filtered-CoPhy: Unsupervised Learning of Counterfactual Physics in Pixel Space

Paper title

Filtered-CoPhy: Unsupervised Learning of Counterfactual Physics in Pixel Space

Authors

Steeven Janny, Fabien Baradel, Natalia Neverova, Madiha Nadri, Greg Mori, Christian Wolf

Abstract

Learning causal relationships in high-dimensional data (images, videos) is a hard task, as they are often defined on low dimensional manifolds and must be extracted from complex signals dominated by appearance, lighting, textures and also spurious correlations in the data. We present a method for learning counterfactual reasoning of physical processes in pixel space, which requires the prediction of the impact of interventions on initial conditions. Going beyond the identification of structural relationships, we deal with the challenging problem of forecasting raw video over long horizons. Our method does not require the knowledge or supervision of any ground truth positions or other object or scene properties. Our model learns and acts on a suitable hybrid latent representation based on a combination of dense features, sets of 2D keypoints and an additional latent vector per keypoint. We show that this better captures the dynamics of physical processes than purely dense or sparse representations. We introduce a new challenging and carefully designed counterfactual benchmark for predictions in pixel space and outperform strong baselines in physics-inspired ML and video prediction.
