Paper Title
Diffusion Probabilistic Modeling for Video Generation
Paper Authors
Paper Abstract
Denoising diffusion probabilistic models are a promising new class of generative models that mark a milestone in high-quality image generation. This paper showcases their ability to sequentially generate video, surpassing prior methods in perceptual and probabilistic forecasting metrics. We propose an autoregressive, end-to-end optimized video diffusion model inspired by recent advances in neural video compression. The model successively generates future frames by correcting a deterministic next-frame prediction with a stochastic residual generated by a reverse diffusion process. We compare this approach against five baselines on four datasets involving natural and simulation-based videos. We find significant improvements in perceptual quality on all datasets. Furthermore, by introducing a scalable version of the Continuous Ranked Probability Score (CRPS) applicable to video, we show that our model also outperforms existing approaches in probabilistic frame forecasting.
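To make the generation mechanism concrete, the following is a minimal sketch of the residual-correction rollout described in the abstract: each future frame is a deterministic prediction plus a stochastic residual drawn by running a reverse diffusion chain. All names here (`predict_next_frame`, `eps_model`, the linear noise schedule) are hypothetical stand-ins for the paper's learned components, and the sampler is a standard DDPM ancestral sampler, assumed for illustration rather than taken from the paper.

```python
import numpy as np

# --- Hypothetical stand-ins for the paper's learned networks ---
def predict_next_frame(context):
    """Deterministic next-frame predictor; trivially copies the last frame."""
    return context[-1]

def eps_model(residual_t, t):
    """Learned noise predictor eps_theta(x_t, t); zero stub for illustration."""
    return np.zeros_like(residual_t)

# Standard DDPM linear beta schedule (an assumption, not from the paper).
T = 50
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def sample_residual(shape, rng):
    """Ancestral sampling: run the reverse diffusion chain on the residual."""
    x = rng.standard_normal(shape)  # start from pure Gaussian noise x_T
    for t in reversed(range(T)):
        eps = eps_model(x, t)
        # DDPM reverse-step mean: (x_t - beta_t/sqrt(1-alpha_bar_t)*eps) / sqrt(alpha_t)
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        noise = rng.standard_normal(shape) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise
    return x

def generate(context, num_frames, rng):
    """Autoregressive rollout: deterministic prediction + stochastic residual."""
    frames = list(context)
    for _ in range(num_frames):
        mu = predict_next_frame(frames)     # deterministic next-frame prediction
        r = sample_residual(mu.shape, rng)  # stochastic correction
        frames.append(mu + r)               # corrected next frame
    return frames[len(context):]

rng = np.random.default_rng(0)
context = [np.zeros((64, 64)) for _ in range(2)]
video = generate(context, num_frames=4, rng=rng)
print(len(video), video[0].shape)  # -> 4 (64, 64)
```

Because each generated frame is appended to the context before the next prediction, stochasticity compounds over the rollout, which is what makes a probabilistic forecasting metric such as CRPS appropriate for evaluation.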
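The CRPS claim can likewise be illustrated. A common way to scale CRPS to high-dimensional outputs such as video frames is to apply the standard empirical ensemble estimator independently per pixel and average; the paper's exact variant may differ, so treat this as a generic sketch under that assumption.

```python
import numpy as np

def crps_ensemble(samples, observation):
    """Empirical per-pixel CRPS from a forecast ensemble, averaged over the frame.

    Uses the standard estimator CRPS(F, y) = E|X - y| - 0.5 * E|X - X'|,
    with expectations taken over the m ensemble members.
    samples: (m, H, W) forecast ensemble; observation: (H, W) ground truth.
    """
    term1 = np.abs(samples - observation).mean(axis=0)      # E|X - y|
    pairwise = np.abs(samples[:, None] - samples[None, :])  # (m, m, H, W)
    term2 = 0.5 * pairwise.mean(axis=(0, 1))                # 0.5 * E|X - X'|
    return (term1 - term2).mean()

rng = np.random.default_rng(0)
obs = rng.standard_normal((32, 32))
sharp = obs + 0.1 * rng.standard_normal((16, 32, 32))  # tight ensemble near the truth
diffuse = 2.0 * rng.standard_normal((16, 32, 32))      # wide, uninformed ensemble
print(crps_ensemble(sharp, obs) < crps_ensemble(diffuse, obs))  # -> True
```

Lower is better: CRPS rewards ensembles that are both sharp and well calibrated. The pairwise term above is O(m^2) in the ensemble size; for larger ensembles it can be computed equivalently from sorted samples in O(m log m), which is one route to the scalability the abstract mentions.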