Paper Title

Reducing The Mismatch Between Marginal and Learned Distributions in Neural Video Compression

Paper Authors

Muhammet Balcilar, Bharath Bhushan Damodaran, Pierre Hellier

Paper Abstract

During the last four years, we have witnessed the success of end-to-end trainable models for image compression. Compared to decades of incremental work, these machine learning (ML) techniques learn all the components of the compression pipeline, which explains their current superiority. However, end-to-end ML models have not yet reached the performance of traditional video codecs such as VVC. Possible explanations can be put forward: a lack of data to account for temporal redundancy, or the inefficiency of the latents' density estimation in the neural model. The latter problem can be defined as the discrepancy between the latents' marginal distribution and the learned prior distribution. This mismatch, known as the amortization gap of the entropy model, enlarges the file size of the compressed data. In this paper, we first evaluate the amortization gap for three state-of-the-art ML video compression methods. Second, we propose an efficient and generic method to reduce the amortization gap and show that it leads to an improvement of between $2\%$ and $5\%$ without impacting reconstruction quality.
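The abstract defines the amortization gap as the discrepancy between the latents' marginal distribution $p_m$ and the learned prior $q$. In expected code length, this gap is the KL divergence $D_{\mathrm{KL}}(p_m \,\|\, q) = \mathbb{E}_{p_m}[-\log_2 q(y)] - H(p_m) \ge 0$: the extra bits per symbol an entropy coder driven by $q$ spends compared to an ideal coder matched to $p_m$. Below is a minimal sketch estimating this quantity for a discrete (quantized) latent; the function name, inputs, and the empirical-histogram estimator are illustrative assumptions, not the paper's actual method.

```python
import numpy as np

def amortization_gap_bits(latents: np.ndarray, learned_pmf: dict) -> float:
    """Estimate the amortization gap (bits/symbol) of an entropy model.

    latents: quantized latent symbols drawn from the (unknown) marginal p_m.
    learned_pmf: symbol -> probability under the learned prior q actually
    used by the entropy coder. Both are hypothetical stand-ins here.
    """
    symbols, counts = np.unique(latents, return_counts=True)
    marginal = counts / counts.sum()  # empirical estimate of p_m

    # Bits actually spent: cross-entropy of the marginal under the learned prior.
    cross_entropy = -np.sum(marginal * np.log2([learned_pmf[s] for s in symbols]))
    # Bits an ideal coder matched to the marginal would spend: its entropy.
    entropy = -np.sum(marginal * np.log2(marginal))

    return cross_entropy - entropy  # = KL(p_m || q) >= 0

# Toy usage: a uniform marginal coded with a mismatched, peaked prior.
rng = np.random.default_rng(0)
latents = rng.integers(-2, 3, size=100_000)
learned_pmf = {-2: 0.1, -1: 0.2, 0: 0.4, 1: 0.2, 2: 0.1}
print(amortization_gap_bits(latents, learned_pmf))  # > 0: wasted bits/symbol
```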
