Title
Neural Distributed Image Compression with Cross-Attention Feature Alignment
Authors
Abstract
We consider the problem of compressing an information source when a correlated source is available as side information only at the decoder, which is a special case of the distributed source coding problem in information theory. In particular, as correlated image sources we consider a pair of stereo images with overlapping fields of view, captured by a synchronized and calibrated pair of cameras. In previously proposed methods, the encoder transforms the input image into a latent representation using a deep neural network and compresses the quantized latent representation losslessly using entropy coding. The decoder decodes the entropy-coded quantized latent representation and reconstructs the input image using this representation and the available side information. In the proposed method, the decoder employs a cross-attention module to align the feature maps obtained from the received latent representation of the input image with those obtained from a latent representation of the side information. We argue that aligning the correlated patches in the feature maps allows better utilization of the side information. We empirically demonstrate the competitiveness of the proposed algorithm on the KITTI and Cityscapes stereo image datasets. Our experimental results show that the proposed architecture exploits the decoder-only side information more efficiently than previous works.
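The abstract describes the decoder's cross-attention alignment only at a high level. As an illustration, the following is a minimal NumPy sketch of generic single-head scaled dot-product cross-attention between non-overlapping patches of two feature maps, assuming queries come from features of the decoded input-image latent and keys/values come from features of the side-information latent. The patch size, the identity projection matrices, and the helper names (`to_patches`, `from_patches`, `cross_attention_align`) are hypothetical choices made for this sketch; it is not the paper's exact module, whose learned projections, number of heads, and placement within the decoder are not given here.

```python
import numpy as np


def to_patches(fmap, p):
    """Hypothetical helper: split an (H, W, C) feature map into non-overlapping
    p x p patches, returned as an (N, p * p * C) array in row-major patch order."""
    H, W, C = fmap.shape
    assert H % p == 0 and W % p == 0, "H and W must be multiples of p"
    x = fmap.reshape(H // p, p, W // p, p, C).transpose(0, 2, 1, 3, 4)
    return x.reshape((H // p) * (W // p), p * p * C)


def from_patches(patches, H, W, C, p):
    """Hypothetical helper: inverse of to_patches, folding back to (H, W, C)."""
    x = patches.reshape(H // p, W // p, p, p, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(H, W, C)


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention_align(x_feat, y_feat, w_q, w_k, w_v, p):
    """Generic single-head cross-attention over patches (assumed design).

    Queries: patches of x_feat (features of the decoded input-image latent).
    Keys/values: patches of y_feat (features of the side-information latent).
    Each output patch is a softmax-weighted mix of side-information patches,
    i.e. side information re-arranged to line up with x_feat.
    """
    H, W, C = x_feat.shape
    xp = to_patches(x_feat, p)               # (N, d), d = p * p * C
    yp = to_patches(y_feat, p)               # (N, d)
    q, k, v = xp @ w_q, yp @ w_k, yp @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (N, N) patch-to-patch similarity
    attn = softmax(scores, axis=-1)          # each row sums to 1
    aligned = attn @ v                       # (N, d); w_v must be d x d to fold back
    return from_patches(aligned, H, W, C, p), attn


# Toy check: y_feat is x_feat shifted horizontally by one patch, a crude stand-in
# for stereo disparity. Each patch is a distinct scaled one-hot vector and the
# projections are identity matrices (hypothetical; real ones would be learned).
H, W, C, p = 4, 4, 4, 2
N, d = (H // p) * (W // p), p * p * C
x_feat = from_patches(10.0 * np.eye(N, d), H, W, C, p)
y_feat = np.roll(x_feat, shift=p, axis=1)
eye = np.eye(d)
aligned, attn = cross_attention_align(x_feat, y_feat, eye, eye, eye, p)

print(attn.argmax(axis=1))           # matching side-information patch per query
print(np.allclose(aligned, x_feat))  # side information re-aligned to the input
```

In this toy setting each query patch attends almost entirely to the shifted copy of its own content, so the script prints `[1 0 3 2]` followed by `True`. In a trained decoder the projections would be learned and the attention weights would generally be soft rather than nearly one-hot, blending several correlated side-information patches.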