Paper Title
Dual-former: Hybrid Self-attention Transformer for Efficient Image Restoration
Paper Authors
Paper Abstract
Recently, image restoration transformers have achieved performance comparable to previous state-of-the-art CNNs. However, how to efficiently leverage such architectures remains an open problem. In this work, we present Dual-former, whose critical insight is to combine the powerful global modeling ability of self-attention modules with the local modeling ability of convolutions in an overall architecture. With convolution-based Local Feature Extraction modules equipped in the encoder and the decoder, we adopt a novel Hybrid Transformer Block only in the latent layer to model long-range dependencies in the spatial dimension and handle the uneven distribution between channels. Such a design eliminates the substantial computational complexity of previous image restoration transformers and achieves superior performance on multiple image restoration tasks. Experiments demonstrate that Dual-former achieves a 1.91 dB gain over the state-of-the-art MAXIM method on the Indoor dataset for single image dehazing while consuming only 4.2% of the GFLOPs of MAXIM. For single image deraining, it exceeds the SOTA method by 0.1 dB PSNR averaged over five datasets, with only 21.5% of the GFLOPs. Dual-former also substantially surpasses the latest desnowing method on various datasets, with fewer parameters.
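For illustration, here is a minimal PyTorch sketch of the design pattern the abstract describes: convolution-based local feature extraction in the encoder and decoder, with a single hybrid self-attention block applied only at the low-resolution latent level, where attention is cheap. All module internals here (`LocalFeatureExtraction`, `HybridTransformerBlock`, the channel-attention gate, the dimensions) are assumptions for illustration, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class LocalFeatureExtraction(nn.Module):
    """Hypothetical stand-in for the paper's convolution-based LFE module:
    a residual depthwise + pointwise convolution block."""
    def __init__(self, dim):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(dim, dim, 3, padding=1, groups=dim),  # depthwise: local spatial mixing
            nn.Conv2d(dim, dim, 1),                         # pointwise: channel mixing
            nn.GELU(),
        )

    def forward(self, x):
        return x + self.body(x)

class HybridTransformerBlock(nn.Module):
    """Sketch of a hybrid block used only at the latent level: global spatial
    self-attention for long-range dependencies, followed by a channel-attention
    gate for the uneven distribution between channels. Internals are guesses."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.channel_gate = nn.Sequential(  # squeeze-and-excitation-style gate
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(dim, dim // 4, 1), nn.GELU(),
            nn.Conv2d(dim // 4, dim, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, h, w = x.shape
        tokens = self.norm(x.flatten(2).transpose(1, 2))     # (B, H*W, C)
        attn_out, _ = self.attn(tokens, tokens, tokens)      # global spatial attention
        x = x + attn_out.transpose(1, 2).view(b, c, h, w)
        return x * self.channel_gate(x)                      # reweight channels

class DualFormerSketch(nn.Module):
    """Assumed U-shaped layout: conv-only encoder/decoder, attention only at
    the latent (lowest-resolution) level to keep GFLOPs low."""
    def __init__(self, dim=32):
        super().__init__()
        self.stem = nn.Conv2d(3, dim, 3, padding=1)
        self.enc = LocalFeatureExtraction(dim)
        self.down = nn.Conv2d(dim, dim * 2, 2, stride=2)
        self.latent = HybridTransformerBlock(dim * 2)
        self.up = nn.ConvTranspose2d(dim * 2, dim, 2, stride=2)
        self.dec = LocalFeatureExtraction(dim)
        self.out = nn.Conv2d(dim, 3, 3, padding=1)

    def forward(self, x):
        f = self.enc(self.stem(x))
        latent = self.latent(self.down(f))
        d = self.dec(self.up(latent) + f)   # skip connection
        return x + self.out(d)              # residual restoration

# Usage: restore a degraded image (spatial size must be even for the down/up pair).
model = DualFormerSketch()
restored = model(torch.randn(1, 3, 64, 64))
print(restored.shape)  # torch.Size([1, 3, 64, 64])
```

Under this reading, the efficiency claim follows from where attention is placed: at the latent level the token count is reduced by the downsampling factor squared, so the quadratic cost of self-attention applies to far fewer tokens, while full-resolution features are handled by linear-cost convolutions.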