VDTR: Video Deblurring with Transformer

Paper Title

VDTR: Video Deblurring with Transformer

Authors

Cao, Mingdeng; Fan, Yanbo; Zhang, Yong; Wang, Jue; Yang, Yujiu

Abstract

Video deblurring is still an unsolved problem due to the challenging spatio-temporal modeling process. Existing convolutional neural network-based methods show limited capacity for effective spatial and temporal modeling in video deblurring. This paper presents VDTR, an effective Transformer-based model that makes the first attempt to adapt Transformer for video deblurring. VDTR exploits the superior long-range and relation modeling capabilities of Transformer for both spatial and temporal modeling. However, designing an appropriate Transformer-based model for video deblurring is challenging. The blurs are complicated and non-uniform, the frames are misaligned, and high-resolution spatial modeling has a high computational cost. To address these problems, VDTR performs attention within non-overlapping windows and exploits a hierarchical structure to model long-range dependencies. For frame-level spatial modeling, we propose an encoder-decoder Transformer that utilizes multi-scale features for deblurring. For multi-frame temporal modeling, we adapt Transformer to fuse multiple spatial features efficiently. Compared with CNN-based methods, the proposed method achieves highly competitive results on both synthetic and real-world video deblurring benchmarks, including DVD, GOPRO, REDS, and BSD. We hope such a Transformer-based architecture can serve as a powerful alternative baseline for video deblurring and other video restoration tasks. The source code will be available at https://github.com/ljzycmd/VDTR.
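The abstract describes attention within non-overlapping windows. A small pure-Python sketch can illustrate this idea. It is not VDTR's released code, and the helper names (`window_partition`, `window_reverse`, `window_attention`, `attention_pairs`) are hypothetical. The sketch also simplifies in three ways:

- It uses a single attention head.
- It uses identity query/key/value projections.
- It assumes H and W are multiples of the window size M.

It shows why windowing limits cost. Global attention over an H×W feature map compares (HW)² query-key pairs. Attention restricted to non-overlapping M×M windows compares only HW·M².

```python
import math
from typing import List, Optional

Vector = List[float]
FeatureMap = List[List[Vector]]  # H x W grid of C-dimensional feature vectors


def window_partition(x: FeatureMap, m: int) -> List[List[Vector]]:
    """Split an H x W x C map into non-overlapping m x m windows.

    Each window is a flat, row-major list of m*m token vectors.
    Assumes H and W are divisible by m (no padding in this sketch).
    """
    h, w = len(x), len(x[0])
    assert h % m == 0 and w % m == 0, "H and W must be multiples of the window size"
    windows = []
    for wy in range(0, h, m):
        for wx in range(0, w, m):
            windows.append(
                [x[y][xx] for y in range(wy, wy + m) for xx in range(wx, wx + m)]
            )
    return windows


def window_reverse(windows: List[List[Vector]], m: int, h: int, w: int) -> FeatureMap:
    """Inverse of window_partition: scatter window tokens back onto the H x W grid."""
    out: FeatureMap = [[[] for _ in range(w)] for _ in range(h)]
    idx = 0
    for wy in range(0, h, m):
        for wx in range(0, w, m):
            for i, tok in enumerate(windows[idx]):
                out[wy + i // m][wx + i % m] = tok
            idx += 1
    return out


def self_attention(tokens: List[Vector]) -> List[Vector]:
    """Scaled dot-product attention with identity Q/K/V projections (a simplification)."""
    c = len(tokens[0])
    scale = 1.0 / math.sqrt(c)
    out = []
    for q in tokens:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in tokens]
        mx = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - mx) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        out.append([sum(wt * v[j] for wt, v in zip(weights, tokens)) for j in range(c)])
    return out


def window_attention(x: FeatureMap, m: int) -> FeatureMap:
    """Apply self-attention independently inside each non-overlapping m x m window."""
    h, w = len(x), len(x[0])
    attended = [self_attention(win) for win in window_partition(x, m)]
    return window_reverse(attended, m, h, w)


def attention_pairs(h: int, w: int, m: Optional[int] = None) -> int:
    """Query-key pairs compared: global attention if m is None, else windowed."""
    n = h * w
    return n * n if m is None else n * m * m


if __name__ == "__main__":
    print(attention_pairs(256, 256))     # global attention
    print(attention_pairs(256, 256, 8))  # 8 x 8 windows
```

Take a 256×256 feature map with M = 8. Windowing reduces the pair count from 4,294,967,296 to 4,194,304, a factor of HW/M² = 1024. Each window is processed independently, so this operation alone never mixes information across window boundaries. In the abstract's description, long-range dependencies come from the hierarchical structure rather than from window attention itself.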
