Paper Title

MSLKANet: A Multi-Scale Large Kernel Attention Network for Scene Text Removal

Author

Lyu, Guangtao

Abstract

Scene text removal aims to remove text from natural images and fill the affected regions with perceptually plausible background information. It has attracted increasing attention because of its applications in privacy protection, scene text retrieval, and text editing. With the development of deep learning, previous methods have achieved significant improvements. However, most existing methods seem to overlook large receptive fields and global information. Notably, the pioneering method improves significantly merely by changing its training data from cropped images to full images. In this paper, we present MSLKANet, a single-stage multi-scale network for scene text removal in full images. To obtain large receptive fields and global information, we propose multi-scale large kernel attention (MSLKA), which captures long-range dependencies between text regions and backgrounds at various granularity levels. Furthermore, we combine the large kernel decomposition mechanism with atrous spatial pyramid pooling to build large kernel spatial pyramid pooling (LKSPP), which can perceive more valid pixels in the spatial dimension while maintaining large receptive fields and a low computational cost. Extensive experimental results indicate that the proposed method achieves state-of-the-art performance on both synthetic and real-world datasets, and they demonstrate the effectiveness of the proposed MSLKA and LKSPP components.
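
The trade-off behind large kernel decomposition (a large receptive field at a low computational cost) can be illustrated with some simple arithmetic. The sketch below is not the authors' implementation. It assumes a commonly used decomposition of one large dense convolution into a depthwise convolution, a depthwise dilated convolution, and a 1x1 pointwise convolution, all with stride 1. The kernel sizes (5 and 7), the dilation (3), the channel count (64), and every function name are hypothetical choices made for illustration, since the abstract does not state them.

```python
def effective_kernel(kernel_size: int, dilation: int = 1) -> int:
    """Spatial extent covered by one dilated kernel: d * (k - 1) + 1."""
    return dilation * (kernel_size - 1) + 1


def receptive_field(layers) -> int:
    """Receptive field of a stack of stride-1 convolutions.

    `layers` is a sequence of (kernel_size, dilation) pairs.
    Each layer widens the field by its effective kernel size minus one.
    """
    field = 1
    for kernel_size, dilation in layers:
        field += effective_kernel(kernel_size, dilation) - 1
    return field


def dense_params(kernel_size: int, channels: int) -> int:
    """Weights in one dense k x k convolution (C in, C out, no bias)."""
    return kernel_size * kernel_size * channels * channels


def decomposed_params(local_k: int, dilated_k: int, channels: int) -> int:
    """Weights in the assumed decomposition (no bias terms)."""
    depthwise_local = local_k * local_k * channels        # one filter per channel
    depthwise_dilated = dilated_k * dilated_k * channels  # dilation adds no weights
    pointwise = channels * channels                       # 1x1 channel mixing
    return depthwise_local + depthwise_dilated + pointwise


# Illustrative (assumed) configuration, not taken from the paper.
LAYERS = [(5, 1), (7, 3), (1, 1)]  # depthwise, depthwise dilated, pointwise
CHANNELS = 64

field = receptive_field(LAYERS)
print("receptive field:", field)
print("decomposed weights:", decomposed_params(5, 7, CHANNELS))
print("dense weights for the same field:", dense_params(field, CHANNELS))
```

Under these assumed settings the stack covers a 23x23 receptive field with 8,832 weights, whereas a single dense 23x23 convolution over 64 channels would need 2,166,784. A pyramid in the spirit of atrous spatial pyramid pooling would run several such branches in parallel with different dilations; calling `receptive_field` with other `(kernel_size, dilation)` pairs shows how each branch's coverage changes.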
