Arbitrary Shape Text Detection via Boundary Transformer

Paper Title

Arbitrary Shape Text Detection via Boundary Transformer

Authors

Shi-Xue Zhang, Chun Yang, Xiaobin Zhu, Xu-Cheng Yin

Abstract

In arbitrary shape text detection, locating accurate text boundaries is challenging and non-trivial. Existing methods often suffer from indirect text boundary modeling or complex post-processing. In this paper, we systematically present a unified coarse-to-fine framework via boundary learning for arbitrary shape text detection, which can accurately and efficiently locate text boundaries without post-processing. In our method, we explicitly model the text boundary via an innovative iterative boundary transformer in a coarse-to-fine manner. In this way, our method can directly gain accurate text boundaries and abandon complex post-processing to improve efficiency. Specifically, our method mainly consists of a feature extraction backbone, a boundary proposal module, and an iteratively optimized boundary transformer module. The boundary proposal module consisting of multi-layer dilated convolutions will compute important prior information (including classification map, distance field, and direction field) for generating coarse boundary proposals while guiding the boundary transformer's optimization. The boundary transformer module adopts an encoder-decoder structure, in which the encoder is constructed by multi-layer transformer blocks with residual connection while the decoder is a simple multi-layer perceptron network (MLP). Under the guidance of prior information, the boundary transformer module will gradually refine the coarse boundary proposals via iterative boundary deformation. Furthermore, we propose a novel boundary energy loss (BEL) which introduces an energy minimization constraint and an energy monotonically decreasing constraint to further optimize and stabilize the learning of boundary refinement. Extensive experiments on publicly available and challenging datasets demonstrate the state-of-the-art performance and promising efficiency of our method.
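The coarse-to-fine idea in the abstract (a coarse boundary proposal refined by iterative deformation, with an energy that should not increase between iterations) can be illustrated with a small toy sketch. Everything below is an assumption made for illustration: the abstract gives no formulas, so `toy_offset_predictor` merely stands in for the boundary transformer, `boundary_energy` is taken to be the mean point-to-target distance, and `monotonic_penalty` is one plausible hinge-style reading of the "energy monotonically decreasing constraint". It is not the authors' implementation.

```python
import math


def circle_polygon(cx, cy, r, n):
    """Return n control points evenly spaced on a circle (a stand-in boundary)."""
    return [
        (cx + r * math.cos(2 * math.pi * k / n), cy + r * math.sin(2 * math.pi * k / n))
        for k in range(n)
    ]


def toy_offset_predictor(points, target, step=0.5):
    """Hypothetical stand-in for the boundary transformer.

    Returns one (dx, dy) offset per control point, moving each point a fixed
    fraction of the way toward its matching target point.
    """
    return [((tx - x) * step, (ty - y) * step) for (x, y), (tx, ty) in zip(points, target)]


def boundary_energy(points, target):
    """Assumed energy: mean Euclidean distance between predicted and target points."""
    total = sum(math.hypot(x - tx, y - ty) for (x, y), (tx, ty) in zip(points, target))
    return total / len(points)


def refine(points, target, iterations=3):
    """Iteratively deform the boundary, recording the energy after each step."""
    energies = [boundary_energy(points, target)]
    for _ in range(iterations):
        offsets = toy_offset_predictor(points, target)
        points = [(x + dx, y + dy) for (x, y), (dx, dy) in zip(points, offsets)]
        energies.append(boundary_energy(points, target))
    return points, energies


def monotonic_penalty(energies):
    """Hinge penalty: zero when the energy never increases between iterations."""
    return sum(max(0.0, later - earlier) for earlier, later in zip(energies, energies[1:]))


# Coarse proposal: a circle that is too large; target: the true boundary.
coarse = circle_polygon(0.0, 0.0, 2.0, 20)
target = circle_polygon(0.0, 0.0, 1.0, 20)
refined, energies = refine(coarse, target, iterations=3)
penalty = monotonic_penalty(energies)
```

In this toy setting each iteration halves the remaining distance to the target, so the recorded energies shrink at every step and the penalty stays at zero. A learned model would predict the offsets from image features and prior maps rather than from the target itself.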
