Paper Title

Towards Weakly-Supervised Text Spotting using a Multi-Task Transformer

Paper Authors

Yair Kittenplon, Inbal Lavi, Sharon Fogel, Yarin Bar, R. Manmatha, Pietro Perona

Paper Abstract

Text spotting end-to-end methods have recently gained attention in the literature due to the benefits of jointly optimizing the text detection and recognition components. Existing methods usually have a distinct separation between the detection and recognition branches, requiring exact annotations for the two tasks. We introduce TextTranSpotter (TTS), a transformer-based approach for text spotting and the first text spotting framework which may be trained with both fully- and weakly-supervised settings. By learning a single latent representation per word detection, and using a novel loss function based on the Hungarian loss, our method alleviates the need for expensive localization annotations. Trained with only text transcription annotations on real data, our weakly-supervised method achieves competitive performance with previous state-of-the-art fully-supervised methods. When trained in a fully-supervised manner, TextTranSpotter shows state-of-the-art results on multiple benchmarks.
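The abstract's central mechanism is matching each predicted word hypothesis to a ground-truth transcription via a Hungarian (bipartite) assignment, so that a loss can be computed from transcriptions alone, without localization labels. Below is a minimal sketch of such a matching step, assuming a normalized character edit distance as the pairwise cost; the cost function, helper names, and the use of the SciPy solver are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch: Hungarian matching between predicted word strings and
# ground-truth transcriptions. The normalized edit-distance cost is an
# assumption for demonstration, not TextTranSpotter's exact matching cost.
import numpy as np
from scipy.optimize import linear_sum_assignment


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming."""
    dp = np.zeros((len(a) + 1, len(b) + 1), dtype=int)
    dp[:, 0] = np.arange(len(a) + 1)
    dp[0, :] = np.arange(len(b) + 1)
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            sub = 0 if a[i - 1] == b[j - 1] else 1
            dp[i, j] = min(dp[i - 1, j] + 1,      # deletion
                           dp[i, j - 1] + 1,      # insertion
                           dp[i - 1, j - 1] + sub)  # substitution
    return int(dp[len(a), len(b)])


def match_predictions_to_transcriptions(predictions, transcriptions):
    """Return (pred_idx, gt_idx) pairs minimizing the total matching cost."""
    cost = np.zeros((len(predictions), len(transcriptions)))
    for i, p in enumerate(predictions):
        for j, t in enumerate(transcriptions):
            # Normalized edit distance as a stand-in pairwise cost.
            cost[i, j] = edit_distance(p, t) / max(len(p), len(t), 1)
    pred_idx, gt_idx = linear_sum_assignment(cost)
    return list(zip(pred_idx.tolist(), gt_idx.tolist()))


if __name__ == "__main__":
    preds = ["hallo", "world", "spottng"]
    gts = ["spotting", "hello", "world"]
    print(match_predictions_to_transcriptions(preds, gts))
    # -> [(0, 1), (1, 2), (2, 0)]
```

Once each prediction is paired with a transcription, recognition losses can be applied per matched pair, which is what lets the weakly-supervised setting dispense with box or polygon annotations.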
