Paper Title

Easter2.0: Improving convolutional models for handwritten text recognition

Authors

Kartik Chaudhary, Raghav Bali

Abstract

Convolutional Neural Networks (CNNs) have shown promising results for the task of Handwritten Text Recognition (HTR), but they still fall behind Recurrent Neural Network (RNN) and Transformer-based models in terms of performance. In this paper, we propose a CNN-based architecture that bridges this gap. Our work, Easter2.0, is composed of multiple layers of 1D Convolution, Batch Normalization, ReLU, Dropout, Dense Residual connections, and a Squeeze-and-Excitation module, and makes use of the Connectionist Temporal Classification (CTC) loss. In addition to the Easter2.0 architecture, we propose a simple and effective data augmentation technique, 'Tiling and Corruption (TACO)', relevant to the task of HTR/OCR. Our work achieves state-of-the-art results on the IAM handwriting database when trained using only publicly available training data. In our experiments, we also present the impact of TACO augmentations and Squeeze-and-Excitation (SE) on text recognition accuracy. We further show that Easter2.0 is suitable for few-shot learning tasks and outperforms current best methods, including Transformers, when trained on a limited amount of annotated data. Code and models are available at: https://github.com/kartikgill/Easter2
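The abstract names the Squeeze-and-Excitation (SE) module but does not describe its configuration. As a rough illustration only, the sketch below applies the generic SE idea to a 1D feature map: average each channel over time, pass the result through a small two-layer bottleneck, and rescale the channels with the resulting sigmoid gates. The function name `squeeze_excite_1d`, the weight shapes, and the toy values are assumptions made for this example; they are not taken from the Easter2.0 code.

```python
import math


def squeeze_excite_1d(x, w1, b1, w2, b2):
    """Rescale the channels of a 1D feature map with SE-style gates.

    x  : list of T time steps, each a list of C channel values
    w1 : C x H weights of the bottleneck layer (H is hypothetical, H < C)
    b1 : H biases
    w2 : H x C weights of the expansion layer
    b2 : C biases
    """
    num_steps, num_channels = len(x), len(x[0])
    hidden = len(b1)

    # Squeeze: one descriptor per channel, the mean over the time axis.
    squeezed = [sum(step[c] for step in x) / num_steps
                for c in range(num_channels)]

    # Excitation, part 1: bottleneck layer followed by ReLU.
    h = [max(0.0, sum(squeezed[c] * w1[c][j] for c in range(num_channels)) + b1[j])
         for j in range(hidden)]

    # Excitation, part 2: expansion layer followed by a sigmoid gate in (0, 1).
    gates = []
    for c in range(num_channels):
        z = sum(h[j] * w2[j][c] for j in range(hidden)) + b2[c]
        gates.append(1.0 / (1.0 + math.exp(-z)))

    # Scale: multiply every time step by the per-channel gate.
    return [[step[c] * gates[c] for c in range(num_channels)] for step in x]


# Toy input: 2 time steps, 2 channels, 1 hidden unit (all values are made up).
features = [[1.0, 2.0], [3.0, 4.0]]
zero_w1 = [[0.0], [0.0]]
zero_w2 = [[0.0, 0.0]]

# With all-zero weights and biases every gate is sigmoid(0) = 0.5,
# so each value is halved.
print(squeeze_excite_1d(features, zero_w1, [0.0], zero_w2, [0.0, 0.0]))
```

In a trained network the weights would be learned, so the gates would differ per channel and per input. The zero-weight case is shown only because its result is easy to verify by hand.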
