Easter2.0: Improving convolutional models for handwritten text recognition

Paper title

Easter2.0: Improving convolutional models for handwritten text recognition

Authors

Kartik Chaudhary, Raghav Bali

Abstract

Convolutional Neural Networks (CNNs) have shown promising results for the task of Handwritten Text Recognition (HTR), but they still fall behind Recurrent Neural Network (RNN) and Transformer-based models in terms of performance. In this paper, we propose a CNN-based architecture that bridges this gap. Our work, Easter2.0, is composed of multiple layers of 1D Convolution, Batch Normalization, ReLU, Dropout, Dense Residual connections, and a Squeeze-and-Excitation module, and makes use of the Connectionist Temporal Classification (CTC) loss. In addition to the Easter2.0 architecture, we propose a simple and effective data augmentation technique, 'Tiling and Corruption (TACO)', relevant for the task of HTR/OCR. Our work achieves state-of-the-art results on the IAM handwriting database when trained using only publicly available training data. In our experiments, we also present the impact of TACO augmentations and Squeeze-and-Excitation (SE) on text recognition accuracy. We further show that Easter2.0 is suitable for few-shot learning tasks and outperforms current best methods, including Transformers, when trained on a limited amount of annotated data. Code and models are available at: https://github.com/kartikgill/Easter2
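To make the Squeeze-and-Excitation (SE) component named in the abstract more concrete, the following is a minimal NumPy sketch of a generic SE step applied to a 1D feature sequence. It is not taken from the Easter2.0 repository: the function name `squeeze_excite_1d`, the `(time, channels)` layout, the reduction ratio of 4, and the randomly initialized weights are all assumptions made for illustration only.

```python
import numpy as np


def squeeze_excite_1d(x, w1, b1, w2, b2):
    """Generic SE step on a (time, channels) feature map (illustrative only).

    w1: (channels, hidden), b1: (hidden,)
    w2: (hidden, channels), b2: (channels,)
    """
    # Squeeze: average over the time axis to get one descriptor per channel.
    squeezed = x.mean(axis=0)                           # (channels,)
    # Excitation: bottleneck layer with ReLU, then a sigmoid gate per channel.
    hidden = np.maximum(0.0, squeezed @ w1 + b1)        # (hidden,)
    gates = 1.0 / (1.0 + np.exp(-(hidden @ w2 + b2)))   # (channels,), in (0, 1)
    # Scale: reweight every time step by the per-channel gates (broadcast).
    return x * gates


# Hypothetical sizes; the abstract does not give the real configuration.
rng = np.random.default_rng(0)
time_steps, channels, reduction = 50, 8, 4
hidden_units = channels // reduction

x = rng.normal(size=(time_steps, channels))
w1 = rng.normal(size=(channels, hidden_units))
b1 = np.zeros(hidden_units)
w2 = rng.normal(size=(hidden_units, channels))
b2 = np.zeros(channels)

y = squeeze_excite_1d(x, w1, b1, w2, b2)
```

The output has the same shape as the input, and each channel is rescaled by a single gate value between 0 and 1, which is what lets such a module emphasize or suppress whole channels based on sequence-wide context.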
