3D Rendering Framework for Data Augmentation in Optical Character Recognition

Paper Title

3D Rendering Framework for Data Augmentation in Optical Character Recognition

Authors

Andreas Spruck, Maximiliane Hawesch, Anatol Maier, Christian Riess, Jürgen Seiler, André Kaup

Abstract

In this paper, we propose a data augmentation framework for Optical Character Recognition (OCR). The framework synthesizes new viewing angles and illumination scenarios, effectively enriching any available OCR dataset. Its modular structure allows it to be adapted to individual user requirements, and the enlargement factor of the available dataset can be scaled easily. Furthermore, the proposed method is not restricted to single-frame OCR but can also be applied to video OCR. We demonstrate the performance of the framework by augmenting a 15% subset of the common Brno Mobile OCR dataset. The framework improves the performance of OCR applications, especially for small datasets. Applying the proposed method yields improvements of up to 2.79 percentage points in Character Error Rate (CER) and up to 7.88 percentage points in Word Error Rate (WER) on the subset. The recognition of challenging text lines benefits in particular: for this class, the CER decreases by up to 14.92 percentage points and the WER by up to 18.19 percentage points. Moreover, training on the 15% subset augmented with the proposed method achieves smaller error rates than training on the original, non-augmented full dataset.
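
The abstract reports its results as CER and WER but does not define them. As a minimal sketch, assuming the commonly used definition (Levenshtein edit distance between the recognized text and the reference, divided by the reference length), the two metrics could be computed as below. The function names `edit_distance`, `cer`, and `wer` are illustrative and not taken from the paper, and the authors' exact evaluation protocol may differ.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i reference items
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character Error Rate: character-level edits per reference character."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word Error Rate: word-level edits per reference word (assumed whitespace tokenization)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)
```

Under this definition, a gain stated in percentage points is the absolute difference between two such rates multiplied by 100, not a relative change.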
