Paper Title

Geometric Representation Learning for Document Image Rectification

Authors

Hao Feng, Wengang Zhou, Jiajun Deng, Yuechen Wang, Houqiang Li

Abstract

In document image rectification, rich geometric constraints exist between a distorted image and its ground truth. However, existing advanced solutions largely ignore these constraints, which limits rectification performance. To this end, we present DocGeoNet, which rectifies document images by introducing explicit geometric representation. Technically, the proposed geometric representation learning involves two typical attributes of a document image: 3D shape and textlines. Our motivation arises from the insight that 3D shape provides global unwarping cues for rectifying a distorted document image but overlooks local structure. Textlines, in turn, complement it by providing explicit geometric constraints for local patterns. The learned geometric representation effectively bridges the distorted image and its ground truth. Extensive experiments show the effectiveness of our framework and demonstrate the superiority of DocGeoNet over state-of-the-art methods on both the DocUNet Benchmark dataset and our proposed DIR300 test set. The code is available at https://github.com/fh2019ustc/DocGeoNet.
