Title

A two-stage approach for table extraction in invoices

Authors

Saout, Thomas; Lardeux, Frédéric; Saubion, Frédéric

Abstract

The automated analysis of administrative documents is an important field of document recognition that has been studied for decades. Invoices are key documents among the huge numbers of documents handled by companies and public services. Most of the time, invoices present their data in tables, which must be clearly identified in order to extract the relevant information. In this paper, we propose an approach that combines an image-processing-based estimation of the shape of the tables with a graph-based representation of the document, which is used to identify complex tables precisely. We present an experimental evaluation on a real-world use case.
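The abstract does not specify how either stage works. The following is a rough illustration only, and it is not the authors' implementation. It sketches the two-stage idea in Python under simple assumptions that do not come from the paper. Stage 1 estimates the table area from a binary page image by treating rows and columns with a high ink fraction as ruling lines. Stage 2 builds a graph over OCR word boxes inside that area, links words whose vertical extents overlap, and reads each connected component as one table row. All names (`Word`, `estimate_table_region`, `table_rows`, `min_fill`) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# (x0, y0, x1, y1) in pixels; x1 and y1 are exclusive.
Box = Tuple[int, int, int, int]


@dataclass(frozen=True)
class Word:
    """A hypothetical OCR token: its text and bounding box."""
    text: str
    box: Box


def estimate_table_region(img: List[List[int]], min_fill: float = 0.5) -> Optional[Box]:
    """Stage 1 (illustrative): treat rows/columns whose ink fraction is at least
    `min_fill` as ruling lines and return the box enclosing all of them."""
    h, w = len(img), len(img[0])
    rule_rows = [y for y in range(h) if sum(img[y]) >= min_fill * w]
    rule_cols = [x for x in range(w) if sum(img[y][x] for y in range(h)) >= min_fill * h]
    if not rule_rows or not rule_cols:
        return None  # no ruled table found
    return (min(rule_cols), min(rule_rows), max(rule_cols) + 1, max(rule_rows) + 1)


def inside(b: Box, r: Box) -> bool:
    # True when box b lies entirely within region r.
    return b[0] >= r[0] and b[1] >= r[1] and b[2] <= r[2] and b[3] <= r[3]


def same_line(a: Box, b: Box) -> bool:
    # Two tokens are linked when their vertical extents overlap.
    return a[1] < b[3] and b[1] < a[3]


def table_rows(words: List[Word], region: Box) -> List[List[str]]:
    """Stage 2 (illustrative): build a graph over the tokens inside `region`,
    with an edge between tokens on the same line, and read each connected
    component as one table row (tokens ordered left to right)."""
    nodes = [wd for wd in words if inside(wd.box, region)]
    adj = {i: set() for i in range(len(nodes))}
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            if same_line(nodes[i].box, nodes[j].box):
                adj[i].add(j)
                adj[j].add(i)
    # Visit tokens top to bottom, then left to right, so rows come out in reading order.
    order = sorted(range(len(nodes)), key=lambda k: (nodes[k].box[1], nodes[k].box[0]))
    seen, rows = set(), []
    for start in order:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:  # iterative depth-first search over one component
            k = stack.pop()
            comp.append(k)
            for n in adj[k]:
                if n not in seen:
                    seen.add(n)
                    stack.append(n)
        rows.append([nodes[c].text for c in sorted(comp, key=lambda c: nodes[c].box[0])])
    return rows


# Toy page: a 20x20 binary image with a ruled rectangle from (2, 2) to (17, 17).
img = [[0] * 20 for _ in range(20)]
for t in range(2, 18):
    img[2][t] = img[17][t] = 1  # horizontal rules
    img[t][2] = img[t][17] = 1  # vertical rules

words = [
    Word("Item", (3, 3, 8, 5)), Word("Qty", (10, 3, 13, 5)),
    Word("Pen", (3, 7, 6, 9)), Word("2", (10, 7, 11, 9)),
    Word("Total", (0, 19, 5, 20)),  # lies outside the ruled area
]
region = estimate_table_region(img)
rows = table_rows(words, region) if region else []
print(region, rows)
```

Grouping rows by vertical overlap alone is a deliberate simplification. It ignores multi-line cells, skewed scans and unruled tables, which are exactly the kinds of complex tables the abstract says the graph-based representation is meant to handle.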