Title

Instance Segmentation for Chinese Character Stroke Extraction, Datasets and Benchmarks

Authors

Lizhao Liu, Kunyang Lin, Shangxin Huang, Zhongli Li, Chao Li, Yunbo Cao, Qingyu Zhou

Abstract

Strokes are the basic elements of Chinese characters, and stroke extraction has been an important and long-standing endeavor. Due to the limited training data, existing stroke extraction methods are often handcrafted and depend heavily on domain expertise. Moreover, there are no standardized benchmarks that provide a fair comparison between different stroke extraction methods, which, we believe, is a major impediment to the development of Chinese character stroke understanding and related tasks. In this work, we present the first publicly available Chinese Character Stroke Extraction (CCSE) benchmark, with two new large-scale datasets: Kaiti CCSE (CCSE-Kai) and Handwritten CCSE (CCSE-HW). With these large-scale datasets, we hope to leverage the representation power of deep models such as CNNs to solve the stroke extraction task, which, however, remains an open question. To this end, we turn the stroke extraction problem into a stroke instance segmentation problem. Using the proposed datasets to train a stroke instance segmentation model, we surpass previous methods by a large margin. Moreover, the models trained with the proposed datasets benefit the downstream font generation and handwritten aesthetic assessment tasks. We hope these benchmark results can facilitate further research. The source code and datasets are publicly available at: https://github.com/lizhaoliu-Lec/CCSE.
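To make the instance-segmentation framing concrete, here is a minimal sketch, assuming each stroke is represented as a category label plus a set of pixels (its mask). The class and function names, and the "heng"/"shu" labels, are hypothetical illustrations and are not taken from the CCSE code or datasets. Under this representation a character is a collection of stroke instances whose masks may overlap where strokes cross, something a single per-pixel label map cannot express.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StrokeInstance:
    # Hypothetical representation: a stroke-type label plus its pixel mask.
    category: str    # e.g. "heng" (horizontal) or "shu" (vertical); illustrative only
    mask: frozenset  # set of (row, col) pixels covered by this stroke


def make_cross(size=5):
    """Toy 'character' resembling 十: one horizontal and one vertical stroke."""
    mid = size // 2
    heng = StrokeInstance("heng", frozenset((mid, c) for c in range(size)))
    shu = StrokeInstance("shu", frozenset((r, mid) for r in range(size)))
    return [heng, shu]


def character_mask(strokes):
    """The glyph's foreground is the union of all stroke instance masks."""
    pixels = set()
    for s in strokes:
        pixels |= s.mask
    return pixels


def shared_pixels(strokes):
    """Pixels claimed by more than one stroke instance (e.g. at crossings)."""
    seen, shared = set(), set()
    for s in strokes:
        shared |= seen & s.mask
        seen |= s.mask
    return shared


if __name__ == "__main__":
    strokes = make_cross()
    print(len(character_mask(strokes)))  # 9
    print(shared_pixels(strokes))        # {(2, 2)}
```

The crossing pixel (2, 2) belongs to both instances, so each stroke keeps its complete shape. A per-pixel semantic map would have to assign that pixel to only one stroke type.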
