Paper title
Introducing a new high-resolution handwritten digits data set with writer characteristics
Paper authors
Abstract
The contributions in this article are two-fold. First, we introduce a new handwritten digit data set that we collected. It contains high-resolution images of handwritten digits together with various writer characteristics that are not available in the well-known MNIST database. These multiple writer characteristics are a novelty of our data set and create new research opportunities. The data set is publicly available online. Second, we analyse this new data set. We begin with simple supervised tasks: we assess the predictability of the collected writer characteristics, the effect of using some of those characteristics as predictors in classification tasks, and the effect of higher-resolution images on classification accuracy. We also explore semi-supervised applications, in which we leverage the large quantity of handwritten digit data sets already available online to improve the accuracy of various classification tasks with noticeable success. Finally, we demonstrate the generative perspective offered by this new data set: we are able to generate images that mimic the writing style of specific writers. The data set has unique and distinct features, and our analysis establishes benchmarks and showcases some of the new opportunities that this data set makes possible.
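The abstract mentions measuring the effect of higher-resolution images on classification accuracy, but it does not say how lower-resolution versions of the images are produced for comparison. A minimal sketch follows. It assumes square grayscale images stored as nested lists whose side length is a multiple of 28 (the MNIST resolution) and reduces each image to 28×28 by averaging non-overlapping blocks. The helper `block_average_downsample` and the 112×112 example input are hypothetical and do not come from the paper.

```python
from typing import List

Image = List[List[float]]


def block_average_downsample(image: Image, out_size: int = 28) -> Image:
    """Hypothetical helper: shrink a square grayscale image to out_size x out_size
    by averaging non-overlapping factor x factor blocks of pixels."""
    n = len(image)
    if any(len(row) != n for row in image):
        raise ValueError("image must be square")
    if n % out_size != 0:
        raise ValueError("side length must be a multiple of out_size")
    factor = n // out_size
    out: Image = []
    for bi in range(out_size):
        out_row = []
        for bj in range(out_size):
            # Sum the pixels in block (bi, bj), then divide by the block area.
            total = 0.0
            for i in range(bi * factor, (bi + 1) * factor):
                for j in range(bj * factor, (bj + 1) * factor):
                    total += image[i][j]
            out_row.append(total / (factor * factor))
        out.append(out_row)
    return out


if __name__ == "__main__":
    # Hypothetical 112x112 image: left half ink (1.0), right half background (0.0).
    hi_res = [[1.0 if j < 56 else 0.0 for j in range(112)] for _ in range(112)]
    lo_res = block_average_downsample(hi_res)
    print(len(lo_res), len(lo_res[0]))  # 28 28
    print(lo_res[0][0], lo_res[0][27])  # 1.0 0.0
```

Block averaging is only one possible choice. Other resampling filters, such as bilinear or Lanczos, would produce slightly different low-resolution images and therefore possibly different accuracy comparisons.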