Paper title
Investigating classification learning curves for automatically generated and labelled plant images
Paper authors
Paper abstract
In the context of supervised machine learning, a learning curve describes how a model's performance on unseen data relates to the number of samples used to train the model. In this paper, we present a dataset of plant images with representatives of crops and weeds common to the Manitoba prairies at different growth stages. We determine the learning curve for a classification task on this data with the ResNet architecture. Our results are in accordance with previous studies and add to the evidence that learning curves are governed by power-law relationships over large scales, applications, and models. We further investigate how label noise and the reduction of trainable parameters impact the learning curve on this dataset. Both effects lead to the model requiring disproportionately larger training sets to achieve the same classification performance as observed without these effects.
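As an illustration of the power-law relationship mentioned in the abstract, the following minimal sketch fits a curve of the form error(n) = a * n^(-b) to a learning curve by linear regression in log-log space. The functional form, the function names, and all numbers are assumptions made for illustration: the abstract does not state the exact parameterisation used in the paper, and the data points below are synthetic, not results from the plant image dataset.

```python
import math


def fit_power_law(sizes, errors):
    """Fit error(n) = a * n**(-b) by least squares on log-transformed data.

    Assumed form (not taken from the paper): log(error) = log(a) - b * log(n).
    Returns the tuple (a, b).
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return math.exp(intercept), -slope


def samples_needed(a, b, target_error):
    """Invert error(n) = a * n**(-b) to get the training set size for a target error."""
    return (a / target_error) ** (1.0 / b)


# Synthetic learning curve generated from hypothetical parameters a=2.0, b=0.3.
sizes = [100, 1000, 10000, 100000]
errors = [2.0 * n ** -0.3 for n in sizes]

a, b = fit_power_law(sizes, errors)
print(f"fitted a={a:.3f}, b={b:.3f}")

# A shallower hypothetical exponent needs far more samples for the same error.
print(samples_needed(2.0, 0.3, 0.1))
print(samples_needed(2.0, 0.2, 0.1))
```

Under this assumed form, the training set size needed for a fixed target error scales as (a / error)^(1/b), so even a modest decrease in the exponent b inflates the required number of samples sharply. This is one way a "disproportionately larger" training set requirement can arise; whether label noise or fewer trainable parameters act through the exponent, the prefactor, or both is not stated in the abstract.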