Paper Title
Comparing Machine Learning Techniques for Alfalfa Biomass Yield Prediction
Paper Authors
Paper Abstract
The alfalfa crop is globally important as livestock feed, so highly efficient planting and harvesting could benefit many industries, especially as the global climate changes and traditional methods become less accurate. Recent work using machine learning (ML) to predict yields for alfalfa and other crops has shown promise. Previous efforts used remote sensing, weather, planting, and soil data to train ML models for yield prediction. However, while remote sensing works well, such models require large amounts of data and cannot make predictions until the harvesting season begins. Using weather and planting data from alfalfa variety trials in Kentucky and Georgia, our previous work compared feature selection techniques to find the best technique and best feature set. In this work, we trained a variety of ML models, using cross-validation for hyperparameter optimization, to predict biomass yields, and we showed better accuracy than similar work that employed more complex techniques. Our best individual model was a random forest with a mean absolute error of 0.081 tons/acre and $R^2$ of 0.941. Next, we expanded this dataset to include Wisconsin and Mississippi and repeated our experiments, obtaining a higher best $R^2$ of 0.982 with a regression tree. We then isolated our testing datasets by state to explore this problem's eligibility for domain adaptation (DA): we trained on multiple source states and tested on one target state. This Trivial DA (TDA) approach leaves plenty of room for improvement through exploring more complex DA techniques in forthcoming work.
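The evaluation protocol described in the abstract (train on several source states, test on one held-out target state, report mean absolute error and $R^2$) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the record layout, the helper names `leave_one_state_out` and `evaluate_target`, and the toy yield values are hypothetical and are not taken from the paper. The mean-yield baseline is only a stand-in for the random forest and regression tree models the abstract reports.

```python
from statistics import mean


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted yields."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def leave_one_state_out(records, target_state):
    """Split records into source states (training) and one target state (testing)."""
    source = [r for r in records if r["state"] != target_state]
    target = [r for r in records if r["state"] == target_state]
    return source, target


def evaluate_target(records, target_state):
    """Fit a placeholder model on the source states and score it on the target state."""
    source, target = leave_one_state_out(records, target_state)
    # Placeholder model (an assumption, not the paper's method):
    # predict the mean source-state yield for every target-state record.
    baseline = mean(r["yield"] for r in source)
    y_true = [r["yield"] for r in target]
    y_pred = [baseline] * len(y_true)
    return mean_absolute_error(y_true, y_pred), r_squared(y_true, y_pred)


# Made-up toy records (yield in tons/acre); these are NOT data from the paper.
records = [
    {"state": "KY", "yield": 1.0},
    {"state": "KY", "yield": 1.4},
    {"state": "GA", "yield": 0.8},
    {"state": "GA", "yield": 1.2},
    {"state": "WI", "yield": 1.5},
    {"state": "WI", "yield": 1.9},
    {"state": "MS", "yield": 0.9},
    {"state": "MS", "yield": 1.1},
]

if __name__ == "__main__":
    for state in ("KY", "GA", "WI", "MS"):
        mae, r2 = evaluate_target(records, state)
        print(f"target={state}  MAE={mae:.3f}  R^2={r2:.3f}")
```

A constant baseline like this one scores an $R^2$ at or below zero on the held-out state, which makes it a useful floor: any model trained on the source states should beat it before more complex DA techniques are worth comparing.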