Paper Title
Prediction & Model Evaluation for Space-Time Data
Paper Authors
Abstract
Evaluation metrics for prediction error, model selection and model averaging on space-time data are understudied and poorly understood. The absence of independent replication makes prediction ambiguous as a concept and renders evaluation procedures developed for independent data inappropriate for most space-time prediction problems. Motivated by air pollution data collected during California wildfires in 2008, this manuscript attempts a formalization of the true prediction error associated with spatial interpolation. We investigate a variety of cross-validation (CV) procedures employing both simulations and case studies to provide insight into the nature of the estimand targeted by alternative data partition strategies. Consistent with recent best practice, we find that location-based cross-validation is appropriate for estimating spatial interpolation error as in our analysis of the California wildfire data. Interestingly, commonly held notions of bias-variance trade-off of CV fold size do not trivially apply to dependent data, and we recommend leave-one-location-out (LOLO) CV as the preferred prediction error metric for spatial interpolation.
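The leave-one-location-out (LOLO) partitioning named in the abstract can be illustrated with a minimal sketch. It assumes each observation carries a location label and that every observation at the held-out location forms the test set. The function names (`lolo_splits`, `lolo_cv_mse`), the toy data, and the training-mean predictor are hypothetical placeholders chosen for illustration; they are not the models or code used in the paper.

```python
from collections import defaultdict


def lolo_splits(locations):
    """Yield (location, train_idx, test_idx), holding out one location at a time."""
    by_location = defaultdict(list)
    for i, loc in enumerate(locations):
        by_location[loc].append(i)
    for loc, test_idx in by_location.items():
        held_out = set(test_idx)
        # All observations at every other location form the training set.
        train_idx = [i for i in range(len(locations)) if i not in held_out]
        yield loc, train_idx, test_idx


def lolo_cv_mse(locations, y, fit_predict):
    """Mean squared prediction error over all held-out observations."""
    squared_error = 0.0
    count = 0
    for _, train_idx, test_idx in lolo_splits(locations):
        predictions = fit_predict(train_idx, test_idx)
        for i, pred in zip(test_idx, predictions):
            squared_error += (y[i] - pred) ** 2
            count += 1
    return squared_error / count


# Toy data (assumed): three monitoring sites, two time points each.
locations = ["A", "A", "B", "B", "C", "C"]
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]


def mean_predictor(train_idx, test_idx):
    """Placeholder model: predict the training mean at every held-out point."""
    mean = sum(y[i] for i in train_idx) / len(train_idx)
    return [mean] * len(test_idx)


lolo_mse = lolo_cv_mse(locations, y, mean_predictor)
print(lolo_mse)
```

Because a whole location is withheld at once, the model never sees any observation from the site it is asked to predict, which is what mimics interpolation to an unmonitored location. Splitting individual observations at random would instead leave same-site measurements in the training set.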