Paper title
Differentiable, learnable, regionalized process-based models with physical outputs can approach state-of-the-art hydrologic prediction accuracy
Authors
Abstract
Predictions of hydrologic variables across the entire water cycle have significant value for water resource management as well as for downstream applications such as ecosystem and water quality modeling. Recently, purely data-driven deep learning models such as long short-term memory (LSTM) have shown seemingly insurmountable performance in modeling rainfall-runoff and other geoscientific variables, yet they cannot predict untrained physical variables and remain challenging to interpret. Here we show that differentiable, learnable, process-based models (called δ models here) can approach the performance level of LSTM for the intensively observed variable (streamflow) with regionalized parameterization. We use a simple hydrologic model, HBV, as the backbone and use embedded neural networks, which can only be trained in a differentiable programming framework, to parameterize, enhance, or replace the process-based model modules. Without using an ensemble or post-processor, δ models can obtain a median Nash-Sutcliffe efficiency of 0.732 for 671 basins across the USA for the Daymet forcing dataset, compared to 0.748 from a state-of-the-art LSTM model with the same setup. For another forcing dataset, the difference is even smaller: 0.715 vs. 0.722. Meanwhile, the resulting learnable process-based models can output a full set of untrained variables, e.g., soil and groundwater storage, snowpack, evapotranspiration, and baseflow, and can later be constrained by observations of those variables. Both the simulated evapotranspiration and the fraction of discharge from baseflow agreed reasonably well with alternative estimates. The general framework can work with models of varying process complexity and opens a path toward learning physics from big data.
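The headline metric in the abstract is the median Nash-Sutcliffe efficiency (NSE) across basins. As a minimal sketch of how such a number is commonly computed, the snippet below uses the standard NSE definition (one minus the ratio of squared simulation error to the variance of the observations about their mean) and takes the median over per-basin scores. The function names `nse` and `median_nse` and the toy streamflow series are illustrative assumptions, not the authors' code or data.

```python
from statistics import mean, median


def nse(observed, simulated):
    """Nash-Sutcliffe efficiency for one basin (standard definition).

    Returns 1.0 for a perfect simulation and 0.0 when the simulation is
    no better than always predicting the mean of the observations.
    """
    if len(observed) != len(simulated) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    obs_mean = mean(observed)
    # Sum of squared errors between observations and simulation.
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    # Sum of squared deviations of observations from their own mean.
    variance = sum((o - obs_mean) ** 2 for o in observed)
    if variance == 0:
        raise ValueError("NSE is undefined for constant observations")
    return 1.0 - residual / variance


def median_nse(basins):
    """Median NSE over an iterable of (observed, simulated) pairs."""
    return median(nse(obs, sim) for obs, sim in basins)


# Hypothetical toy data: three "basins" with three time steps each.
toy_basins = [
    ([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]),  # perfect simulation
    ([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]),  # equivalent to predicting the mean
    ([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]),  # one time step off by one unit
]
print(median_nse(toy_basins))
```

Reporting the median rather than the mean keeps a few very poorly simulated basins, whose NSE can be strongly negative, from dominating the summary.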