Conformal Prediction Intervals with Temporal Dependence

Paper Title

Conformal Prediction Intervals with Temporal Dependence

Authors

Zhen Lin, Shubhendu Trivedi, Jimeng Sun

Abstract

Cross-sectional prediction is common in many domains such as healthcare, including forecasting tasks using electronic health records, where different patients form a cross-section. We focus on the task of constructing valid prediction intervals (PIs) in time series regression with a cross-section. A prediction interval is considered valid if it covers the true response with a pre-specified high probability. We first distinguish between two notions of validity in such a setting: cross-sectional and longitudinal. Cross-sectional validity is concerned with validity across the cross-section of the time series data, while longitudinal validity accounts for the temporal dimension. Coverage guarantees along both these dimensions are ideally desirable; however, we show that distribution-free longitudinal validity is theoretically impossible. Despite this limitation, we propose Conformal Prediction with Temporal Dependence (CPTD), a procedure that is able to maintain strict cross-sectional validity while improving longitudinal coverage. CPTD is post-hoc and lightweight, and can easily be used in conjunction with any prediction model as long as a calibration set is available. We focus on neural networks due to their ability to model complicated data such as diagnosis codes for time series regression, and perform extensive experimental validation to verify the efficacy of our approach. We find that CPTD outperforms baselines on a variety of datasets by improving longitudinal coverage and often providing more efficient (narrower) PIs.
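For readers unfamiliar with the cross-sectional guarantee the abstract refers to, the following is a minimal sketch of generic split conformal prediction at a single time step. It is not the CPTD procedure, whose details the abstract does not give. The function names, the absolute-residual score, and the assumption that calibration and test series (for example, patients) are exchangeable are all illustrative choices, not taken from the paper.

```python
import math


def conformal_quantile(scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score.

    Hypothetical helper: with exchangeable scores, this threshold covers a
    new score with probability at least 1 - alpha.
    """
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points to give a finite interval at this level.
        return math.inf
    return sorted(scores)[rank - 1]


def split_conformal_interval(cal_preds, cal_targets, test_pred, alpha=0.1):
    """Build a symmetric prediction interval around test_pred.

    cal_preds / cal_targets: model predictions and true responses for the
    calibration series at one fixed time step (assumed exchangeable with
    the test series).
    """
    # Nonconformity score: absolute residual (an assumed, common choice).
    scores = [abs(y - p) for p, y in zip(cal_preds, cal_targets)]
    q = conformal_quantile(scores, alpha)
    return test_pred - q, test_pred + q
```

Because the interval is computed after the model is trained and only needs held-out predictions and responses, the sketch is post-hoc in the same sense as the abstract describes. The guarantee it gives holds on average across series, not along the time axis of any one series.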
