Paper title
Knowing what you know: valid and validated confidence sets in multiclass and multilabel prediction
Authors
Abstract
We develop conformal prediction methods for constructing valid predictive confidence sets in multiclass and multilabel problems without assumptions on the data generating distribution. A challenge here is that typical conformal prediction methods---which give marginal validity (coverage) guarantees---provide uneven coverage, in that they address easy examples at the expense of essentially ignoring difficult examples. By leveraging ideas from quantile regression, we build methods that always guarantee correct coverage but additionally provide (asymptotically optimal) conditional coverage for both multiclass and multilabel prediction problems. To address the potential challenge of exponentially large confidence sets in multilabel prediction, we build tree-structured classifiers that efficiently account for interactions between labels. Our methods can be bolted on top of any classification model---neural network, random forest, boosted tree---to guarantee its validity. We also provide an empirical evaluation, simultaneously providing new validation methods, that suggests the more robust coverage of our confidence sets.
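For orientation, the "typical conformal prediction methods" with marginal coverage that the abstract contrasts against can be sketched as a standard split-conformal procedure for multiclass classification. This is a minimal illustration of that baseline only, not the quantile-regression or tree-structured methods the paper proposes. The function names `conformal_threshold` and `prediction_set`, and the choice of one minus the true-class probability as the score, are assumptions made for this example.

```python
import math
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha):
    """Split-conformal score threshold for miscoverage level alpha (hypothetical helper)."""
    cal_probs = np.asarray(cal_probs, dtype=float)
    cal_labels = np.asarray(cal_labels, dtype=int)
    n = len(cal_labels)
    # Nonconformity score: one minus the probability the model gives the true label.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample corrected rank: the ceil((n + 1)(1 - alpha))-th smallest score.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: every label must be included.
        return float("inf")
    return float(np.sort(scores)[k - 1])

def prediction_set(probs, threshold):
    """Indices of the labels whose score 1 - p does not exceed the threshold."""
    probs = np.asarray(probs, dtype=float)
    return np.flatnonzero(1.0 - probs <= threshold)

# Toy calibration data: rows are predicted class probabilities from any classifier.
cal_probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]])
cal_labels = np.array([0, 1, 0, 0])
threshold = conformal_threshold(cal_probs, cal_labels, alpha=0.5)
labels_for_new_point = prediction_set([0.7, 0.3], threshold)
```

Under exchangeability of calibration and test points, sets built this way contain the true label with probability at least 1 - alpha on average over inputs. A single global threshold is exactly what allows the uneven coverage the abstract describes: easy inputs can be over-covered while hard inputs are under-covered.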