Paper title
Conformal Prediction Sets with Limited False Positives
Authors
Abstract
We develop a new approach to multi-label conformal prediction in which we aim to output a precise set of promising prediction candidates with a bounded number of incorrect answers. Standard conformal prediction provides the ability to adapt to model uncertainty by constructing a calibrated candidate set in place of a single prediction, with guarantees that the set contains the correct answer with high probability. In order to obey this coverage property, however, conformal sets can become inundated with noisy candidates -- which can render them unhelpful in practice. This is particularly relevant to practical applications where there is a limited budget, and the cost (monetary or otherwise) associated with false positives is non-negligible. We propose to trade coverage for a notion of precision by enforcing that the presence of incorrect candidates in the predicted conformal sets (i.e., the total number of false positives) is bounded according to a user-specified tolerance. Subject to this constraint, our algorithm then optimizes for a generalized notion of set coverage (i.e., the true positive rate) that allows for any number of true answers for a given query (including zero). We demonstrate the effectiveness of this approach across a number of classification tasks in natural language processing, computer vision, and computational chemistry.
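The idea of bounding false positives by a user-specified tolerance can be illustrated with a small sketch. This is not the paper's algorithm: it is a simplified, assumed thresholding scheme that picks the loosest score cutoff whose average false-positive count on a calibration set stays within a tolerance `k`, and it omits the finite-sample correction a real conformal guarantee would need. The function names `calibrate_threshold` and `predict_set`, and the array layout, are hypothetical.

```python
import numpy as np


def calibrate_threshold(cal_scores, cal_labels, k):
    """Return the smallest threshold whose mean false-positive count
    on the calibration data is at most k (illustrative sketch only).

    cal_scores: (n, m) array, higher score = candidate more likely correct.
    cal_labels: (n, m) boolean array, True where the candidate is correct.
    k: user-specified tolerance on the average number of false positives.
    """
    cal_scores = np.asarray(cal_scores, dtype=float)
    cal_labels = np.asarray(cal_labels, dtype=bool)
    # Try every observed score from lowest to highest; +inf yields empty sets.
    thresholds = np.append(np.unique(cal_scores), np.inf)
    for t in thresholds:
        selected = cal_scores >= t
        # Count selected-but-incorrect candidates per query, then average.
        mean_fp = (selected & ~cal_labels).sum(axis=1).mean()
        if mean_fp <= k:
            # Lowest feasible threshold keeps the most candidates,
            # which favors the true positive rate under the constraint.
            return t
    return np.inf


def predict_set(scores, t):
    """Indices of candidates whose score reaches the calibrated threshold."""
    return np.flatnonzero(np.asarray(scores, dtype=float) >= t)


# Toy calibration data: two queries with three candidates each.
cal_scores = [[0.9, 0.8, 0.1], [0.7, 0.6, 0.2]]
cal_labels = [[True, False, False], [False, True, False]]
t = calibrate_threshold(cal_scores, cal_labels, k=0.5)
print(t, predict_set([0.95, 0.5, 0.85], t))
```

Because the false-positive count can only shrink as the threshold rises, scanning thresholds from lowest to highest and stopping at the first feasible one returns the largest prediction sets that satisfy the tolerance on the calibration data.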