Conformal Prediction Sets with Limited False Positives

Paper Title

Conformal Prediction Sets with Limited False Positives

Authors

Adam Fisch, Tal Schuster, Tommi Jaakkola, Regina Barzilay

Abstract

We develop a new approach to multi-label conformal prediction in which we aim to output a precise set of promising prediction candidates with a bounded number of incorrect answers. Standard conformal prediction provides the ability to adapt to model uncertainty by constructing a calibrated candidate set in place of a single prediction, with guarantees that the set contains the correct answer with high probability. In order to obey this coverage property, however, conformal sets can become inundated with noisy candidates -- which can render them unhelpful in practice. This is particularly relevant to practical applications where there is a limited budget, and the cost (monetary or otherwise) associated with false positives is non-negligible. We propose to trade coverage for a notion of precision by enforcing that the presence of incorrect candidates in the predicted conformal sets (i.e., the total number of false positives) is bounded according to a user-specified tolerance. Subject to this constraint, our algorithm then optimizes for a generalized notion of set coverage (i.e., the true positive rate) that allows for any number of true answers for a given query (including zero). We demonstrate the effectiveness of this approach across a number of classification tasks in natural language processing, computer vision, and computational chemistry.
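
The trade-off described in the abstract, capping false positives while keeping as many true answers as possible, can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a single global score threshold, a hypothetical per-query budget `k` on the expected number of false positives, and a conservative finite-sample correction in the style of conformal risk control. The function names `calibrate_threshold` and `predict_set` are made up for this illustration.

```python
import numpy as np

def calibrate_threshold(cal_scores, cal_labels, k):
    """Return the lowest score threshold whose corrected mean FP count is <= k.

    cal_scores: (n, m) array of model scores for m candidate labels per query.
    cal_labels: (n, m) binary array, 1 where the candidate is a true answer.
    k: user-specified tolerance on the expected number of false positives.
    """
    cal_scores = np.asarray(cal_scores, dtype=float)
    cal_labels = np.asarray(cal_labels, dtype=bool)
    n, m = cal_scores.shape
    # Candidate thresholds in ascending order; +inf yields the empty set.
    candidates = np.append(np.unique(cal_scores), np.inf)
    for t in candidates:
        # False positives per query: included candidates that are not true answers.
        fp_counts = ((cal_scores >= t) & ~cal_labels).sum(axis=1)
        # Assumed correction: treat one unseen query as worst case (m false positives).
        corrected = (fp_counts.sum() + m) / (n + 1)
        if corrected <= k:
            # FP counts only shrink as t grows, so the first feasible t
            # gives the largest sets (and highest recall) within budget.
            return t
    return np.inf  # no feasible threshold: always predict the empty set

def predict_set(scores, threshold):
    """Indices of candidates whose score clears the calibrated threshold."""
    return np.flatnonzero(np.asarray(scores, dtype=float) >= threshold)
```

Because the false-positive count can only decrease as the threshold rises, scanning thresholds from lowest to highest and stopping at the first feasible one keeps the prediction sets as large as the budget allows. A query with no true answers is handled naturally, since every included candidate simply counts as a false positive.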
