Title
Estimating Classification Confidence Using Kernel Densities
Authors
Abstract
This paper investigates the post-hoc calibration of confidence for "exploratory" machine learning classification problems. The difficulty in these problems stems from the continuing desire, when curating datasets, to push the boundaries of which categories have enough examples to generalize from, and from confusion regarding the validity of those categories. We argue that for such problems the "one-versus-all" approach (top-label calibration) must be used rather than the "calibrate-the-full-response-matrix" approach advocated elsewhere in the literature. We introduce and test four new algorithms designed to handle the idiosyncrasies of category-specific confidence estimation. Chief among these methods is the use of kernel density ratios for confidence calibration, including a novel, bulletproof algorithm for choosing the bandwidth. We test our claims and explore the limits of calibration on a bioinformatics application (PhANNs) as well as the classic MNIST benchmark. Finally, our analysis argues that post-hoc calibration should always be performed, should be based only on the test dataset, and should be sanity-checked visually.
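To make the abstract's central idea concrete, the sketch below illustrates one generic way a kernel density ratio can turn top-label scores into calibrated confidences: fit separate densities to the scores of correct and incorrect top-label predictions, then apply Bayes' rule. This is only an illustrative assumption of the general technique; the paper's four algorithms and its bandwidth-selection method are not reproduced here (the sketch uses SciPy's default Scott's-rule bandwidth), and the function and variable names are hypothetical.

```python
# Hypothetical sketch of top-label calibration via a kernel density ratio.
# NOT the paper's algorithm: bandwidth selection here is SciPy's default
# (Scott's rule), whereas the paper proposes its own bandwidth method.
import numpy as np
from scipy.stats import gaussian_kde

def kde_ratio_calibrator(scores, correct):
    """Build a map from top-label scores to an estimate of P(correct | score).

    scores  : 1-D array of the classifier's top-label confidence scores.
    correct : boolean array, True where the top-label prediction was right.
    """
    scores = np.asarray(scores, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    prior = correct.mean()                    # P(prediction is correct)
    kde_pos = gaussian_kde(scores[correct])   # score density given correct
    kde_neg = gaussian_kde(scores[~correct])  # score density given incorrect

    def calibrate(s):
        s = np.atleast_1d(np.asarray(s, dtype=float))
        num = prior * kde_pos(s)
        den = num + (1.0 - prior) * kde_neg(s)
        return num / den                      # Bayes posterior via density ratio

    return calibrate

# Toy example: correct predictions tend to score higher than incorrect ones.
rng = np.random.default_rng(0)
pos = rng.beta(8, 2, size=500)   # synthetic scores for correct predictions
neg = rng.beta(2, 4, size=200)   # synthetic scores for incorrect predictions
scores = np.concatenate([pos, neg])
correct = np.concatenate([np.ones(500, bool), np.zeros(200, bool)])
calibrate = kde_ratio_calibrator(scores, correct)
print(calibrate([0.2, 0.9]))     # low score -> low confidence, high -> high
```

Note the one-versus-all flavor of this sketch: only the top-label score is calibrated, rather than the full response matrix, matching the approach the abstract advocates.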