Paper Title

Benchmarking Bayesian neural networks and evaluation metrics for regression tasks

Authors

Brian Staber, Sébastien Da Veiga

Abstract

Due to the growing adoption of deep neural networks in many fields of science and engineering, modeling and estimating their uncertainties has become of primary importance. Despite the growing literature on uncertainty quantification in deep learning, the quality of the uncertainty estimates remains an open question. In this work, we assess for the first time the performance of several approximation methods for Bayesian neural networks on regression tasks by evaluating the quality of the confidence regions with several coverage metrics. The selected algorithms are also compared in terms of predictivity, kernelized Stein discrepancy, and maximum mean discrepancy with respect to a reference posterior in both weight and function space. Our findings show that (i) some algorithms have excellent predictive performance but tend to largely over- or underestimate uncertainties, (ii) it is possible to achieve good accuracy and a given target coverage with finely tuned hyperparameters, and (iii) the promising kernel Stein discrepancy cannot be exclusively relied on to assess the posterior approximation. As a by-product of this benchmark, we also compute and visualize the similarity of all algorithms and corresponding hyperparameters: interestingly, we identify a few clusters of algorithms with similar behavior in weight space, giving new insights into how they explore the posterior distribution.
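The coverage metrics mentioned in the abstract measure how often the true response falls inside a model's predictive interval, compared with the interval's nominal level. As a minimal sketch (not the paper's implementation — the helper name `empirical_coverage` and the toy Gaussian setup are illustrative assumptions), one can check the calibration of 95% central intervals like this:

```python
import numpy as np

def empirical_coverage(y_true, lower, upper):
    """Fraction of observations falling inside their predictive intervals."""
    y_true = np.asarray(y_true)
    return float(np.mean((y_true >= np.asarray(lower)) & (y_true <= np.asarray(upper))))

# Toy example: Gaussian predictive distributions with 95% central intervals.
rng = np.random.default_rng(0)
mu = rng.normal(size=1000)                  # predictive means (hypothetical model output)
sigma = 0.5                                 # predictive std, assumed homoscedastic here
y = mu + rng.normal(scale=sigma, size=1000) # observations drawn from the same model
z = 1.96                                    # two-sided Gaussian quantile for ~95%
cov = empirical_coverage(y, mu - z * sigma, mu + z * sigma)
# For a well-calibrated model, cov should be close to the nominal 0.95;
# cov >> 0.95 indicates overestimated uncertainty, cov << 0.95 underestimated.
```

Comparing `cov` against the nominal level across several target levels is what distinguishes algorithms that predict well from those that also quantify their uncertainty faithfully.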
