Title
Unifying Approaches in Active Learning and Active Sampling via Fisher Information and Information-Theoretic Quantities
Authors
Abstract
Recently proposed methods in data subset selection, that is, active learning and active sampling, use Fisher information, Hessians, similarity matrices based on gradients, and gradient lengths to estimate how informative data is for a model's training. Are these different approaches connected, and if so, how? We revisit the fundamentals of Bayesian optimal experiment design and show that these recently proposed methods can be understood as approximations to information-theoretic quantities: among them, the mutual information between predictions and model parameters, known as expected information gain or BALD in machine learning, and the mutual information between predictions of acquisition candidates and test samples, known as expected predictive information gain. We develop a comprehensive set of approximations using Fisher information and observed information and derive a unified framework that connects seemingly disparate literature. Although Bayesian methods are often seen as separate from non-Bayesian ones, the sometimes fuzzy notion of "informativeness" expressed in various non-Bayesian objectives leads to the same pair of information quantities, which were, in principle, already known from Lindley (1956) and MacKay (1992).
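To make the first of the two quantities concrete: BALD (expected information gain) is the mutual information between a candidate point's prediction and the model parameters, estimated in practice as the entropy of the mean predictive distribution minus the mean entropy of the sampled predictive distributions. The sketch below is illustrative only and not from the paper; it assumes Monte Carlo samples of class probabilities (e.g. from MC dropout or a deep ensemble), and the function name `bald_score` is hypothetical.

```python
import numpy as np

def bald_score(probs):
    """Estimate BALD, I(y; theta | x), from Monte Carlo samples.

    probs: array of shape (n_mc_samples, n_classes), where each row is
    the predictive class distribution under one sampled parameter draw.
    Returns H[E_theta p(y|x,theta)] - E_theta H[p(y|x,theta)]:
    high when the sampled models disagree confidently, near zero when
    they all agree.
    """
    probs = np.asarray(probs, dtype=float)
    eps = 1e-12  # guard against log(0)
    # Entropy of the averaged predictive distribution.
    mean_p = probs.mean(axis=0)
    entropy_of_mean = -np.sum(mean_p * np.log(mean_p + eps))
    # Average entropy of the individual sampled distributions.
    mean_entropy = -np.sum(probs * np.log(probs + eps), axis=1).mean()
    return entropy_of_mean - mean_entropy
```

Two sanity checks follow directly from the definition: if every sample gives the same distribution, the score is zero (aleatoric uncertainty only), while two confident but contradictory samples yield a score near log 2.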