Title
A Differential Entropy Estimator for Training Neural Networks
Authors
Abstract
Mutual Information (MI) has been widely used as a loss regularizer for training neural networks. This has been particularly effective when learning disentangled or compressed representations of high-dimensional data. However, differential entropy (DE), another fundamental measure of information, has not found widespread use in neural network training. Although DE offers a potentially wider range of applications than MI, off-the-shelf DE estimators are either non-differentiable, computationally intractable, or fail to adapt to changes in the underlying distribution. These drawbacks prevent them from being used as regularizers in neural network training. To address the shortcomings of previously proposed DE estimators, we introduce KNIFE, a fully parameterized, differentiable kernel-based estimator of DE. The flexibility of our approach also allows us to construct KNIFE-based estimators for conditional DE (conditioned on either discrete or continuous variables), as well as for MI. We empirically validate our method on high-dimensional synthetic data and further apply it to guide the training of neural networks for real-world tasks. Our experiments on a large variety of tasks, including visual domain adaptation, textual fair classification, and textual fine-tuning, demonstrate the effectiveness of KNIFE-based estimation. Code can be found at https://github.com/g-pichler/knife.
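To make the idea of a kernel-based DE estimator concrete, the following is a minimal sketch of a plain Gaussian kernel density plug-in estimate for 1-D data. It is not the KNIFE estimator and is not taken from the linked repository: the abstract describes KNIFE as fully parameterized and differentiable, whereas this sketch assumes a fixed bandwidth and places kernels directly on the samples. The function name `gaussian_kde_entropy`, the bandwidth value, and the sample sizes are illustrative assumptions.

```python
import math
import random


def gaussian_kde_entropy(kernel_points, eval_points, bandwidth):
    """Plug-in estimate of differential entropy (in nats) for 1-D data.

    Builds a Gaussian kernel density estimate centered on `kernel_points`
    and returns the average negative log-density over `eval_points`.
    Hypothetical helper for illustration; not the KNIFE estimator.
    """
    n = len(kernel_points)
    # Log of the normalizing constant 1 / (n * h * sqrt(2 * pi)).
    log_norm = -math.log(n * bandwidth * math.sqrt(2.0 * math.pi))
    total = 0.0
    for x in eval_points:
        # Log-sum-exp over all kernels for numerical stability.
        exponents = [-0.5 * ((x - c) / bandwidth) ** 2 for c in kernel_points]
        m = max(exponents)
        log_density = log_norm + m + math.log(
            sum(math.exp(e - m) for e in exponents)
        )
        total -= log_density
    return total / len(eval_points)


# Synthetic check on a standard normal, whose entropy is known in closed form.
rng = random.Random(0)
kernel_points = [rng.gauss(0.0, 1.0) for _ in range(500)]
eval_points = [rng.gauss(0.0, 1.0) for _ in range(500)]

estimate = gaussian_kde_entropy(kernel_points, eval_points, bandwidth=0.3)
true_entropy = 0.5 * math.log(2.0 * math.pi * math.e)  # entropy of N(0, 1)
print(f"estimate={estimate:.3f}  true={true_entropy:.3f}")
```

Because the bandwidth and kernel locations are fixed here, the estimate cannot follow a distribution that shifts during training, which is one of the drawbacks of off-the-shelf estimators that the abstract names. Making those quantities learnable parameters is the kind of flexibility the abstract attributes to KNIFE.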