Paper Title
Efficient Approximations of the Fisher Matrix in Neural Networks using Kronecker Product Singular Value Decomposition
Paper Authors
Paper Abstract
Several studies have shown the ability of natural gradient descent to minimize the objective function more efficiently than ordinary gradient-descent-based methods. However, the bottleneck of this approach for training deep neural networks lies in the prohibitive cost of solving, at each iteration, a large dense linear system corresponding to the Fisher Information Matrix (FIM). This has motivated various approximations of either the exact FIM or the empirical one. The most sophisticated of these is KFAC, which involves a Kronecker-factored block-diagonal approximation of the FIM. With only a slight additional cost, a few improvements of KFAC from the standpoint of accuracy are proposed. The common feature of the four novel methods is that they rely on a direct minimization problem, the solution of which can be computed via the Kronecker product singular value decomposition technique. Experimental results on three standard deep auto-encoder benchmarks show that they provide more accurate approximations to the FIM. Furthermore, they outperform KFAC and state-of-the-art first-order methods in terms of optimization speed.
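The Kronecker product singular value decomposition the abstract refers to is the classical Van Loan-Pitsianis construction: the best Frobenius-norm approximation of a matrix M by a single Kronecker product A ⊗ B reduces to a rank-1 SVD of a rearranged version of M. The sketch below, in Python with NumPy, is a minimal illustration of that construction and not code from the paper; the function name and block sizes (m1, n1, m2, n2) are hypothetical choices made here for clarity.

```python
import numpy as np

def nearest_kronecker_product(M, m1, n1, m2, n2):
    """Best Frobenius-norm approximation M ~ A kron B, with A of shape
    (m1, n1) and B of shape (m2, n2), assuming M has shape (m1*m2, n1*n2).
    Illustrative sketch of the Van Loan-Pitsianis rearrangement; not the
    paper's implementation."""
    # View M as an m1 x n1 grid of (m2 x n2) blocks, then rearrange it so
    # that row (i*n1 + j) of R is the vectorized block (i, j) of M.
    R = M.reshape(m1, m2, n1, n2).transpose(0, 2, 1, 3).reshape(m1 * n1, m2 * n2)
    # min ||M - A kron B||_F equals min ||R - vec(A) vec(B)^T||_F, so the
    # optimal factors come from the leading singular triplet of R.
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    A = np.sqrt(s[0]) * U[:, 0].reshape(m1, n1)
    B = np.sqrt(s[0]) * Vt[0, :].reshape(m2, n2)
    return A, B

# Quick sanity check: recover the factors of an exact Kronecker product.
A0 = np.random.randn(3, 2)
B0 = np.random.randn(4, 5)
M = np.kron(A0, B0)
A, B = nearest_kronecker_product(M, 3, 2, 4, 5)
assert np.allclose(np.kron(A, B), M)
```

Because only the leading singular triplet of the rearranged matrix is needed, such a fit costs little more than forming the Kronecker factors themselves, which is consistent with the abstract's claim that the proposed refinements add only a slight overhead on top of KFAC.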