Paper Title

Data-driven effective model shows a liquid-like deep learning

Authors

Wenxuan Zou, Haiping Huang

Abstract

The geometric structure of an optimization landscape is argued to be fundamentally important to support the success of deep neural network learning. A direct computation of the landscape beyond two layers is hard. Therefore, to capture a global view of the landscape, an interpretable model of the network-parameter (or weight) space must be established. However, such a model has so far been lacking. Furthermore, it remains unknown what the landscape looks like for deep networks of binary synapses, which play a key role in robust and energy-efficient neuromorphic computation. Here, we propose a statistical mechanics framework by directly building a least structured model of the high-dimensional weight space, considering realistic structured data, stochastic gradient descent training, and the computational depth of neural networks. We also consider whether the number of network parameters exceeds the number of supplied training examples, namely, over- or under-parametrization. Our least structured model reveals that the weight spaces of the under-parametrization and over-parametrization cases belong to the same class, in the sense that these weight spaces are well connected without any hierarchical clustering structure. In contrast, the shallow network has a broken weight space, characterized by a discontinuous phase transition, thereby clarifying the benefit of depth in deep learning from the angle of high-dimensional geometry. Our effective model also reveals that inside a deep network there exists a liquid-like central part of the architecture, in the sense that the weights in this part behave as randomly as possible, providing algorithmic implications. Our data-driven model thus provides a statistical mechanics insight into why deep learning is unreasonably effective in terms of the high-dimensional weight space, and how deep networks differ from shallow ones.
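To make the abstract's framework concrete, below is a minimal Python/NumPy sketch of what a least structured (pairwise maximum-entropy) model of a binary weight space could look like, together with two simple geometry probes: the overlap distribution between weight snapshots (a single narrow peak is consistent with a well-connected space without hierarchical clustering) and an entropy-per-weight estimate that is maximal when weights behave as randomly as possible, i.e., the liquid-like signature. The snapshot array W, the naive mean-field inverse-Ising estimator, and both probes are illustrative assumptions for this sketch, not the paper's exact procedure.

import numpy as np

rng = np.random.default_rng(0)

# --- Hypothetical input -------------------------------------------------
# W: binary (+/-1) weight snapshots of one layer, assumed to be collected
# from many independent SGD runs on the same data set.
# Shape: (num_runs, num_weights). Faked here so the script runs end to end.
num_runs, num_weights = 2000, 50
bias = rng.uniform(-0.3, 0.3, size=num_weights)
W = np.where(rng.random((num_runs, num_weights)) < (1 + bias) / 2, 1.0, -1.0)

# --- Least structured (pairwise maximum-entropy) model ------------------
# Match first and second moments of the snapshots. One simple estimator of
# the pairwise couplings is the naive mean-field inverse-Ising formula
#   J = -C^{-1} (off-diagonal), with C_ij = <w_i w_j> - <w_i><w_j>.
m = W.mean(axis=0)                        # per-weight magnetizations <w_i>
C = np.cov(W, rowvar=False)               # connected correlations C_ij
C += 1e-3 * np.eye(num_weights)           # regularize before inversion
J = -np.linalg.inv(C)
np.fill_diagonal(J, 0.0)
h = np.arctanh(np.clip(m, -0.999, 0.999)) - J @ m   # mean-field local fields

# --- Geometry probe: overlap distribution -------------------------------
# Overlap q_ab = (1/N) sum_i w_i^a w_i^b between pairs of snapshots.
# A single narrow peak suggests a well-connected weight space; several
# peaks would signal a fragmented, hierarchically clustered landscape.
idx_a, idx_b = rng.integers(0, num_runs, size=(2, 5000))
q = (W[idx_a] * W[idx_b]).sum(axis=1) / num_weights
print(f"overlap: mean={q.mean():.3f}, std={q.std():.3f}")

# --- "Liquid-like" randomness probe --------------------------------------
# Entropy per weight under an independent-spin approximation,
#   S = mean_i H2((1 + m_i)/2), maximal (ln 2) when m_i ~ 0,
# i.e. when the weights behave as randomly as possible.
p = (1 + m) / 2
H2 = -(p * np.log(np.clip(p, 1e-12, 1)) + (1 - p) * np.log(np.clip(1 - p, 1e-12, 1)))
print(f"entropy per weight: {H2.mean():.3f} nats (max ln 2 = {np.log(2):.3f})")

On snapshots collected from real SGD runs, repeating these probes layer by layer would be one natural way to locate the liquid-like central part of the architecture described in the abstract: the layers whose entropy per weight sits closest to ln 2.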
