Paper Title

Mathematical Models of Overparameterized Neural Networks

Authors

Cong Fang, Hanze Dong, Tong Zhang

Abstract

Deep learning has achieved considerable empirical success in recent years. However, while practitioners have discovered many ad hoc tricks, until recently there was a lack of theoretical understanding of the techniques invented in the deep learning literature. Practitioners have long known that overparameterized neural networks are easy to learn, and in the past few years there have been important theoretical developments in the analysis of overparameterized neural networks. In particular, it has been shown that such systems behave like convex systems under various restricted settings, such as for two-layer NNs, and when learning is restricted locally to the so-called neural tangent kernel space around specialized initializations. This paper discusses some of this recent progress, which has led to a significantly better understanding of neural networks. We focus on the analysis of two-layer neural networks, explaining the key mathematical models and their algorithmic implications. We then discuss the challenges in understanding deep neural networks and some current research directions.
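As a brief illustration of the two settings named in the abstract (our notation, not taken from the paper itself), consider a two-layer network of width m,

    f(x; \Theta) = \frac{1}{\sqrt{m}} \sum_{j=1}^{m} a_j \,\sigma(w_j^\top x), \qquad \Theta = (a_1, w_1, \ldots, a_m, w_m),

and its first-order Taylor expansion around a random initialization \Theta_0,

    f(x; \Theta) \approx f(x; \Theta_0) + \langle \nabla_\Theta f(x; \Theta_0),\; \Theta - \Theta_0 \rangle.

The right-hand side is linear in \Theta, so minimizing a convex loss over this linearized model is itself a convex problem: kernel regression with the neural tangent kernel K(x, x') = \langle \nabla_\Theta f(x; \Theta_0), \nabla_\Theta f(x'; \Theta_0) \rangle. The 1/\sqrt{m} scaling above is the one commonly used in NTK analyses; mean-field analyses of two-layer networks instead use a 1/m scaling, under which the training objective becomes convex in the distribution of the neuron parameters (a_j, w_j).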
