Paper Title

Neural Network Approximation: Three Hidden Layers Are Enough

Authors

Zuowei Shen, Haizhao Yang, Shijun Zhang

Abstract


A three-hidden-layer neural network with super approximation power is introduced. This network is built with the floor function ($\lfloor x\rfloor$), the exponential function ($2^x$), the step function ($1_{x\geq 0}$), or their compositions as the activation function in each neuron, and hence we call such networks Floor-Exponential-Step (FLES) networks. For any width hyper-parameter $N\in\mathbb{N}^+$, it is shown that FLES networks with width $\max\{d,N\}$ and three hidden layers can uniformly approximate a Hölder continuous function $f$ on $[0,1]^d$ with an exponential approximation rate $3\lambda (2\sqrt{d})^{\alpha}\, 2^{-\alpha N}$, where $\alpha\in(0,1]$ and $\lambda>0$ are the Hölder order and constant, respectively. More generally, for an arbitrary continuous function $f$ on $[0,1]^d$ with a modulus of continuity $\omega_f(\cdot)$, the constructive approximation rate is $2\omega_f(2\sqrt{d})\,{2^{-N}}+\omega_f(2\sqrt{d}\, 2^{-N})$. Moreover, we extend such a result to general bounded continuous functions on a bounded set $E\subseteq\mathbb{R}^d$. As a consequence, this new class of networks overcomes the curse of dimensionality in approximation power when the variation of $\omega_f(r)$ as $r\rightarrow 0$ is moderate (e.g., $\omega_f(r)\lesssim r^{\alpha}$ for Hölder continuous functions), since the major term in our approximation rate is essentially $\sqrt{d}$ times a function of $N$ independent of $d$ inside the modulus of continuity. Finally, we extend our analysis to derive similar approximation results in the $L^p$-norm for $p\in[1,\infty)$ by replacing Floor-Exponential-Step activation functions with continuous activation functions.
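To make the stated Hölder rate concrete, the following sketch (not from the paper; function and parameter names are our own) evaluates the bound $3\lambda (2\sqrt{d})^{\alpha}\, 2^{-\alpha N}$ and illustrates the two claims in the abstract: the error decays exponentially in the width hyper-parameter $N$, while the dimension $d$ enters only through the mild factor $(2\sqrt{d})^{\alpha}$.

```python
import math

def fles_holder_bound(d, N, alpha=1.0, lam=1.0):
    """Approximation-rate bound 3*lam*(2*sqrt(d))**alpha * 2**(-alpha*N)
    for a Hölder(alpha, lam) function on [0,1]^d approximated by a
    width-max(d, N), three-hidden-layer FLES network (per the abstract)."""
    return 3 * lam * (2 * math.sqrt(d)) ** alpha * 2 ** (-alpha * N)

# Exponential decay in N at fixed dimension d = 100:
for N in (10, 20, 30):
    print(f"N={N:2d}  bound={fles_holder_bound(100, N):.3e}")

# Growing d from 100 to 10000 (100x) only scales the bound by
# (sqrt(10000)/sqrt(100))**alpha = 10 when alpha = 1:
print(fles_holder_bound(10000, 10) / fles_holder_bound(100, 10))
```

Note that the $\sqrt{d}$ dependence, rather than an exponential one, is exactly what "overcoming the curse of dimensionality" refers to here.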
