Paper Title

Neural Network Approximation: Three Hidden Layers Are Enough

Authors

Zuowei Shen, Haizhao Yang, Shijun Zhang

Abstract


A three-hidden-layer neural network with super approximation power is introduced. This network is built with the floor function ($\lfloor x\rfloor$), the exponential function ($2^x$), the step function ($1_{x\geq 0}$), or their compositions as the activation function in each neuron, and hence we call such networks Floor-Exponential-Step (FLES) networks. For any width hyper-parameter $N\in\mathbb{N}^+$, it is shown that FLES networks with width $\max\{d,N\}$ and three hidden layers can uniformly approximate a Hölder continuous function $f$ on $[0,1]^d$ with an exponential approximation rate $3\lambda (2\sqrt{d})^{\alpha}\, 2^{-\alpha N}$, where $\alpha\in(0,1]$ and $\lambda>0$ are the Hölder order and constant, respectively. More generally, for an arbitrary continuous function $f$ on $[0,1]^d$ with a modulus of continuity $\omega_f(\cdot)$, the constructive approximation rate is $2\omega_f(2\sqrt{d})\,{2^{-N}}+\omega_f(2\sqrt{d}\, 2^{-N})$. Moreover, we extend such a result to general bounded continuous functions on a bounded set $E\subseteq\mathbb{R}^d$. As a consequence, this new class of networks overcomes the curse of dimensionality in approximation power when the variation of $\omega_f(r)$ as $r\rightarrow 0$ is moderate (e.g., $\omega_f(r)\lesssim r^{\alpha}$ for Hölder continuous functions), since the major term in our approximation rate is essentially $\sqrt{d}$ times a function of $N$ independent of $d$ inside the modulus of continuity. Finally, we extend our analysis to derive similar approximation results in the $L^p$-norm for $p\in[1,\infty)$ by replacing Floor-Exponential-Step activation functions with continuous activation functions.
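To make the stated Hölder rate concrete, the following sketch (not from the paper; function and parameter names are our own) evaluates the bound $3\lambda (2\sqrt{d})^{\alpha}\, 2^{-\alpha N}$ and illustrates the two claims in the abstract: the error decays exponentially in the width hyper-parameter $N$, while the dimension $d$ enters only through the mild factor $(2\sqrt{d})^{\alpha}$.

```python
import math

def fles_holder_bound(d, N, alpha=1.0, lam=1.0):
    """Approximation-rate bound 3*lam*(2*sqrt(d))**alpha * 2**(-alpha*N)
    for a Hölder(alpha, lam) function on [0,1]^d approximated by a
    width-max(d, N), three-hidden-layer FLES network (per the abstract)."""
    return 3 * lam * (2 * math.sqrt(d)) ** alpha * 2 ** (-alpha * N)

# Exponential decay in N at fixed dimension d = 100:
for N in (10, 20, 30):
    print(f"N={N:2d}  bound={fles_holder_bound(100, N):.3e}")

# Growing d from 100 to 10000 (100x) only scales the bound by
# (sqrt(10000)/sqrt(100))**alpha = 10 when alpha = 1:
print(fles_holder_bound(10000, 10) / fles_holder_bound(100, 10))
```

Note that the $\sqrt{d}$ dependence, rather than an exponential one, is exactly what "overcoming the curse of dimensionality" refers to here.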
