Paper Title

Learning a Single Neuron for Non-monotonic Activation Functions

Authors

Wu, Lei

Abstract

We study the problem of learning a single neuron $\mathbf{x}\mapsto \sigma(\mathbf{w}^T\mathbf{x})$ with gradient descent (GD). All existing positive results are limited to the case where $\sigma$ is monotonic. However, it has recently been observed that non-monotonic activation functions outperform traditional monotonic ones in many applications. To fill this gap, we establish learnability without assuming monotonicity. Specifically, when the input distribution is the standard Gaussian, we show that mild conditions on $\sigma$ (e.g., $\sigma$ has a dominating linear part) are sufficient to guarantee learnability in polynomial time with polynomially many samples. Moreover, under a stronger assumption on the activation function, the condition on the input distribution can be relaxed to non-degeneracy of the marginal distribution. We remark that our conditions on $\sigma$ are satisfied by practical non-monotonic activation functions such as SiLU/Swish and GELU. We also discuss how our positive results relate to existing negative results on training two-layer neural networks.
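
As a rough illustration of the setting described in the abstract (not the paper's algorithm, conditions, or guarantees), the sketch below runs plain gradient descent on the squared loss for a single neuron with the non-monotonic SiLU/Swish activation and standard Gaussian inputs. All names and hyperparameters (`d`, `n`, `lr`, the number of steps) are illustrative choices, not values from the paper.

```python
# Minimal sketch: fit a single neuron x -> sigma(w^T x) by gradient descent
# on the squared loss, with standard Gaussian inputs and SiLU/Swish as the
# non-monotonic activation. Purely illustrative; not the paper's analysis.
import numpy as np

def silu(z):
    """SiLU/Swish activation: z * sigmoid(z) (non-monotonic near the origin)."""
    return z / (1.0 + np.exp(-z))

def silu_grad(z):
    """Derivative of SiLU, used in the GD update."""
    s = 1.0 / (1.0 + np.exp(-z))
    return s + z * s * (1.0 - s)

rng = np.random.default_rng(0)
d, n = 20, 5000                      # dimension and sample size (illustrative)
w_star = rng.standard_normal(d)      # ground-truth neuron
w_star /= np.linalg.norm(w_star)

X = rng.standard_normal((n, d))      # standard Gaussian inputs
y = silu(X @ w_star)                 # noiseless labels from the target neuron

w = np.zeros(d)                      # GD initialized at the origin (illustrative)
lr = 0.5
for _ in range(2000):
    z = X @ w
    residual = silu(z) - y           # empirical squared-loss residual
    grad = (X * (residual * silu_grad(z))[:, None]).mean(axis=0)
    w -= lr * grad

print("distance to w*:", np.linalg.norm(w - w_star))
```

In this toy run the learned weight vector should approach the planted `w_star`; the paper's contribution is to characterize when such GD recovery provably succeeds for non-monotonic activations like SiLU/Swish and GELU.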
