Paper Title

Soft-Root-Sign Activation Function

Authors

Yuan Zhou, Dandan Li, Shuwei Huo, Sun-Yuan Kung

Abstract

The choice of activation function in deep networks has a significant effect on training dynamics and task performance. At present, the most effective and widely used activation function is ReLU. However, because of its non-zero mean, missing negative part, and unbounded output, ReLU is at a potential disadvantage during optimization. To this end, we introduce a novel activation function that overcomes these three challenges. The proposed nonlinearity, "Soft-Root-Sign" (SRS), is smooth, non-monotonic, and bounded. Notably, the bounded property of SRS distinguishes it from most state-of-the-art activation functions. In contrast to ReLU, SRS can adaptively adjust its output through a pair of independently trainable parameters to capture negative information and provide a zero-mean property, leading not only to better generalization performance but also to faster learning. It also avoids the output distribution being scattered over the non-negative real number space, making it more compatible with batch normalization (BN) and less sensitive to initialization. In experiments, we evaluated SRS on deep networks applied to a variety of tasks, including image classification, machine translation, and generative modeling. SRS matches or exceeds models with ReLU and other state-of-the-art nonlinearities, showing that the proposed activation function generalizes well and can achieve high performance across tasks. Ablation studies further verified its compatibility with BN and its self-adaptability to different initializations.
