Paper Title

Soft-Root-Sign Activation Function

Authors

Yuan Zhou, Dandan Li, Shuwei Huo, Sun-Yuan Kung

Abstract

The choice of activation function in deep networks has a significant effect on training dynamics and task performance. At present, the most effective and widely used activation function is ReLU. However, because of its non-zero mean, missing negative part, and unbounded output, ReLU is at a potential disadvantage during optimization. To this end, we introduce a novel activation function that overcomes these three challenges. The proposed nonlinearity, "Soft-Root-Sign" (SRS), is smooth, non-monotonic, and bounded. Notably, the bounded property of SRS distinguishes it from most state-of-the-art activation functions. In contrast to ReLU, SRS can adaptively adjust its output through a pair of independently trainable parameters to capture negative information and provide a zero-mean property, leading not only to better generalization performance but also to faster learning. It also avoids the output distribution being scattered over the non-negative real number space, making it more compatible with batch normalization (BN) and less sensitive to initialization. In experiments, we evaluated SRS on deep networks applied to a variety of tasks, including image classification, machine translation, and generative modeling. SRS matches or exceeds models with ReLU and other state-of-the-art nonlinearities, showing that the proposed activation function generalizes well and can achieve high performance across tasks. Ablation studies further verified its compatibility with BN and its self-adaptability to different initializations.
