Paper Title
ANAct: Adaptive Normalization for Activation Functions
Paper Authors
Paper Abstract
In this paper, we investigate the negative effect of activation functions on forward and backward propagation and how to counteract it. First, we examine how activation functions affect the forward and backward propagation of neural networks and derive a general form for the gradient variance that extends previous work in this area. We use mini-batch statistics to dynamically update the normalization factor, ensuring that the normalization property holds throughout training rather than only after weight initialization. Second, we propose ANAct, a method that normalizes activation functions to maintain consistent gradient variance across layers, and we demonstrate its effectiveness through experiments. We observe that the convergence rate is roughly correlated with the normalization property. We compare ANAct with several common activation functions on CNNs and residual networks and show that ANAct consistently improves their performance. For instance, normalized Swish achieves 1.4\% higher top-1 accuracy than vanilla Swish on ResNet50 with the Tiny ImageNet dataset and more than 1.2\% higher on CIFAR-100.
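To make the idea of dynamically normalizing an activation with mini-batch statistics concrete, below is a minimal PyTorch sketch, not the paper's actual ANAct implementation. The module name `NormalizedActivation` and the parameters `momentum` and `eps` are assumptions for illustration; it wraps an arbitrary activation and rescales its output with running mean and variance estimated from mini-batches, so the output (and hence, approximately, the gradient variance) stays normalized throughout training rather than only at initialization.

```python
import torch
import torch.nn as nn


class NormalizedActivation(nn.Module):
    """Hypothetical sketch: wrap an activation and keep its output normalized.

    The wrapped activation's output is rescaled using mini-batch statistics
    during training (with running estimates kept for evaluation), so the
    normalization property is maintained throughout training, not only at
    weight initialization.
    """

    def __init__(self, activation=None, momentum=0.1, eps=1e-5):
        super().__init__()
        self.activation = activation if activation is not None else nn.SiLU()  # Swish
        self.momentum = momentum
        self.eps = eps
        # Running normalization factors, updated from mini-batch statistics.
        self.register_buffer("running_mean", torch.zeros(1))
        self.register_buffer("running_var", torch.ones(1))

    def forward(self, x):
        y = self.activation(x)
        if self.training:
            # Current mini-batch statistics of the activation output.
            mean = y.mean()
            var = y.var(unbiased=False)
            with torch.no_grad():
                # Exponential moving average of the normalization factors.
                self.running_mean.mul_(1 - self.momentum).add_(self.momentum * mean)
                self.running_var.mul_(1 - self.momentum).add_(self.momentum * var)
        else:
            mean, var = self.running_mean, self.running_var
        # Rescale so the activation output is approximately zero-mean and
        # unit-variance, keeping signal and gradient scales comparable
        # across layers.
        return (y - mean) / torch.sqrt(var + self.eps)


# Usage: drop in wherever a plain activation would be used, e.g. in a CNN block.
act = NormalizedActivation(nn.SiLU())
out = act(torch.randn(32, 64, 8, 8))
```

In this sketch the statistics are computed over the whole tensor; a per-channel variant (as in batch normalization) would be a straightforward extension, and the precise form of the normalization factor in ANAct follows the gradient-variance analysis described in the paper.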