Paper Title
Network Binarization via Contrastive Learning
Paper Authors
Paper Abstract
Neural network binarization accelerates deep models by quantizing their weights and activations into 1-bit. However, there is still a huge performance gap between Binary Neural Networks (BNNs) and their full-precision (FP) counterparts. As the quantization error caused by weight binarization has been reduced in earlier works, activation binarization becomes the major obstacle to further accuracy improvement. BNNs exhibit a unique and interesting structure, where the binary and latent FP activations exist in the same forward pass (i.e., $\text{Binarize}(\mathbf{a}_F) = \mathbf{a}_B$). To mitigate the information degradation caused by the binarization operation from FP to binary activations, we establish a novel contrastive learning framework that trains BNNs through the lens of Mutual Information (MI) maximization. MI is introduced as the metric to measure the information shared between binary and FP activations, which assists binarization with contrastive learning. Specifically, the representation ability of BNNs is greatly strengthened by pulling together positive pairs of binary and FP activations from the same input sample and pushing apart negative pairs from different samples (the number of negative pairs can be exponentially large). This benefits downstream tasks, not only classification but also segmentation, depth estimation, etc. The experimental results show that our method can be implemented as a pile-up module on existing state-of-the-art binarization methods, remarkably improving their performance on CIFAR-10/100 and ImageNet, and also generalizes well to NYUD-v2.
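The abstract describes a contrastive objective in which the binary and FP activations of the same input form a positive pair, while activations from different samples in the batch act as negatives. Below is a minimal PyTorch sketch of such an InfoNCE-style loss, included only to illustrate the idea; the function name `contrastive_binarization_loss`, the temperature value, and the use of `torch.sign` as a stand-in for the binarization step are assumptions, not the authors' released implementation.

```python
# Minimal sketch: InfoNCE-style contrastive loss between full-precision (FP)
# and binary activations of the same layer. Positive pairs come from the same
# input sample; all other samples in the batch serve as negatives.
import torch
import torch.nn.functional as F


def contrastive_binarization_loss(a_fp: torch.Tensor,
                                  a_bin: torch.Tensor,
                                  temperature: float = 0.1) -> torch.Tensor:
    """Pull (a_fp[i], a_bin[i]) together, push (a_fp[i], a_bin[j]) apart, i != j.

    a_fp:  full-precision activations, shape (batch, dim)
    a_bin: binarized activations of the same layer, shape (batch, dim)
    """
    # Normalize so the dot product is a cosine similarity.
    z_fp = F.normalize(a_fp.flatten(1), dim=1)
    z_bin = F.normalize(a_bin.flatten(1), dim=1)

    # Similarity matrix: entry (i, j) compares sample i's FP activation
    # with sample j's binary activation.
    logits = z_fp @ z_bin.t() / temperature

    # The positive pair for row i sits on the diagonal (same input sample).
    targets = torch.arange(logits.size(0), device=logits.device)
    return F.cross_entropy(logits, targets)


if __name__ == "__main__":
    # Toy usage: in a real BNN, a_bin would come from Binarize(a_fp) inside
    # the forward pass (typically with a straight-through estimator).
    a_fp = torch.randn(8, 256)
    a_bin = torch.sign(a_fp)  # hypothetical 1-bit activations
    print(contrastive_binarization_loss(a_fp, a_bin).item())
```

In this sketch the loss would be added to the task loss during BNN training; treating every off-diagonal batch entry as a negative is what lets the number of negative pairs grow large, as the abstract notes.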