Paper Title

BinaryBERT: Pushing the Limit of BERT Quantization

Authors

Haoli Bai, Wei Zhang, Lu Hou, Lifeng Shang, Jing Jin, Xin Jiang, Qun Liu, Michael Lyu, Irwin King

Abstract

The rapid development of large pre-trained language models has greatly increased the demand for model compression techniques, among which quantization is a popular solution. In this paper, we propose BinaryBERT, which pushes BERT quantization to the limit by weight binarization. We find that a binary BERT is harder to train directly than a ternary counterpart due to its complex and irregular loss landscape. Therefore, we propose ternary weight splitting, which initializes BinaryBERT by equivalently splitting from a half-sized ternary network. The binary model thus inherits the good performance of the ternary one, and can be further enhanced by fine-tuning the new architecture after splitting. Empirical results show that our BinaryBERT has only a slight performance drop compared with the full-precision model while being 24x smaller, achieving state-of-the-art compression results on the GLUE and SQuAD benchmarks.
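
The abstract describes ternary weight splitting only at a high level. Below is a minimal PyTorch-style sketch of the core idea: each ternary weight tensor is split into two binary-valued tensors whose sum reproduces the ternary weights, so the doubled-width binary model starts from exactly the same function as the ternary one. The function name `ternary_weight_splitting`, the scale `alpha` taken from the maximum magnitude, and the cancelling value `beta` are illustrative assumptions for this sketch, not the paper's exact splitting equations (which derive the binary scales analytically and also handle the quantizer parameters during fine-tuning).

```python
import torch

def ternary_weight_splitting(w_ternary: torch.Tensor, beta: float | None = None):
    """Simplified sketch: split a ternary weight tensor (values in
    {-alpha, 0, +alpha}) into two binary-valued tensors w1, w2 with
    w1 + w2 == w_ternary. Hypothetical helper, not the paper's exact method."""
    alpha = w_ternary.abs().max()        # ternary magnitude (simplifying assumption)
    if beta is None:
        beta = alpha / 2.0               # cancelling magnitude used for zero entries

    w1 = torch.empty_like(w_ternary)
    w2 = torch.empty_like(w_ternary)

    nonzero = w_ternary != 0
    # Non-zero entries: share the ternary value equally between the two halves.
    w1[nonzero] = w_ternary[nonzero] / 2.0
    w2[nonzero] = w_ternary[nonzero] / 2.0
    # Zero entries: assign opposite signs so the two halves cancel exactly.
    w1[~nonzero] = beta
    w2[~nonzero] = -beta

    # Equivalence check: the split model computes the same function at init.
    assert torch.allclose(w1 + w2, w_ternary)
    return w1, w2
```

With `beta = alpha / 2`, every entry of each half has magnitude `alpha / 2`, so both halves satisfy the binary constraint (a single scale times a sign) while their sum recovers the ternary tensor; fine-tuning then adjusts the two binary halves independently after splitting.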
