Paper Title

BinaryBERT: Pushing the Limit of BERT Quantization

Authors

Haoli Bai, Wei Zhang, Lu Hou, Lifeng Shang, Jing Jin, Xin Jiang, Qun Liu, Michael Lyu, Irwin King

Abstract

The rapid development of large pre-trained language models has greatly increased the demand for model compression techniques, among which quantization is a popular solution. In this paper, we propose BinaryBERT, which pushes BERT quantization to the limit by weight binarization. We find that a binary BERT is harder to train directly than a ternary counterpart due to its complex and irregular loss landscape. Therefore, we propose ternary weight splitting, which initializes BinaryBERT by equivalently splitting from a half-sized ternary network. The binary model thus inherits the good performance of the ternary one, and can be further enhanced by fine-tuning the new architecture after splitting. Empirical results show that our BinaryBERT has only a slight performance drop compared with the full-precision model while being 24x smaller, achieving state-of-the-art compression results on the GLUE and SQuAD benchmarks.
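
The abstract describes ternary weight splitting only at a high level. Below is a minimal PyTorch-style sketch of the core idea: each ternary weight tensor is split into two binary-valued tensors whose sum reproduces the ternary weights, so the doubled-width binary model starts from exactly the same function as the ternary one. The function name `ternary_weight_splitting`, the scale `alpha` taken from the maximum magnitude, and the cancelling value `beta` are illustrative assumptions for this sketch, not the paper's exact splitting equations (which derive the binary scales analytically and also handle the quantizer parameters during fine-tuning).

```python
import torch

def ternary_weight_splitting(w_ternary: torch.Tensor, beta: float | None = None):
    """Simplified sketch: split a ternary weight tensor (values in
    {-alpha, 0, +alpha}) into two binary-valued tensors w1, w2 with
    w1 + w2 == w_ternary. Hypothetical helper, not the paper's exact method."""
    alpha = w_ternary.abs().max()        # ternary magnitude (simplifying assumption)
    if beta is None:
        beta = alpha / 2.0               # cancelling magnitude used for zero entries

    w1 = torch.empty_like(w_ternary)
    w2 = torch.empty_like(w_ternary)

    nonzero = w_ternary != 0
    # Non-zero entries: share the ternary value equally between the two halves.
    w1[nonzero] = w_ternary[nonzero] / 2.0
    w2[nonzero] = w_ternary[nonzero] / 2.0
    # Zero entries: assign opposite signs so the two halves cancel exactly.
    w1[~nonzero] = beta
    w2[~nonzero] = -beta

    # Equivalence check: the split model computes the same function at init.
    assert torch.allclose(w1 + w2, w_ternary)
    return w1, w2
```

With `beta = alpha / 2`, every entry of each half has magnitude `alpha / 2`, so both halves satisfy the binary constraint (a single scale times a sign) while their sum recovers the ternary tensor; fine-tuning then adjusts the two binary halves independently after splitting.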
