Paper Title
Zero-Shot Dynamic Quantization for Transformer Inference
Paper Authors
Paper Abstract
We introduce a novel run-time method for significantly reducing the accuracy loss associated with quantizing BERT-like models to 8-bit integers. Existing methods for quantizing models either modify the training procedure or require an additional calibration step to adjust parameters, which in turn requires a selected held-out dataset. Our method permits taking advantage of quantization without the need for these adjustments. We present results on several NLP tasks demonstrating the usefulness of this technique.
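As a point of reference for what run-time (dynamic) quantization looks like in practice, the sketch below quantizes a float tensor to 8-bit integers using a scale computed on the fly from the tensor's own range, so no calibration dataset or retraining is involved. This is a minimal illustration of the general idea, not the paper's method; the helper names `quantize_dynamic_int8` and `dequantize` are hypothetical, and the symmetric per-tensor scheme shown is only one possible choice.

```python
import numpy as np

def quantize_dynamic_int8(x: np.ndarray):
    """Quantize a float tensor to int8 with a scale computed at run time.

    The scale is derived from the tensor's max absolute value (symmetric,
    per-tensor), so no held-out calibration data is needed.
    """
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Map int8 values back to float32 using the stored scale."""
    return q.astype(np.float32) * scale

# Example: quantize an activation tensor produced during inference.
activations = np.random.randn(4, 768).astype(np.float32)
q, scale = quantize_dynamic_int8(activations)
recovered = dequantize(q, scale)
print("max abs error:", np.max(np.abs(activations - recovered)))
```

For a rough comparison, frameworks such as PyTorch expose a similar dynamic scheme (e.g. `torch.quantization.quantize_dynamic` over `nn.Linear` layers), which likewise computes activation scales at inference time rather than from a calibration pass.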