Paper Title
Zero-Shot Dynamic Quantization for Transformer Inference
Paper Authors
Paper Abstract
We introduce a novel run-time method for significantly reducing the accuracy loss associated with quantizing BERT-like models to 8-bit integers. Existing methods for quantizing models either modify the training procedure or require an additional calibration step to adjust parameters, which in turn requires a selected held-out dataset. Our method permits taking advantage of quantization without the need for these adjustments. We present results on several NLP tasks demonstrating the usefulness of this technique.
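As a point of reference for what run-time (dynamic) quantization looks like in practice, the sketch below quantizes a float tensor to 8-bit integers using a scale computed on the fly from the tensor's own range, so no calibration dataset or retraining is involved. This is a minimal illustration of the general idea, not the paper's method; the helper names `quantize_dynamic_int8` and `dequantize` are hypothetical, and the symmetric per-tensor scheme shown is only one possible choice.

```python
import numpy as np

def quantize_dynamic_int8(x: np.ndarray):
    """Quantize a float tensor to int8 with a scale computed at run time.

    The scale is derived from the tensor's max absolute value (symmetric,
    per-tensor), so no held-out calibration data is needed.
    """
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Map int8 values back to float32 using the stored scale."""
    return q.astype(np.float32) * scale

# Example: quantize an activation tensor produced during inference.
activations = np.random.randn(4, 768).astype(np.float32)
q, scale = quantize_dynamic_int8(activations)
recovered = dequantize(q, scale)
print("max abs error:", np.max(np.abs(activations - recovered)))
```

For a rough comparison, frameworks such as PyTorch expose a similar dynamic scheme (e.g. `torch.quantization.quantize_dynamic` over `nn.Linear` layers), which likewise computes activation scales at inference time rather than from a calibration pass.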