Paper Title
CAMeMBERT: Cascading Assistant-Mediated Multilingual BERT
Paper Authors
Paper Abstract
Large language models with hundreds of millions, or even billions, of parameters perform extremely well on a variety of natural language processing (NLP) tasks. Their widespread use and adoption, however, are hindered by the limited availability and portability of sufficiently large computational resources. This paper proposes a knowledge distillation (KD) technique building on the work of LightMBERT, a student model of multilingual BERT (mBERT). By repeatedly distilling mBERT through increasingly compressed, top-layer-distilled teacher-assistant networks, CAMeMBERT aims to improve on the time and space complexity of mBERT while keeping the loss of accuracy below an acceptable threshold. At present, CAMeMBERT has an average accuracy of around 60.1%, which is subject to change after future improvements to the hyperparameters used in fine-tuning.
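To make the cascading idea concrete, below is a minimal sketch of teacher-assistant knowledge distillation in PyTorch. It is an illustration only, not the paper's actual implementation: the `distill` loop, the `make_student` factory, the temperature, and the assistant sizes are all assumptions; the paper's top-layer distillation of mBERT would additionally copy embeddings and match hidden states rather than only softened logits.

```python
# Hypothetical sketch of cascading teacher-assistant knowledge distillation.
# Model construction (`make_student`) and hyperparameters are assumptions.
import torch
import torch.nn.functional as F

def distill(teacher, student, loader, epochs=1, T=2.0, lr=5e-5):
    """Train `student` to match `teacher`'s softened output distribution (soft-label KD)."""
    opt = torch.optim.AdamW(student.parameters(), lr=lr)
    teacher.eval()
    for _ in range(epochs):
        for batch in loader:
            with torch.no_grad():
                t_logits = teacher(batch)          # teacher predictions, no gradient
            s_logits = student(batch)
            # KL divergence between temperature-softened teacher and student distributions
            loss = F.kl_div(
                F.log_softmax(s_logits / T, dim=-1),
                F.softmax(t_logits / T, dim=-1),
                reduction="batchmean",
            ) * (T * T)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return student

def cascade_distill(teacher, assistant_sizes, make_student, loader):
    """Distill through a chain of progressively smaller assistants,
    so no single student has to bridge the full teacher-student gap at once."""
    current = teacher
    for size in assistant_sizes:
        assistant = make_student(size)   # e.g. fewer layers or a smaller hidden dimension
        current = distill(current, assistant, loader)
    return current
```

In this sketch each assistant serves as the teacher for the next, smaller model, which is the cascading behavior the abstract describes for compressing mBERT step by step.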