Paper Title

Load What You Need: Smaller Versions of Multilingual BERT

Paper Authors

Amine Abdaoui, Camille Pradel, Grégoire Sigel

Paper Abstract

Pre-trained Transformer-based models are achieving state-of-the-art results on a variety of Natural Language Processing data sets. However, the size of these models is often a drawback for their deployment in real production applications. In the case of multilingual models, most of the parameters are located in the embeddings layer. Therefore, reducing the vocabulary size should have an important impact on the total number of parameters. In this paper, we propose to generate smaller models that handle fewer languages according to the targeted corpora. We present an evaluation of smaller versions of multilingual BERT on the XNLI data set, but we believe that this method may be applied to other multilingual transformers. The obtained results confirm that we can generate smaller models that keep comparable results, while reducing up to 45% of the total number of parameters. We compared our models with DistilmBERT (a distilled version of multilingual BERT) and showed that unlike language reduction, distillation induced a 1.7% to 6% drop in the overall accuracy on the XNLI data set. The presented models and code are publicly available.
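The abstract describes shrinking multilingual BERT by restricting its vocabulary to the tokens needed for the target languages, which reduces the embedding layer and therefore the total parameter count. The sketch below illustrates that idea with the Hugging Face transformers library; the corpus file name, the token-selection heuristic, and the output path are illustrative assumptions and not the authors' exact released procedure.

```python
# Minimal sketch of vocabulary reduction for multilingual BERT:
# keep only the WordPiece tokens observed in a target-language corpus,
# then slice the embedding matrix accordingly.
import torch
from transformers import BertModel, BertTokenizer

MODEL_NAME = "bert-base-multilingual-cased"
tokenizer = BertTokenizer.from_pretrained(MODEL_NAME)
model = BertModel.from_pretrained(MODEL_NAME)

# 1) Collect the token ids actually used by the target-language corpus
#    (here a hypothetical plain-text file, one sentence per line).
kept_ids = set(tokenizer.all_special_ids)  # always keep [CLS], [SEP], [PAD], ...
with open("target_corpus.txt", encoding="utf-8") as f:
    for line in f:
        kept_ids.update(tokenizer.encode(line, add_special_tokens=False))
kept_ids = sorted(kept_ids)
print(f"Keeping {len(kept_ids)} of {tokenizer.vocab_size} tokens")

# 2) Slice the word-embedding matrix down to the reduced vocabulary.
old_embeddings = model.get_input_embeddings().weight.data          # shape (V, H)
new_embeddings = torch.nn.Embedding(len(kept_ids), old_embeddings.size(1))
new_embeddings.weight.data = old_embeddings[kept_ids].clone()
model.set_input_embeddings(new_embeddings)
model.config.vocab_size = len(kept_ids)

# 3) The tokenizer's vocab file would also need to be rewritten so that the
#    remaining tokens map to their new, contiguous ids (omitted here).
model.save_pretrained("smaller-mbert")
```

Because only the embedding layer changes, the Transformer encoder weights are reused as-is, which is why accuracy stays close to the full multilingual model while the parameter count drops.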
