Paper Title
German's Next Language Model
Paper Authors
Paper Abstract
In this work we present the experiments which led to the creation of our BERT- and ELECTRA-based German language models, GBERT and GELECTRA. By varying the input training data, model size, and the presence of Whole Word Masking (WWM), we were able to attain SoTA performance across a set of document classification and named entity recognition (NER) tasks for both base- and large-size models. We adopt an evaluation-driven approach in training these models, and our results indicate that both adding more data and utilizing WWM improve model performance. By benchmarking against existing German models, we show that these models are the best German models to date. Our trained models will be made publicly available to the research community.
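To make the Whole Word Masking term concrete, the following is a minimal, illustrative Python sketch of WWM over WordPiece-style sub-tokens; the tokens, masking probability, and helper function are hypothetical and not taken from the paper. The key difference from standard token-level masking is that when a word is selected, all of its sub-tokens are masked together.

```python
# Minimal sketch of Whole Word Masking (WWM) over WordPiece tokens.
# Assumption: tokens starting with "##" continue the previous word,
# and a 15% masking rate is used purely for illustration.
import random

def whole_word_mask(tokens, mask_prob=0.15, mask_token="[MASK]"):
    # Group sub-token indices into whole words.
    words = []
    for i, tok in enumerate(tokens):
        if tok.startswith("##") and words:
            words[-1].append(i)   # continuation of the previous word
        else:
            words.append([i])     # start of a new word

    masked = list(tokens)
    for word in words:
        if random.random() < mask_prob:
            for i in word:
                masked[i] = mask_token  # mask every sub-token of the word
    return masked

# Example: if "Sprachmodell" (split into "Sprach", "##modell") is chosen,
# both of its sub-tokens are replaced by [MASK] at once.
print(whole_word_mask(["Das", "Sprach", "##modell", "lernt", "Deutsch"]))
```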