Paper Title

INCLUSIFY: A benchmark and a model for gender-inclusive German

Paper Author

Pomerenke, David

Abstract

Gender-inclusive language is important for achieving gender equality in languages with gender inflections, such as German. While stirring some controversy, it is increasingly adopted by companies and political institutions. A handful of tools have been developed to help people use gender-inclusive language by identifying instances of the generic masculine and providing suggestions for more inclusive reformulations. In this report, we define the underlying tasks in terms of natural language processing, and present a dataset and measures for benchmarking them. We also present a model that implements these tasks, by combining an inclusive language database with an elaborate sequence of processing steps via standard pre-trained models. Our model achieves a recall of 0.89 and a precision of 0.82 in our benchmark for identifying exclusive language; and one of its top five suggestions is chosen in real-world texts in 44% of cases. We sketch how the area could be further advanced by training end-to-end models and using large language models; and we urge the community to include more gender-inclusive texts in their training data in order to not present an obstacle to the adoption of gender-inclusive language. Through these efforts, we hope to contribute to restoring justice in language and, to a small extent, in reality.
