Paper title

Don't Patronize Me! An Annotated Dataset with Patronizing and Condescending Language towards Vulnerable Communities

Authors

Carla Pérez-Almendros, Luis Espinosa-Anke, Steven Schockaert

Abstract

In this paper, we introduce a new annotated dataset which is aimed at supporting the development of NLP models to identify and categorize language that is patronizing or condescending towards vulnerable communities (e.g. refugees, homeless people, poor families). While the prevalence of such language in the general media has long been shown to have harmful effects, it differs from other types of harmful language, in that it is generally used unconsciously and with good intentions. We furthermore believe that the often subtle nature of patronizing and condescending language (PCL) presents an interesting technical challenge for the NLP community. Our analysis of the proposed dataset shows that identifying PCL is hard for standard NLP models, with language models such as BERT achieving the best results.