Paper Title


Cross-lingual Inductive Transfer to Detect Offensive Language

Authors

Kartikey Pant, Tanvi Dadu

Abstract


With the growing use of social media and its availability, many instances of the use of offensive language have been observed across multiple languages and domains. This phenomenon has given rise to the growing need to detect the offensive language used in social media cross-lingually. In OffensEval 2020, the organizers released the multilingual Offensive Language Identification Dataset (mOLID), which contains tweets in five different languages, to detect offensive language. In this work, we introduce a cross-lingual inductive approach to identify the offensive language in tweets using the contextual word embedding XLM-RoBERTa (XLM-R). We show that our model performs competitively on all five languages, obtaining the fourth position in the English task with an F1-score of 0.919 and the eighth position in the Turkish task with an F1-score of 0.781. Further experimentation proves that our model works competitively in a zero-shot learning environment and is extensible to other languages.
