Hostility Detection Dataset in Hindi

Paper Title

Hostility Detection Dataset in Hindi

Authors

Mohit Bhardwaj, Md Shad Akhtar, Asif Ekbal, Amitava Das, Tanmoy Chakraborty

Abstract

In this paper, we present a novel hostility detection dataset in the Hindi language. We collect and manually annotate ~8200 online posts. The annotated dataset covers four hostility dimensions (fake news, hate speech, offensive, and defamation posts) along with a non-hostile label. Because the hostile classes overlap significantly, hostile posts may carry multiple labels. We release this dataset as part of the CONSTRAINT-2021 shared task on hostile post detection.
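
The label scheme described in the abstract (one non-hostile label plus four hostile dimensions that can co-occur) could be represented as a binary hostile/non-hostile flag together with a multi-hot vector. The following is only a minimal sketch: the tag strings, the function name `encode_labels`, and the rule that "non-hostile" never combines with a hostile tag are assumptions made for illustration, not details taken from the released dataset.

```python
# Hypothetical tag strings; the dataset's actual label names may differ.
HOSTILE_LABELS = ["fake", "hate", "offensive", "defamation"]
NON_HOSTILE = "non-hostile"


def encode_labels(tags):
    """Map one post's tags to (is_hostile, multi_hot_vector).

    Assumes a post is either non-hostile or carries one or more
    hostile tags, never both.
    """
    tags = set(tags)
    unknown = tags - set(HOSTILE_LABELS) - {NON_HOSTILE}
    if unknown:
        raise ValueError(f"unknown tags: {sorted(unknown)}")
    if not tags:
        raise ValueError("a post needs at least one tag")
    if NON_HOSTILE in tags:
        if len(tags) > 1:
            raise ValueError("non-hostile cannot combine with hostile tags")
        return False, [0] * len(HOSTILE_LABELS)
    # One position per hostile dimension, in HOSTILE_LABELS order.
    return True, [1 if label in tags else 0 for label in HOSTILE_LABELS]


if __name__ == "__main__":
    print(encode_labels(["hate", "offensive"]))
    print(encode_labels(["non-hostile"]))
```

Splitting the output this way keeps the coarse hostile/non-hostile decision separate from the fine-grained multi-label one, so each can be evaluated on its own.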
