Paper Title
Hostility Detection Dataset in Hindi
Paper Authors
Paper Abstract
In this paper, we present a novel hostility detection dataset in the Hindi language. We collect and manually annotate ~8200 online posts. The annotated dataset covers four hostility dimensions: fake news, hate speech, offensive, and defamation posts, along with a non-hostile label. Because the hostile classes overlap significantly, hostile posts may carry more than one of these labels. We release this dataset as part of the CONSTRAINT-2021 shared task on hostile post detection.