ToKen: Task Decomposition and Knowledge Infusion for Few-Shot Hate Speech Detection

Paper title

ToKen: Task Decomposition and Knowledge Infusion for Few-Shot Hate Speech Detection

Authors

Badr AlKhamissi, Faisal Ladhak, Srini Iyer, Ves Stoyanov, Zornitsa Kozareva, Xian Li, Pascale Fung, Lambert Mathias, Asli Celikyilmaz, Mona Diab

Abstract

Hate speech detection is complex; it relies on commonsense reasoning, knowledge of stereotypes, and an understanding of social nuance that differs from one culture to the next. It is also difficult to collect a large-scale annotated hate speech dataset. In this work, we frame the problem as a few-shot learning task and show significant gains from decomposing the task into its "constituent" parts. In addition, we see that infusing knowledge from reasoning datasets (e.g., Atomic2020) improves performance even further. Moreover, we observe that the trained models generalize to out-of-distribution datasets, showing the superiority of task decomposition and knowledge infusion over previously used methods. Concretely, our method outperforms the baseline by an absolute gain of 17.83% in the 16-shot case.
