Why Is It Hate Speech? Masked Rationale Prediction for Explainable Hate Speech Detection

Paper Title

Why Is It Hate Speech? Masked Rationale Prediction for Explainable Hate Speech Detection

Authors

Jiyun Kim, Byounghan Lee, Kyung-Ah Sohn

Abstract

In a hate speech detection model, we should consider two critical aspects in addition to detection performance: bias and explainability. Hate speech cannot be identified based solely on the presence of specific words: the model should be able to reason like humans and be explainable. To improve the performance concerning the two aspects, we propose Masked Rationale Prediction (MRP) as an intermediate task. MRP is a task to predict the masked human rationales (snippets of a sentence that are grounds for human judgment) by referring to surrounding tokens combined with their unmasked rationales. As the model learns its reasoning ability based on rationales by MRP, it performs hate speech detection robustly in terms of bias and explainability. The proposed method generally achieves state-of-the-art performance in various metrics, demonstrating its effectiveness for hate speech detection.
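
As a rough illustration of the MRP setup described in the abstract, the sketch below hides some per-token rationale labels and keeps the hidden ones as prediction targets, so that a model would see every token plus the unmasked rationale labels. It is not the authors' implementation: the function name `make_mrp_example`, the `MASK` sentinel, the binary label encoding, and the random masking rate `mask_prob` are all assumptions made for this example.

```python
import random

MASK = -1  # hypothetical sentinel meaning "rationale label hidden from the model"


def make_mrp_example(tokens, rationales, mask_prob=0.5, seed=0):
    """Build one masked-rationale example (illustrative sketch only).

    tokens:     list of token strings
    rationales: list of 0/1 labels, 1 = token is part of a human rationale
    Returns (inputs, targets):
      inputs  - list of (token, visible_label), visible_label is 0, 1, or MASK
      targets - dict mapping masked positions to their true rationale label
    """
    if len(tokens) != len(rationales):
        raise ValueError("tokens and rationales must have the same length")

    rng = random.Random(seed)  # local RNG so the example is reproducible
    inputs, targets = [], {}
    for i, (tok, label) in enumerate(zip(tokens, rationales)):
        if rng.random() < mask_prob:
            # Hide this label; the model must predict it from context.
            inputs.append((tok, MASK))
            targets[i] = label
        else:
            # Leave this label visible as a hint for the masked positions.
            inputs.append((tok, label))
    return inputs, targets


if __name__ == "__main__":
    toks = ["w1", "w2", "w3", "w4"]   # placeholder tokens
    labs = [0, 1, 1, 0]               # placeholder rationale labels
    masked_inputs, masked_targets = make_mrp_example(toks, labs)
    print(masked_inputs)
    print(masked_targets)
```

In a real pipeline the visible labels would presumably be turned into embeddings and combined with the token embeddings, but how that is done is specified in the paper itself, not here.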
