Dynamically Refined Regularization for Improving Cross-corpora Hate Speech Detection

Title

Dynamically Refined Regularization for Improving Cross-corpora Hate Speech Detection

Authors

Bose, Tulika; Aletras, Nikolaos; Illina, Irina; Fohr, Dominique

Abstract

Hate speech classifiers exhibit substantial performance degradation when evaluated on datasets different from the source. This is due to classifiers learning spurious correlations between the hate speech labels of the training corpus and words that are not necessarily relevant to hateful language. Previous work has attempted to mitigate this problem by regularizing specific terms from pre-defined static dictionaries. While this has been demonstrated to improve the generalizability of classifiers, the coverage of such methods is limited, and the dictionaries require regular manual updates from human experts. In this paper, we propose to automatically identify and reduce spurious correlations using attribution methods, dynamically refining the list of terms that need to be regularized during training. Our approach is flexible and improves cross-corpora performance over previous work, both on its own and in combination with pre-defined dictionaries.
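The general idea of penalizing the attributions of a term list that is refreshed during training can be illustrated with a minimal sketch. This is not the authors' implementation, and every piece of it is an assumption: a bag-of-words logistic regression stands in for the classifier, gradient × input stands in for the attribution method (for a linear logit, the attribution of term t is simply w[t] * x[t]), and the loss adds `lam` times the squared attribution of every flagged term. The refinement rule (every `refresh_every` epochs, re-select the `k` terms with the largest total attribution toward the hate class, plus an optional pre-defined `seed_terms` list) is a hypothetical placeholder. A real rule would also have to separate spurious terms from genuinely hateful ones. The function names (`featurize`, `logit`, `attributions`, `refine_terms`, `train`, `predict_proba`) and the toy data are illustrative only.

```python
import math
from collections import Counter


def featurize(docs):
    """Bag-of-words term counts for each lower-cased, whitespace-tokenized document."""
    return [Counter(doc.lower().split()) for doc in docs]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def logit(w, b, x):
    """Linear score b + sum_t w[t] * x[t]."""
    return b + sum(w.get(t, 0.0) * c for t, c in x.items())


def attributions(w, x):
    """Gradient x input for a linear logit: term t receives w[t] * x[t]."""
    return {t: w.get(t, 0.0) * c for t, c in x.items()}


def refine_terms(w, X, y, k, seed_terms=()):
    """Hypothetical placeholder rule: flag the k terms with the largest total
    attribution toward the positive (hate) class, plus any pre-defined seed terms."""
    totals = Counter()
    for x, label in zip(X, y):
        if label == 1:
            totals.update(attributions(w, x))
    top_k = {t for t, _ in totals.most_common(k)}
    return top_k | set(seed_terms)


def train(docs, y, epochs=50, lr=0.1, lam=1.0, k=1, refresh_every=10, seed_terms=()):
    """SGD on log-loss plus lam * sum over flagged terms t of (w[t] * x[t]) ** 2."""
    X = featurize(docs)
    w, b = {}, 0.0
    flagged = set(seed_terms)
    for epoch in range(epochs):
        # Dynamic refinement: periodically re-select the terms to regularize.
        if epoch > 0 and epoch % refresh_every == 0:
            flagged = refine_terms(w, X, y, k, seed_terms)
        for x, label in zip(X, y):
            err = sigmoid(logit(w, b, x)) - label  # d(log-loss)/d(logit)
            b -= lr * err
            for t, c in x.items():
                grad = err * c
                if t in flagged:
                    # Gradient of lam * (w[t] * c) ** 2 with respect to w[t].
                    grad += 2.0 * lam * w.get(t, 0.0) * c * c
                w[t] = w.get(t, 0.0) - lr * grad
    return w, b, flagged


def predict_proba(w, b, doc):
    return sigmoid(logit(w, b, featurize([doc])[0]))


if __name__ == "__main__":
    # Toy data: "election" co-occurs only with hateful examples, so it acts as a
    # spurious cue; "vermin" stands in for genuinely hateful language.
    docs = ["vermin election", "election vermin out", "results today", "nice day today"]
    labels = [1, 1, 0, 0]
    w_plain, _, _ = train(docs, labels, lam=0.0)
    w_reg, _, flagged = train(docs, labels, seed_terms={"election"})
    print(sorted(flagged))
    print(round(w_plain["election"], 3), round(w_reg["election"], 3))
```

Passing `seed_terms` keeps dictionary terms penalized from the first epoch, loosely mirroring the combination with pre-defined dictionaries, while `lam=0.0` reduces the sketch to plain logistic regression. With the toy data, the regularized weight on "election" stays well below the unregularized one.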
