Hitachi at SemEval-2020 Task 12: Offensive Language Identification with Noisy Labels using Statistical Sampling and Post-Processing

Paper title

Hitachi at SemEval-2020 Task 12: Offensive Language Identification with Noisy Labels using Statistical Sampling and Post-Processing

Authors

Ravikiran, Manikandan; Muljibhai, Amin Ekant; Miyoshi, Toshinori; Ozaki, Hiroaki; Koreeda, Yuta; Masayuki, Sakata

Abstract

In this paper, we present our participation in SemEval-2020 Task 12 Subtask A (English), which focuses on offensive language identification from noisy labels. To this end, we developed a hybrid system: a BERT classifier trained on tweets selected with a Statistical Sampling Algorithm (SA), whose predictions are then Post-Processed (PP) using an offensive wordlist. Our system achieved 34th position, with a macro-averaged F1-score (Macro-F1) of 0.90913 over the offensive and non-offensive classes. We further show comprehensive results and error analysis to assist future research in offensive language identification with noisy labels.
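
As a rough illustration of two ideas named in the abstract, the sketch below shows one way wordlist-based post-processing and the Macro-F1 metric could look in Python. It is not the authors' implementation: the abstract does not state the post-processing rule, so the rule used here (relabel a non-offensive prediction as offensive when the tweet contains a listed word) is an assumption, and the function names, the `OFF`/`NOT` label strings, and the example wordlist are all hypothetical.

```python
import re


def post_process(text, label, wordlist):
    """Assumed rule: flip a NOT prediction to OFF if the text contains a listed word."""
    tokens = set(re.findall(r"[a-z']+", text.lower()))
    if label == "NOT" and tokens & wordlist:
        return "OFF"
    return label


def macro_f1(gold, pred, classes=("OFF", "NOT")):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for c in classes:
        tp = sum(g == c and p == c for g, p in zip(gold, pred))
        fp = sum(g != c and p == c for g, p in zip(gold, pred))
        fn = sum(g == c and p != c for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    wordlist = {"idiot"}  # hypothetical one-word list for illustration
    print(post_process("You are an idiot", "NOT", wordlist))
    print(macro_f1(["OFF", "NOT", "OFF", "NOT"], ["OFF", "NOT", "NOT", "NOT"]))
```

Because Macro-F1 weights both classes equally, errors on the smaller offensive class affect the score as much as errors on the larger non-offensive class.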
