Title
Hitachi at SemEval-2020 Task 12: Offensive Language Identification with Noisy Labels using Statistical Sampling and Post-Processing
Authors
Abstract
In this paper, we present our participation in SemEval-2020 Task 12, Subtask A (English), which focuses on offensive language identification from noisy labels. To this end, we developed a hybrid system: a BERT classifier trained on tweets selected with a Statistical Sampling Algorithm (SA), whose predictions are post-processed (PP) using an offensive wordlist. Our system ranked 34th, with a macro-averaged F1 score (Macro-F1) of 0.90913 over the offensive and non-offensive classes. We further present comprehensive results and an error analysis to assist future research on offensive language identification with noisy labels.
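The abstract combines classifier output with a wordlist-based post-processing step and reports Macro-F1. The Python sketch below illustrates both under stated assumptions. The post-processing rule (relabel a tweet as `OFF` whenever it contains a listed word), the simple regex tokenizer, the `OFF`/`NOT` label strings, and the tiny `OFFENSIVE_WORDS` set are hypothetical stand-ins, not the authors' wordlist or exact rule. The BERT predictions are simply passed in as labels, and the statistical sampling step is not modeled. Macro-F1 is computed as the unweighted mean of the per-class F1 scores.

```python
import re
from typing import Sequence, Set

# Hypothetical wordlist for illustration only; not the wordlist used in the paper.
OFFENSIVE_WORDS = {"idiot", "stupid"}


def post_process(tweets: Sequence[str], preds: Sequence[str], wordlist: Set[str]) -> list:
    """Assumed rule: override the classifier and output OFF if any token is in the wordlist."""
    out = []
    for text, label in zip(tweets, preds):
        # Lowercase and split into alphabetic tokens (a simplifying assumption).
        tokens = set(re.findall(r"[a-z']+", text.lower()))
        out.append("OFF" if tokens & wordlist else label)
    return out


def macro_f1(gold: Sequence[str], pred: Sequence[str], classes=("OFF", "NOT")) -> float:
    """Unweighted mean of per-class F1 over the given classes."""
    scores = []
    for c in classes:
        tp = sum(1 for g, p in zip(gold, pred) if g == c and p == c)
        fp = sum(1 for g, p in zip(gold, pred) if g != c and p == c)
        fn = sum(1 for g, p in zip(gold, pred) if g == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


# Toy usage: made-up tweets and labels, not data from the shared task.
tweets = ["you are an idiot", "have a nice day", "what a lovely game"]
bert_preds = ["NOT", "NOT", "OFF"]  # stand-in for classifier output
gold = ["OFF", "NOT", "NOT"]
final = post_process(tweets, bert_preds, OFFENSIVE_WORDS)
print(final, macro_f1(gold, bert_preds), macro_f1(gold, final))
```

On this toy input the wordlist rule flips the first tweet to `OFF`, raising Macro-F1 from 0.25 to about 0.667. Because the rule can only add `OFF` labels, it trades precision on the offensive class for recall, which is one reason a wordlist is usually applied after, rather than instead of, a learned classifier.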