Paper Title

The 2021 Urdu Fake News Detection Task using Supervised Machine Learning and Feature Combinations

Author

Humayoun, Muhammad

Abstract

This paper describes the system submitted to the FIRE Shared Task "The 2021 Fake News Detection in the Urdu Language". The challenge aims at automatically identifying fake news written in Urdu. Our submitted results ranked fifth in the competition. After the competition results were announced, however, we obtained better results than those we had submitted. The best macro-averaged F1 score achieved by one of our models is 0.6674, higher than the second-best score in the competition. This result was achieved with a Support Vector Machine (polynomial kernel of degree 1), with stopwords removed and lemmatization applied, selecting the 20K best features out of 1.557 million features in total (produced by word n-grams with n=1,2,3,4 and character n-grams with n=2,3,4,5,6). The code is made available for reproducibility.
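As a rough illustration of the feature set named in the abstract, the sketch below counts word n-grams (n=1 to 4) and character n-grams (n=2 to 6) for one text using only the Python standard library. It is not the authors' released code: the function names and the `w:`/`c:` prefixes are hypothetical, tokenization is plain whitespace splitting, and stopword removal, lemmatization, selection of the 20K best features, and the SVM classifier are all left out.

```python
from collections import Counter


def word_ngrams(text, n_values=(1, 2, 3, 4)):
    """Yield word n-grams (whitespace tokens) for each n in n_values."""
    tokens = text.split()
    for n in n_values:
        for i in range(len(tokens) - n + 1):
            # Prefix keeps word features distinct from character features.
            yield "w:" + " ".join(tokens[i:i + n])


def char_ngrams(text, n_values=(2, 3, 4, 5, 6)):
    """Yield character n-grams (spaces included) for each n in n_values."""
    for n in n_values:
        for i in range(len(text) - n + 1):
            yield "c:" + text[i:i + n]


def extract_features(text):
    """Return a Counter mapping each n-gram feature to its count in text."""
    counts = Counter()
    counts.update(word_ngrams(text))
    counts.update(char_ngrams(text))
    return counts


if __name__ == "__main__":
    # Toy input; in the task itself the text would be an Urdu news article.
    features = extract_features("a b c")
    for name, count in sorted(features.items()):
        print(repr(name), count)
```

Combining both n-gram families over a full corpus is what would produce a feature space of the size the abstract reports, which is why a selection step that keeps only the highest-scoring features comes before classification.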
