ArCOV19-Rumors: Arabic COVID-19 Twitter Dataset for Misinformation Detection

Paper title

ArCOV19-Rumors: Arabic COVID-19 Twitter Dataset for Misinformation Detection

Authors

Fatima Haouari, Maram Hasanain, Reem Suwaileh, Tamer Elsayed

Abstract

In this paper we introduce ArCOV19-Rumors, an Arabic COVID-19 Twitter dataset for misinformation detection composed of tweets containing claims from 27 January to the end of April 2020. We collected 138 verified claims, mostly from popular fact-checking websites, and identified 9.4K tweets relevant to those claims. Tweets were manually annotated by veracity to support research on misinformation detection, which is one of the major problems faced during a pandemic. ArCOV19-Rumors supports two levels of misinformation detection over Twitter: verifying free-text claims (called claim-level verification) and verifying claims expressed in tweets (called tweet-level verification). In addition to health, our dataset covers claims related to other topical categories that were influenced by COVID-19, namely social, politics, sports, entertainment, and religion. Moreover, we present benchmarking results for tweet-level verification on the dataset. We experimented with state-of-the-art (SOTA) models spanning a range of approaches that variously exploit content, user-profile features, temporal features, and the propagation structure of conversational threads for tweet verification.
