Paper Title

NAIST COVID: Multilingual COVID-19 Twitter and Weibo Dataset

Authors

Zhiwei Gao, Shuntaro Yada, Shoko Wakamiya, Eiji Aramaki

Abstract

Since its outbreak in late 2019, coronavirus disease 2019 (COVID-19) has affected over 200 countries and billions of people worldwide. Enforced measures such as "social distancing" and "stay at home" have disrupted people's social lives, which in turn has increased interaction through social media. Given that social media can provide valuable information about COVID-19 on a global scale, it is important to share the data and encourage social media studies of COVID-19 and other infectious diseases. Therefore, we have released a multilingual dataset of social media posts related to COVID-19, consisting of microblogs in English and Japanese from Twitter and in Chinese from Weibo. The data cover microblogs posted from January 20, 2020, to March 24, 2020. This paper also provides quantitative and qualitative analyses of these datasets, creating daily word clouds as an example of text-mining analysis. The dataset is now available on GitHub. It can be analyzed in many ways and is expected to help in communicating COVID-19 precautions effectively.
