Paper title
Tracking COVID-19 using online search
Authors
Abstract
Previous research has demonstrated that various properties of infectious diseases can be inferred from online search behaviour. In this work we use time series of online search query frequencies to gain insights about the prevalence of COVID-19 in multiple countries. We first develop unsupervised modelling techniques based on associated symptom categories identified by the United Kingdom's National Health Service and Public Health England. We then attempt to minimise an expected bias in these signals caused by public interest (as opposed to infections), using the proportion of news media coverage devoted to COVID-19 as a proxy indicator. Our analysis indicates that models based on online searches precede the reported confirmed cases and deaths by 16.7 (10.2-23.2) and 22.1 (17.4-26.9) days, respectively. We also investigate transfer learning techniques for mapping supervised models from countries where the spread of disease has progressed extensively to countries that are in earlier phases of their respective epidemic curves. Furthermore, we compare time series of online search activity against confirmed COVID-19 cases or deaths jointly across multiple countries, uncovering interesting querying patterns, including the finding that rarer symptoms are better predictors than common ones. Finally, we show that web searches improve the short-term forecasting accuracy of autoregressive models for COVID-19 deaths. Our work provides evidence that online search data can be used to develop complementary public health surveillance methods to help inform the COVID-19 response in conjunction with more established approaches.