This Sample seems to be good enough! Assessing Coverage and Temporal Reliability of Twitter's Academic API

Paper title

This Sample seems to be good enough! Assessing Coverage and Temporal Reliability of Twitter's Academic API

Authors

Pfeffer, Juergen, Mooseder, Angelina, Lasser, Jana, Hammer, Luca, Stritzel, Oliver, Garcia, David

Abstract

Because of its willingness to share data with academia and industry, Twitter has been the primary social media platform for scientific research as well as for consulting businesses and governments in the last decade. In recent years, a series of publications have studied and criticized Twitter's APIs, and Twitter has partially adapted its existing data streams. The newest Twitter API for Academic Research allows researchers to "access Twitter's real-time and historical public data with additional features and functionality that support collecting more precise, complete, and unbiased datasets." The main new feature of this API is the possibility of accessing the full archive of all historic Tweets. In this article, we will take a closer look at the Academic API and will try to answer two questions. First, are the datasets collected with the Academic API complete? Second, since Twitter's Academic API delivers historic Tweets as represented on Twitter at the time of data collection, we need to understand how much data is lost over time due to Tweet and account removal from the platform. Our work shows evidence that Twitter's Academic API can indeed create (almost) complete samples of Twitter data based on a wide variety of search terms. We also provide evidence that Twitter's data endpoint v2 delivers better samples than the previously used endpoint v1.1. Furthermore, collecting Tweets with the Academic API at the time of studying a phenomenon, rather than creating local archives of stored Tweets, allows for a straightforward way of following Twitter's developer agreement. Finally, we will also discuss technical artifacts and implications of the Academic API. We hope that our work can add another layer of understanding of Twitter data collections, leading to more reliable studies of human behavior via social media data.


Paper title