Paper title
Time Dependency, Data Flow, and Competitive Advantage
Paper authors
Paper abstract
Data is fundamental to machine learning-based products and services and is considered strategic because of its externalities for businesses, governments, non-profits, and society more generally. It is well known that the value of organizations (businesses, government agencies and programs, and even industries) scales with the volume of available data. What is often less appreciated is that the value of data for making useful organizational predictions ranges widely and depends heavily on the characteristics of the data and the underlying algorithms. In this research, our goal is to study how the value of data changes over time and how this change varies across contexts and business areas (e.g., next-word prediction in the context of history, sports, or politics). We focus on data from Reddit.com and compare the time dependency of data value across various Reddit topics (subreddits). We make this comparison by measuring the rate at which user-generated text data loses its relevance to the algorithmic prediction of conversations. We show that relevance declines over time at different rates in different subreddits. Relating the text topics to various business areas of interest, we argue that competing in a business area in which data value decays rapidly changes the strategies needed to acquire a competitive advantage. When data value decays rapidly, access to a continuous flow of data is more valuable than access to a fixed stock of data. In this kind of setting, improving user engagement and increasing the user base help create and maintain a competitive advantage.