Paper Title
Serving Hybrid-Cloud SQL Interactive Queries at Twitter
Paper Authors
Paper Abstract
The demand for data analytics at Twitter has increased consistently over the past years. To fulfill these requirements and provide a highly scalable and available query experience, Twitter relies heavily on a large-scale in-house SQL system. Recently, we evolved this SQL system into a hybrid-cloud SQL federation system, in line with Twitter's Partly Cloudy strategy. The hybrid-cloud SQL federation system processes queries across Twitter's data centers and the public cloud, interacting with around 10PB of data per day. In this paper, we present the design of the hybrid-cloud SQL federation system, which consists of query, cluster, and storage federations. We identify challenges in a modern SQL system and show how our system addresses them through several key design decisions. We also conduct qualitative examinations and summarize instructive lessons learned from developing and operating such a SQL system.