Title
The Tensor Data Platform: Towards an AI-centric Database System
Authors
Abstract
Database engines have historically absorbed many of the innovations in data processing, adding features to process graph, XML, object-oriented, and text data, among many others. In this paper, we make the case that it is time to do the same for AI, but with a twist. While existing approaches have tried to achieve this by integrating databases with external ML tools, we claim that achieving a truly AI-centric database requires moving the DBMS engine, at its core, from a relational to a tensor abstraction. This allows us to: (1) support multi-modal data processing, such as images, videos, audio, and text as well as relational data; (2) leverage the wellspring of innovation in hardware (HW) and runtimes for tensor computation; and (3) exploit automatic differentiation to enable a novel class of "trainable" queries that can learn to perform a task. To support these scenarios, we introduce TDP: a system that builds upon our prior work mapping relational queries to tensors. Thanks to a tighter integration with the tensor runtime, TDP provides broader coverage of emerging scenarios that require access to multi-modal data and automatic differentiation.
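To make points (2) and (3) concrete, here is a minimal sketch under stated assumptions; it is not TDP's API or the authors' compilation scheme. It assumes a hypothetical column-wise table held as NumPy arrays, expresses a filter plus GROUP BY SUM as tensor operations (a 0/1 mask and a one-hot matrix product), and then turns the filter constant into a learnable parameter. The names `filter_groupby_sum`, `soft_filtered_sum`, and `learn_threshold`, the sigmoid relaxation with temperature `tau`, and the toy training target are all hypothetical illustrations. The gradient is written by hand, standing in for the automatic differentiation a tensor runtime would supply.

```python
import numpy as np

# Hypothetical table "sales", stored column-wise: one 1-D tensor per column.
region = np.array([0, 1, 0, 2, 1, 0])              # dictionary-encoded region ids
amount = np.array([10.0, 5.0, 7.0, 3.0, 8.0, 2.0])


def filter_groupby_sum(keys, values, threshold, num_groups):
    """SELECT key, SUM(value) FROM t WHERE value > threshold GROUP BY key."""
    mask = (values > threshold).astype(values.dtype)  # WHERE as a 0/1 tensor
    one_hot = np.eye(num_groups)[keys]                # GROUP BY as a rows x groups one-hot matrix
    return one_hot.T @ (values * mask)                # per-group SUM as a matrix-vector product


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def soft_filtered_sum(values, t, tau=1.0):
    """Relaxed SUM(value) WHERE value > t: each row is weighted by sigmoid((value - t) / tau)."""
    w = sigmoid((values - t) / tau)
    return np.sum(values * w), w


def learn_threshold(values, target, t=0.0, lr=0.01, steps=2000, tau=1.0):
    """Hypothetical 'trainable query': gradient descent on (soft_sum - target)^2 over t."""
    for _ in range(steps):
        s, w = soft_filtered_sum(values, t, tau)
        # d(soft_sum)/dt = -(1/tau) * sum(value * w * (1 - w)); hand-written here,
        # standing in for the autodiff a tensor runtime would provide.
        ds_dt = -np.sum(values * w * (1.0 - w)) / tau
        t -= lr * 2.0 * (s - target) * ds_dt
    return t


print(filter_groupby_sum(region, amount, 4.0, 3))    # per-region sums: 17, 13, 0
t = learn_threshold(amount, target=30.0)
print(t, soft_filtered_sum(amount, t)[0])            # learned threshold and its soft sum
```

The exact `>` comparison has zero gradient almost everywhere, which is why the sketch swaps in a sigmoid for the trainable version; as `tau` shrinks, the relaxed filter approaches the exact one.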