Graphical Join: A New Physical Join Algorithm for RDBMSs

Title

Graphical Join: A New Physical Join Algorithm for RDBMSs

Authors

Ali Mohammadi Shanghooshabad, Peter Triantafillou

Abstract

Join operations (especially n-way, many-to-many joins) are known to be time- and resource-consuming. At large scales, with respect to both table and join-result sizes, current state-of-the-art approaches, including binary-join plans that use nested-loop, hash, or sort-merge join algorithms and, alternatively, worst-case optimal join algorithms (WOJAs), may fail to produce any answer at all under reasonable resource and time constraints. In this work, we introduce a new approach for n-way equi-join processing, the Graphical Join (GJ). The key idea is two-fold. First, we map the physical join computation problem to probabilistic graphical models (PGMs) and introduce tweaked inference algorithms that compute a Run-Length Encoding (RLE) based join-result summary, which entails all statistics necessary to materialize the join result. Second, and most importantly, we show that a join algorithm such as GJ, which produces this join-result summary and then desummarizes it, can deliver large performance benefits in both time and space. Comprehensive experiments are conducted with join queries from the JOB, TPC-DS, and lastFM datasets, comparing GJ against PostgreSQL, MonetDB, and a state-of-the-art WOJA implemented within the Umbra system. For in-memory join computation, GJ is up to 64X, 388X, and 6X faster than PostgreSQL, MonetDB, and Umbra, respectively. For on-disk join computation, GJ is up to 820X, 717X, and 165X faster than PostgreSQL, MonetDB, and Umbra, respectively. Furthermore, GJ's space requirements are up to 21,488X, 38,333X, and 78,750X smaller than those of PostgreSQL, MonetDB, and Umbra, respectively.
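The abstract names the two ingredients, an RLE-based join-result summary and its desummarization. It does not give GJ's actual data layout or its PGM inference. The Python below is a minimal, non-authoritative sketch of the general idea. It assumes a single two-table equi-join R(x, y) ⋈ S(y, z) and a column-wise RLE layout chosen purely for illustration, not the paper's method.

The sketch does three things:

- builds (value, run length) pairs for each output column from per-key groups,
- reads the join size off the summary without materializing any rows,
- desummarizes the summary back into rows.

The function names (append_run, summarize_join, join_size, desummarize) are hypothetical and do not come from the paper.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Run = Tuple[object, int]  # (value, repeat count)


def append_run(col: List[Run], value, length: int) -> None:
    """Append a run to a column, merging it with the previous run if equal."""
    if length == 0:
        return
    if col and col[-1][0] == value:
        col[-1] = (value, col[-1][1] + length)
    else:
        col.append((value, length))


def summarize_join(R, S) -> Dict[str, List[Run]]:
    """Hypothetical RLE summary of R(x, y) JOIN S(y, z) ON y.

    For each shared key y, the output block is R_y x S_y in R-major order:
      x -> each matching R value repeated |S_y| times
      y -> a single run of length |R_y| * |S_y|
      z -> the matching S values, cycled once per R row
    """
    r_by_key, s_by_key = defaultdict(list), defaultdict(list)
    for x, y in R:
        r_by_key[y].append(x)
    for y, z in S:
        s_by_key[y].append(z)

    summary: Dict[str, List[Run]] = {"x": [], "y": [], "z": []}
    for y in sorted(r_by_key.keys() & s_by_key.keys()):
        xs, zs = r_by_key[y], s_by_key[y]
        for x in xs:
            append_run(summary["x"], x, len(zs))
            for z in zs:
                append_run(summary["z"], z, 1)
        append_run(summary["y"], y, len(xs) * len(zs))
    return summary


def join_size(summary: Dict[str, List[Run]]) -> int:
    """Join cardinality is the total run length of any one column."""
    return sum(length for _, length in summary["y"])


def desummarize(summary: Dict[str, List[Run]]) -> List[tuple]:
    """Expand every column's runs and zip them back into (x, y, z) rows."""
    cols = [
        [value for value, n in summary[name] for _ in range(n)]
        for name in ("x", "y", "z")
    ]
    return list(zip(*cols))


if __name__ == "__main__":
    R = [("a1", 1), ("a2", 1), ("a3", 2)]
    S = [(1, "c1"), (1, "c2"), (2, "c3"), (3, "c4")]
    s = summarize_join(R, S)
    print(s["y"])        # [(1, 4), (2, 1)]
    print(join_size(s))  # 5
```

In this example the key column collapses to two runs, so the five-row join size is known before any row is expanded. `desummarize` then yields the same rows as a nested-loop join. In this R-major layout only the R-side and key columns compress, while the S-side column still repeats its block once per matching R row. A real system would choose its orderings and encodings differently.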