论文标题

Sandslash:一个两级框架,用于有效的图形图案挖掘

Sandslash: A Two-Level Framework for Efficient Graph Pattern Mining

论文作者

Chen, Xuhao, Dathathri, Roshan, Gill, Gurbinder, Hoang, Loc, Pingali, Keshav

论文摘要

图模式挖掘(GPM)用于不同的应用领域,包括社交网络分析,生物信息学和化学工程。现有的GPM框架要么以表达性成本为生产力提供高级界面,要么提供低级接口,可以以增加编程复杂性的成本来表达各种GPM算法。此外,现有系统缺乏探索优化组合的灵活性,以实现具有手工优化应用程序的性能竞争力。 我们提出了SandSlash,这是一种内存图形模式挖掘(GPM)框架,该框架使用新颖的编程接口来支持大图上的生产性,表现性和有效的GPM。 SandSlash提供了一个高级API,仅需要GPM问题的规范,并且实现了快速的子图表,提供有效的数据结构,并自动应用高级优化。为了实现专家优化的实现,SandSlash还提供了低级API,允许用户表达算法特定的优化。这使SandSlash能够在不失去表现力的情况下支持高生产率和高效效率。我们使用五个GPM应用程序和广泛的大型现实世界图评估了共享内存机器上的SandSlash。实验结果表明,使用SandSlash高级或低水平API撰写的应用均优于最先进的GPM Systems Automine,Pangolin和Peregrine,分别以13.8倍,7.9倍和5.4倍的速度分别提高。我们还表明,这些SandSlash应用程序的表现平均超过了专家优化的GPM实现,而编程工作则较小。

Graph pattern mining (GPM) is used in diverse application areas including social network analysis, bioinformatics, and chemical engineering. Existing GPM frameworks either provide high-level interfaces for productivity at the cost of expressiveness or provide low-level interfaces that can express a wide variety of GPM algorithms at the cost of increased programming complexity. Moreover, existing systems lack the flexibility to explore combinations of optimizations to achieve performance competitive with hand-optimized applications. We present Sandslash, an in-memory Graph Pattern Mining (GPM) framework that uses a novel programming interface to support productive, expressive, and efficient GPM on large graphs. Sandslash provides a high-level API that needs only a specification of the GPM problem, and it implements fast subgraph enumeration, provides efficient data structures, and applies high-level optimizations automatically. To achieve performance competitive with expert-optimized implementations, Sandslash also provides a low-level API that allows users to express algorithm-specific optimizations. This enables Sandslash to support both high-productivity and high-efficiency without losing expressiveness. We evaluate Sandslash on shared-memory machines using five GPM applications and a wide range of large real-world graphs. Experimental results demonstrate that applications written using Sandslash high-level or low-level API outperforms state-of-the-art GPM systems AutoMine, Pangolin, and Peregrine on average by 13.8x, 7.9x, and 5.4x, respectively. We also show that these Sandslash applications outperform expert-optimized GPM implementations by 2.3x on average with less programming effort.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源