Paper title
Recursive SPARQL for Graph Analytics
Paper authors
Paper abstract
Work on knowledge graphs and graph-based data management often focuses either on declarative graph query languages or on frameworks for graph analytics, and there has been little work on combining the two approaches. However, many real-world tasks conceptually involve combinations of these approaches: a graph query can be used to select the appropriate data, which is then enriched with analytics, and then possibly filtered or combined again with other data by means of a query language. In this paper we propose a declarative language that is well suited to both graph querying and analytical tasks. We do this by proposing a minimalistic extension of SPARQL that allows analytical tasks to be expressed; in particular, we propose to extend SPARQL with recursive features, and we provide a formal syntax and semantics for our language. We show that this language can express key analytical tasks on graphs (in fact, it is Turing complete), offering a more declarative alternative to existing frameworks and languages. We show how procedures in our language can be implemented over an off-the-shelf SPARQL engine with a specialised client that allows parallelisation and batch-based processing when memory is limited. Results show that with such an implementation, procedures for popular analytics currently run in seconds or minutes for selective sub-graphs (our target use-case) but struggle at larger scales.
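To make the idea of a client driving recursion over a non-recursive query engine more concrete, the following is a minimal sketch and not the paper's language or implementation. It assumes an in-memory set of triples in place of a SPARQL endpoint, and the names `evaluate_step` and `recursive_reach` are hypothetical. Each call to `evaluate_step` stands in for one ordinary query sent to the engine, and the client repeats it until no new results appear (a fixpoint).

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def evaluate_step(graph: Set[Triple], reached: Set[str], predicate: str) -> Set[str]:
    """Hypothetical stand-in for one non-recursive query sent to the engine:
    return every object one hop away from the nodes reached so far."""
    return {o for (s, p, o) in graph if p == predicate and s in reached}


def recursive_reach(graph: Set[Triple], start: str, predicate: str,
                    max_iterations: int = 1000) -> Set[str]:
    """Client-side loop: re-run the one-hop query until a fixpoint is reached."""
    reached = {start}
    for _ in range(max_iterations):
        new_nodes = evaluate_step(graph, reached, predicate) - reached
        if not new_nodes:  # nothing new was found, so the result is stable
            break
        reached |= new_nodes
    return reached


# Toy data, invented for illustration only.
graph = {
    ("a", "knows", "b"),
    ("b", "knows", "c"),
    ("c", "knows", "a"),
    ("d", "knows", "e"),
    ("a", "likes", "d"),
}
result = recursive_reach(graph, "a", "knows")
```

In this sketch the recursion lives entirely in the client, so the engine only ever sees plain one-hop queries. The `max_iterations` bound is an assumption added to guarantee termination of the loop.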