Paper Title
Clustering Graphs -- Applying a Label Propagation Algorithm to Detect Communities in Graph Databases
Paper Authors
Paper Abstract
In the last few decades, Database Management Systems (DBMSs) have become powerful tools for storing large amounts of data and executing complex queries over them. In recent years, the growing amount of unstructured or semi-structured data has driven a shift from representing data in the relational model towards alternative data models. Graph Databases and Graph Database Management Systems (GDBMSs) have seen an increase in use due to their ability to manage highly interconnected, continuously evolving data.
This thesis documents the implementation of a system that identifies clusters in graph-modeled data using a Label Propagation Community Detection Algorithm. The graph was built from datasets of academic publications in the field of Computer Science obtained from dblp.org. The system is a full-stack web application consisting of a web-based user interface, an API, and the data (nodes, edges, graph) stored in a GDBMS.
Described in this document are:
- the pre-import manipulation of the data and its import into a GDBMS such as ArangoDB, including the creation of nodes, of relations (edges) between the nodes, and of a graph composed of these nodes and edges;
- the GraphQL API, implemented in NodeJS, that requests data from the GDBMS;
- the frontend interface, built with TypeScript and React, which provides search functionality and visualizes results as Cytoscape network graphs;
- the execution of the Label Propagation Community Detection Algorithm on the graph, with the detected clusters stored and visualized for the user on request.
This thesis hopes to contribute a practical, hands-on approach to the graph representation, integration and analysis of interconnected data.
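The abstract does not say how the label propagation step is implemented, so the following is only a minimal, generic sketch of the technique in Python, not the thesis's code. It assumes an undirected, unweighted graph given as an edge list, visits nodes in random order, updates labels asynchronously, and keeps a node's current label on ties. The function names `label_propagation` and `communities`, the `seed` and `max_iter` parameters, and the toy author names are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def label_propagation(edges, max_iter=100, seed=0):
    """Asynchronous label propagation on an undirected, unweighted graph.

    edges: iterable of (u, v) pairs. Returns a dict mapping node -> label.
    This is a generic sketch, not the implementation used in the thesis.
    """
    rng = random.Random(seed)

    # Build an adjacency structure; every edge is treated as undirected.
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    # Each node starts in its own community, labeled by its own id.
    labels = {node: node for node in neighbors}
    nodes = list(neighbors)

    for _ in range(max_iter):
        rng.shuffle(nodes)  # visit nodes in a fresh random order each pass
        changed = False
        for node in nodes:
            # Count how often each label occurs among the node's neighbors.
            counts = Counter(labels[n] for n in neighbors[node])
            top = max(counts.values())
            best = sorted(lbl for lbl, c in counts.items() if c == top)
            # Assumption: on ties, keep the current label if it is among
            # the most frequent ones, so settled nodes do not flip.
            if labels[node] in best:
                continue
            labels[node] = rng.choice(best)
            changed = True
        if not changed:
            break  # no label changed in a full pass: converged
    return labels


def communities(labels):
    """Group nodes by their final label into sorted lists of members."""
    groups = defaultdict(set)
    for node, label in labels.items():
        groups[label].add(node)
    return sorted(sorted(group) for group in groups.values())


if __name__ == "__main__":
    # Hypothetical co-authorship edges: two separate groups of three authors.
    toy_edges = [
        ("ann", "bob"), ("bob", "cid"), ("ann", "cid"),
        ("dee", "eve"), ("eve", "fay"), ("dee", "fay"),
    ]
    print(communities(label_propagation(toy_edges)))
```

On graphs whose groups are linked by a few edges, the result of label propagation can depend on the visiting order and on how ties are broken, so different seeds may yield different clusterings. That sensitivity is one reason a system like the one described might store the detected clusters rather than recompute them for every request.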