Paper Title

DGL-KE: Training Knowledge Graph Embeddings at Scale

Paper Authors

Da Zheng, Xiang Song, Chao Ma, Zeyuan Tan, Zihao Ye, Jin Dong, Hao Xiong, Zheng Zhang, George Karypis

Paper Abstract

Knowledge graphs have emerged as a key abstraction for organizing information in diverse domains, and their embeddings are increasingly used to harness their information in various information retrieval and machine learning tasks. However, the ever-growing size of knowledge graphs requires computationally efficient algorithms capable of scaling to graphs with millions of nodes and billions of edges. This paper presents DGL-KE, an open-source package that efficiently computes knowledge graph embeddings. DGL-KE introduces various novel optimizations that accelerate training on knowledge graphs with millions of nodes and billions of edges using multi-processing, multi-GPU, and distributed parallelism. These optimizations are designed to increase data locality, reduce communication overhead, overlap computations with memory accesses, and achieve high operation efficiency. Experiments on knowledge graphs consisting of over 86M nodes and 338M edges show that DGL-KE can compute embeddings in 100 minutes on an EC2 instance with 8 GPUs and in 30 minutes on an EC2 cluster of 4 machines with 48 cores each. These results represent a 2x~5x speedup over the best competing approaches. DGL-KE is available at https://github.com/awslabs/dgl-ke.
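
To make concrete what "computing knowledge graph embeddings" involves, below is a minimal, self-contained sketch of the core computation: a TransE scorer over (head, relation, tail) triplets trained with uniformly sampled negative tails. This is an illustrative example in plain PyTorch, not DGL-KE's implementation; the class, the training helper, and all hyperparameters here are hypothetical.

```python
# Minimal sketch of knowledge graph embedding training (TransE with
# negative sampling). Illustrative only -- NOT DGL-KE's implementation;
# all names and hyperparameters below are hypothetical.
import torch
import torch.nn as nn


class TransE(nn.Module):
    def __init__(self, num_entities, num_relations, dim=400, gamma=12.0):
        super().__init__()
        self.gamma = gamma
        self.ent = nn.Embedding(num_entities, dim)
        self.rel = nn.Embedding(num_relations, dim)
        bound = 6.0 / dim ** 0.5
        nn.init.uniform_(self.ent.weight, -bound, bound)
        nn.init.uniform_(self.rel.weight, -bound, bound)

    def score(self, h, r, t):
        # TransE-L2 score: gamma - ||h + r - t||_2 (higher = more plausible).
        return self.gamma - (self.ent(h) + self.rel(r) - self.ent(t)).norm(p=2, dim=-1)


def train_step(model, opt, h, r, t, num_entities, neg_size=64):
    """One step on a batch of (head, relation, tail) index tensors."""
    pos = model.score(h, r, t)
    # Corrupt tails with uniformly sampled entities to form negatives.
    neg_t = torch.randint(num_entities, (h.size(0), neg_size))
    neg = model.score(h.unsqueeze(1), r.unsqueeze(1), neg_t)
    # Logistic loss: push positive scores up, negative scores down.
    loss = (-torch.log(torch.sigmoid(pos)).mean()
            - torch.log(torch.sigmoid(-neg)).mean())
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()


if __name__ == "__main__":
    n_ent, n_rel = 1000, 50
    model = TransE(n_ent, n_rel, dim=64)
    # Sparse embedding gradients make Adagrad-style optimizers a common choice.
    opt = torch.optim.Adagrad(model.parameters(), lr=0.1)
    h = torch.randint(n_ent, (512,))
    r = torch.randint(n_rel, (512,))
    t = torch.randint(n_ent, (512,))
    print(train_step(model, opt, h, r, t, n_ent))
```

At the scale the paper targets, the embedding tables themselves become the bottleneck: with 86M entities at a hypothetical dimension of 400, the entity table alone occupies roughly 137 GB in 32-bit floats. This is why the data-locality, partitioning, and communication-overlap optimizations that DGL-KE introduces dominate overall training time.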
