On Embeddings in Relational Databases

Paper Title

On Embeddings in Relational Databases

Authors

Siddhant Arora, Srikanta Bedathur

Abstract

We address the problem of learning a distributed representation of entities in a relational database using a low-dimensional embedding. Low-dimensional embeddings aim to encapsulate a concise vector representation of an underlying dataset with minimal loss of information. Embeddings across entities in a relational database have been less explored because of the intricate data relations and representation complexity involved. A relational database is an interwoven collection of relations that not only models relationships between entities but also records complex domain-specific quantitative and temporal attributes of the data that define complex relationships among entities. Recent methods for learning an embedding take a naive approach: they fully denormalize the database by materializing the full join of all tables and represent the result as a knowledge graph. This popular approach has limitations, as it fails to capture the inter-row relationships and additional semantics encoded in a relational database. In this paper, we demonstrate a better methodology for learning representations that exploits the underlying semantics of the columns in a table while using the relational joins and the latent inter-row relationships. Empirical results on a real-world database, with evaluations on similarity join and table completion tasks, support our proposition.
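
The naive baseline mentioned in the abstract (materialize the full join, then treat the result as a knowledge graph) can be illustrated with a minimal sketch. This is not the authors' code: the two-table schema, the toy rows, and the function names `build_toy_db` and `denormalize_to_triples` are hypothetical, chosen only to show what the denormalized triples might look like.

```python
import sqlite3


def build_toy_db():
    """Create a hypothetical two-table database in memory (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE movie (
            id INTEGER PRIMARY KEY,
            title TEXT,
            year INTEGER,
            director_id INTEGER REFERENCES person(id)
        );
        INSERT INTO person VALUES (1, 'Director A'), (2, 'Director B');
        INSERT INTO movie VALUES
            (10, 'Film X', 1999, 1),
            (11, 'Film Y', 2004, 1),
            (12, 'Film Z', 2004, 2);
        """
    )
    return conn


def denormalize_to_triples(conn):
    """Materialize the full join and emit (subject, predicate, object) triples."""
    cur = conn.execute(
        "SELECT m.id AS id, m.title AS title, m.year AS year, "
        "p.name AS director "
        "FROM movie AS m JOIN person AS p ON m.director_id = p.id "
        "ORDER BY m.id"
    )
    columns = [d[0] for d in cur.description]
    triples = []
    for row in cur:
        subject = f"movie:{row[0]}"
        # Every non-key column becomes an edge label; its value becomes a node.
        for column, value in zip(columns[1:], row[1:]):
            triples.append((subject, column, value))
    return triples


triples = denormalize_to_triples(build_toy_db())
for triple in triples:
    print(triple)
```

In this toy output, `year` values such as 1999 and 2004 become opaque object nodes, so their numeric ordering is lost, and rows are related only through shared objects. That is one way to read the limitation the abstract describes regarding column semantics and inter-row relationships.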
