Paper Title

A Generalization of Transformer Networks to Graphs

Paper Authors

Vijay Prakash Dwivedi, Xavier Bresson

Paper Abstract

We propose a generalization of the transformer neural network architecture to arbitrary graphs. The original transformer was designed for Natural Language Processing (NLP) and operates on fully connected graphs representing all connections between the words in a sequence. Such an architecture does not leverage the graph connectivity inductive bias, and can perform poorly when the graph topology is important and has not been encoded into the node features. We introduce a graph transformer with four new properties compared to the standard model. First, the attention mechanism is a function of the neighborhood connectivity for each node in the graph. Second, the positional encoding is represented by the Laplacian eigenvectors, which naturally generalize the sinusoidal positional encodings often used in NLP. Third, the layer normalization is replaced by a batch normalization layer, which provides faster training and better generalization performance. Finally, the architecture is extended to edge feature representations, which can be critical to tasks such as chemistry (bond type) or link prediction (entity relationships in knowledge graphs). Numerical experiments on a graph benchmark demonstrate the performance of the proposed graph transformer architecture. This work closes the gap between the original transformer, which was designed for the limited case of line graphs, and graph neural networks, which can work with arbitrary graphs. As our architecture is simple and generic, we believe it can be used as a black box for future applications that wish to combine transformers and graphs.
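The abstract's first two properties are concrete enough to sketch in code. Below is a minimal NumPy illustration, not the authors' reference implementation: the helper names laplacian_pe and neighborhood_attention, the single-head formulation, and the toy cycle graph are assumptions made for exposition, and the multi-head attention, batch normalization, and edge-feature channels described in the abstract are omitted.

import numpy as np

def laplacian_pe(A, k):
    """Return the k smallest non-trivial eigenvectors of the symmetric
    normalized Laplacian L = I - D^{-1/2} A D^{-1/2} as node positional
    encodings (eigenvector signs are arbitrary, so in practice they are
    typically flipped at random during training)."""
    deg = A.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg)
    d_inv_sqrt[deg > 0] = deg[deg > 0] ** -0.5
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L)   # columns sorted by ascending eigenvalue
    return eigvecs[:, 1:k + 1]       # drop the trivial constant eigenvector

def neighborhood_attention(h, A, Wq, Wk, Wv):
    """Single-head attention where node i attends only to neighbors j with
    A[i, j] > 0 (the graph-connectivity inductive bias from the abstract)."""
    q, kmat, v = h @ Wq, h @ Wk, h @ Wv
    scores = (q @ kmat.T) / np.sqrt(kmat.shape[-1])
    scores = np.where(A > 0, scores, -np.inf)    # mask out non-edges
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)
    return attn @ v

# Toy usage on a 4-node cycle graph with 8-dimensional node features.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
h = rng.standard_normal((4, 8))
h += np.pad(laplacian_pe(A, 2), ((0, 0), (0, 6)))  # add PE to node features
Wq, Wk, Wv = (rng.standard_normal((8, 8)) for _ in range(3))
print(neighborhood_attention(h, A, Wq, Wk, Wv).shape)  # (4, 8)

Restricting the softmax to each node's neighbors is what replaces the fully connected attention of the original transformer; on a fully connected graph the sketch above reduces to standard single-head attention.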
