Paper Title

Adaptive Multi-Neighborhood Attention based Transformer for Graph Representation Learning

Authors

Gaichao Li, Jinsong Chen, Kun He

Abstract

By incorporating the graph structural information into Transformers, graph Transformers have exhibited promising performance for graph representation learning in recent years. Existing graph Transformers leverage specific strategies, such as Laplacian eigenvectors and shortest paths of the node pairs, to preserve the structural features of nodes and feed them into the vanilla Transformer to learn the representations of nodes. It is hard for such predefined rules to extract informative graph structural features for arbitrary graphs whose topology structure varies greatly, limiting the learning capacity of the models. To this end, we propose an adaptive graph Transformer, termed Multi-Neighborhood Attention based Graph Transformer (MNA-GT), which captures the graph structural information for each node from the multi-neighborhood attention mechanism adaptively. By defining the input to perform scaled-dot product as an attention kernel, MNA-GT constructs multiple attention kernels based on different hops of neighborhoods such that each attention kernel can capture specific graph structural information of the corresponding neighborhood for each node pair. In this way, MNA-GT can preserve the graph structural information efficiently by incorporating node representations learned by different attention kernels. MNA-GT further employs an attention layer to learn the importance of different attention kernels to enable the model to adaptively capture the graph structural information for different nodes. Extensive experiments are conducted on a variety of graph benchmarks, and the empirical results show that MNA-GT outperforms many strong baselines.
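The core idea described above can be illustrated with a toy sketch: build one masked scaled-dot-product "attention kernel" per neighborhood hop, then combine the per-kernel node representations with a per-node softmax weighting. This is a minimal illustration, not the paper's implementation; the tiny path graph, the feature values, and the scoring vector `a` (standing in for the learned attention layer over kernels) are all placeholder assumptions.

```python
import math

# Toy 4-node path graph (0-1-2-3) with 2-d node features.
# All values are placeholders, not trained parameters.
A = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
n, d = len(X), len(X[0])

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]

def khop_mask(A, k):
    # Nodes reachable within k hops, self included: boolean (I + A)^k.
    reach = [[i == j for j in range(n)] for i in range(n)]
    for _ in range(k):
        reach = [[any(reach[i][t] and (A[t][j] or t == j) for t in range(n))
                  for j in range(n)] for i in range(n)]
    return reach

def kernel_attention(X, mask):
    # One "attention kernel": scaled dot-product attention with
    # Q = K = V = X, restricted to the given neighborhood mask.
    out = []
    for i in range(n):
        nbrs = [j for j in range(n) if mask[i][j]]
        scores = [sum(X[i][t] * X[j][t] for t in range(d)) / math.sqrt(d)
                  for j in nbrs]
        w = softmax(scores)
        out.append([sum(w[p] * X[j][t] for p, j in enumerate(nbrs))
                    for t in range(d)])
    return out

# One attention kernel per neighborhood hop (1-hop and 2-hop here).
hops = [1, 2]
H = [kernel_attention(X, khop_mask(A, k)) for k in hops]

# Adaptive combination: a per-node softmax over kernel scores, with a
# placeholder scoring vector `a` standing in for the learned attention
# layer that weights the kernels.
a = [1.0, -1.0]
Z = []
for i in range(n):
    w = softmax([sum(Hk[i][t] * a[t] for t in range(d)) for Hk in H])
    Z.append([sum(w[k] * H[k][i][t] for k in range(len(H))) for t in range(d)])
```

Each row of `Z` is a node representation that blends the per-hop kernel outputs with node-specific weights, mirroring how the model adaptively emphasizes different neighborhood scales for different nodes.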
