Paper Title


E2EG: End-to-End Node Classification Using Graph Topology and Text-based Node Attributes

Paper Authors

Tu Anh Dinh, Jeroen den Boef, Joran Cornelisse, Paul Groth

Paper Abstract


Node classification utilizing text-based node attributes has many real-world applications, ranging from prediction of paper topics in academic citation graphs to classification of user characteristics in social media networks. State-of-the-art node classification frameworks, such as GIANT, use a two-stage pipeline: first embedding the text attributes of graph nodes, then feeding the resulting embeddings into a node classification model. In this paper, we eliminate these two stages and develop an end-to-end node classification model that builds upon GIANT, called End-to-End-GIANT (E2EG). The tandem utilization of a main and an auxiliary classification objective in our approach results in a more robust model, enabling the BERT backbone to be switched out for a distilled encoder with a 25% - 40% reduction in the number of parameters. Moreover, the model's end-to-end nature increases ease of use, as it avoids the need to chain multiple models for node classification. Compared to a GIANT+MLP baseline on the ogbn-arxiv and ogbn-products datasets, E2EG obtains slightly better accuracy in the transductive setting (+0.5%), while reducing model training time by up to 40%. Our model is also applicable in the inductive setting, outperforming GIANT+MLP by up to +2.23%.
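The "tandem utilization of a main and an auxiliary classification objective" can be illustrated with a minimal sketch: a shared encoder produces two sets of logits (one per objective), and a single combined loss drives one backward pass. The logits, targets, and mixing weight `alpha` below are illustrative assumptions, not values from the paper.

```python
import numpy as np

def cross_entropy(logits, target):
    """Numerically stable softmax cross-entropy for one example."""
    shifted = logits - logits.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[target]

# Hypothetical outputs of two heads on top of one shared text encoder.
main_logits = np.array([2.0, 0.5, -1.0])  # node-label head (main objective)
aux_logits = np.array([0.3, 1.2, -0.4])   # auxiliary head (e.g. neighborhood prediction)

main_loss = cross_entropy(main_logits, target=0)
aux_loss = cross_entropy(aux_logits, target=1)

# Tandem objective: a single scalar loss updates the encoder for both tasks,
# so no separate embedding stage is needed. alpha is an assumed mixing weight.
alpha = 0.5
total_loss = main_loss + alpha * aux_loss
```

In a two-stage pipeline such as GIANT+MLP, the encoder would instead be trained on the auxiliary objective alone, frozen, and its embeddings handed to a separate classifier; the sketch above merges the two steps into one optimization.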
