Less is More: SlimG for Accurate, Robust, and Interpretable Graph Mining

Paper title

Less is More: SlimG for Accurate, Robust, and Interpretable Graph Mining

Authors

Jaemin Yoo, Meng-Chieh Lee, Shubhranshu Shekhar, Christos Faloutsos

Abstract

How can we solve semi-supervised node classification in various graphs possibly with noisy features and structures? Graph neural networks (GNNs) have succeeded in many graph mining tasks, but their generalizability to various graph scenarios is limited due to the difficulty of training, hyperparameter tuning, and the selection of a model itself. Einstein said that we should "make everything as simple as possible, but not simpler." We rephrase it into the careful simplicity principle: a carefully-designed simple model can surpass sophisticated ones in real-world graphs. Based on the principle, we propose SlimG for semi-supervised node classification, which exhibits four desirable properties: It is (a) accurate, winning or tying on 10 out of 13 real-world datasets; (b) robust, being the only one that handles all scenarios of graph data (homophily, heterophily, random structure, noisy features, etc.); (c) fast and scalable, showing up to 18 times faster training in million-scale graphs; and (d) interpretable, thanks to the linearity and sparsity. We explain the success of SlimG through a systematic study of the designs of existing GNNs, sanity checks, and comprehensive ablation studies.
