论文标题
自主图挖掘算法搜索以最佳的速度/准确性权衡取舍
Autonomous Graph Mining Algorithm Search with Best Speed/Accuracy Trade-off
论文作者
论文摘要
从社交网络到生物信息学的学术界和行业中,图形数据无处不在。如今的图表的普遍性提高了对算法的需求,这些算法可以回答各种问题:用户可以购买哪些产品给定订单清单?哪些用户正在购买伪造的关注者以提高公众声誉?每年都会提出无数新的图挖掘算法来回答此类问题 - 每个问题都有明显的问题表述,计算时间和记忆足迹。这种缺乏统一使从业者很难比较不同的算法并选择最适合特定应用程序的算法。这些挑战(对于非专家来说更加严重)造成了一个差距,在该差距中,在学术环境中开发的最新技术无法最佳地部署在现实世界中。为了弥合这一差距,我们提出了AutoGM,这是一种用于图挖掘算法开发的自动化系统。我们首先定义了一个统一的框架统一,该统一统一,该框架集成了各种基于消息的图形算法,从诸如Pagerank之类的常规算法到图形神经网络等传统算法。然后UnifiedGM定义了一个搜索空间,其中需要五个参数来确定图形算法。在此搜索空间下,AutoGM使用贝叶斯优化明确优化了UnifiedGM的最佳参数集。 AutoGM定义了一种新型的预算意识目标功能,以优化将实际问题 - 在计算预算下找到最佳的速度准确性权衡 - 纳入图算法生成问题。现实世界中基准数据集的实验表明,与现有具有启发式参数的现有模型相比,AutoGM以最佳的速度/精度折衷生成新颖的图形挖掘算法。
Graph data is ubiquitous in academia and industry, from social networks to bioinformatics. The pervasiveness of graphs today has raised the demand for algorithms that can answer various questions: Which products would a user like to purchase given her order list? Which users are buying fake followers to increase their public reputation? Myriads of new graph mining algorithms are proposed every year to answer such questions - each with a distinct problem formulation, computational time, and memory footprint. This lack of unity makes it difficult for a practitioner to compare different algorithms and pick the most suitable one for a specific application. These challenges - even more severe for non-experts - create a gap in which state-of-the-art techniques developed in academic settings fail to be optimally deployed in real-world applications. To bridge this gap, we propose AUTOGM, an automated system for graph mining algorithm development. We first define a unified framework UNIFIEDGM that integrates various message-passing based graph algorithms, ranging from conventional algorithms like PageRank to graph neural networks. Then UNIFIEDGM defines a search space in which five parameters are required to determine a graph algorithm. Under this search space, AUTOGM explicitly optimizes for the optimal parameter set of UNIFIEDGM using Bayesian Optimization. AUTOGM defines a novel budget-aware objective function for the optimization to incorporate a practical issue - finding the best speed-accuracy trade-off under a computation budget - into the graph algorithm generation problem. Experiments on real-world benchmark datasets demonstrate that AUTOGM generates novel graph mining algorithms with the best speed/accuracy trade-off compared to existing models with heuristic parameters.