论文标题
顺序分层再生:大型状态空间的MCMC,并应用了子图计数估计
Sequential Stratified Regeneration: MCMC for Large State Spaces with an Application to Subgraph Count Estimation
论文作者
论文摘要
这项工作考虑了在图表的边缘上估算有界函数的总和,给定的邻里查询访问以及整个网络访问的何处的一般任务。为了估计这一总和,先前的工作提出了使用随机步行的Markov Chain Monte Carlo(MCMC)方法,该方法在某个种子顶点开始,其平衡分布是所有边缘上均匀分布的分布,从而消除了在所有边缘上迭代的需求。不幸的是,这些现有的估计器无法扩展到大量的现实图形。在本文中,我们介绍了基于MCMC的估计器Ripple,该估计器通过将Markov链状态空间分层为有序的地层来实现前所未有的可伸缩性,我们用新技术表示{\ em em em equintial分层再生}。我们表明,波纹估计器是一致的,高度可行的,并且尺度很好。 我们通过将波纹应用于估计连接的,引起的子图计数的任务来评估我们的方法。在其中,我们证明了波纹是准确的,并且可以估计数量高达$ 12 $节点子图,这是一项规模上的任务,不仅是基于MCMC的方法,而且还通过其他采样方法。例如,在此目标应用程序中,我们提出了Markov链状态空间高达$ 10^{43} $的结果,而Ripple平均计算估算值不到$ 4 $小时。
This work considers the general task of estimating the sum of a bounded function over the edges of a graph, given neighborhood query access and where access to the entire network is prohibitively expensive. To estimate this sum, prior work proposes Markov chain Monte Carlo (MCMC) methods that use random walks started at some seed vertex and whose equilibrium distribution is the uniform distribution over all edges, eliminating the need to iterate over all edges. Unfortunately, these existing estimators are not scalable to massive real-world graphs. In this paper, we introduce Ripple, an MCMC-based estimator that achieves unprecedented scalability by stratifying the Markov chain state space into ordered strata with a new technique that we denote {\em sequential stratified regenerations}. We show that the Ripple estimator is consistent, highly parallelizable, and scales well. We empirically evaluate our method by applying Ripple to the task of estimating connected, induced subgraph counts given some input graph. Therein, we demonstrate that Ripple is accurate and can estimate counts of up to $12$-node subgraphs, which is a task at a scale that has been considered unreachable, not only by prior MCMC-based methods but also by other sampling approaches. For instance, in this target application, we present results in which the Markov chain state space is as large as $10^{43}$, for which Ripple computes estimates in less than $4$ hours, on average.