Paper Title

Unsupervised Instance and Subnetwork Selection for Network Data

Authors

Zhang, Lin; Moskwa, Nicholas; Larsen, Melinda; Bogdanov, Petko

Abstract

Unlike tabular data, features in network data are interconnected within a domain-specific graph. Examples of this setting include gene expression overlaid on a protein interaction network (PPI) and user opinions in a social network. Network data is typically high-dimensional (large number of nodes) and often contains outlier snapshot instances and noise. In addition, it is often non-trivial and time-consuming to annotate instances with global labels (e.g., disease or normal). How can we jointly select discriminative subnetworks and representative instances for network data without supervision? We address these challenges within an unsupervised framework for joint subnetwork and instance selection in network data, called UISS, via a convex self-representation objective. Given an unlabeled network dataset, UISS identifies representative instances while ignoring outliers. It outperforms state-of-the-art baselines on both discriminative subnetwork selection and representative instance selection, achieving up to 10% accuracy improvement on all real-world datasets we use for evaluation. When employed for exploratory analysis of RNA-seq network samples from multiple studies, it produces interpretable and informative summaries.
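
To make the phrase "convex self-representation objective" concrete, the following is a minimal sketch of a generic self-representation model for instance selection. It is not the UISS objective, which the abstract does not spell out: it ignores the network structure and subnetwork selection entirely. It assumes the common formulation min_C 0.5 * ||X - XC||_F^2 + lam * sum_i ||C[i, :]||_2, where the columns of X are instances and rows of C with a large norm mark instances that help reconstruct many others. The function name `select_representatives` and the parameters `lam` and `n_iter` are hypothetical, chosen only for illustration.

```python
import numpy as np

def select_representatives(X, lam=1.0, n_iter=500):
    """Generic row-sparse self-representation (illustrative, not UISS).

    X has shape (n_features, n_instances). Returns the coefficient
    matrix C (n_instances x n_instances) and one score per instance
    (the L2 norm of its row in C).
    """
    n = X.shape[1]
    G = X.T @ X                          # Gram matrix between instances
    L = np.linalg.norm(G, 2)             # Lipschitz constant of the smooth part
    step = 1.0 / max(L, 1e-12)
    C = np.zeros((n, n))
    for _ in range(n_iter):
        grad = G @ C - G                 # gradient of 0.5 * ||X - XC||_F^2
        Z = C - step * grad              # gradient step
        row_norms = np.linalg.norm(Z, axis=1, keepdims=True)
        # Row-wise group soft-thresholding: proximal step for the L2,1 penalty
        shrink = np.maximum(0.0, 1.0 - step * lam / np.maximum(row_norms, 1e-12))
        C = shrink * Z
    return C, np.linalg.norm(C, axis=1)

# Toy usage on random data (assumed purely for demonstration).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(5, 20))
C_demo, scores_demo = select_representatives(X_demo, lam=0.5)
top_instances = np.argsort(scores_demo)[::-1][:3]   # highest-scoring instances
print(top_instances)
```

Because both terms are convex, proximal gradient descent with step size 1/L decreases the objective monotonically. Larger values of `lam` zero out more rows of C, leaving fewer candidate representatives.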
