Paper Title
Network Analysis of Count Data from Mixed Populations
Authors
Abstract
In applications such as gene regulatory network analysis based on single-cell RNA sequencing data, samples often come from a mixture of different populations, each with its own network. Existing graphical models typically assume that all samples come from the same population and share the same network. One must therefore first cluster the samples and then apply these methods to infer the network of each cluster separately. However, this two-step procedure ignores the uncertainty in the clustering step and can thus lead to inaccurate network estimates. Motivated by these applications, we consider the mixture Poisson log-normal model for network inference of count data from mixed populations. The latent precision matrices of the mixture model correspond to the networks of the different populations and can be jointly estimated by maximizing the lasso-penalized log-likelihood. Under rather mild conditions, we show that the mixture Poisson log-normal model is identifiable and has a positive definite Fisher information matrix. We also establish the consistency of the maximum lasso-penalized log-likelihood estimator. To avoid the intractable optimization of the log-likelihood, we develop an algorithm called VMPLN based on the variational inference method. Comprehensive simulations and analyses of real single-cell RNA sequencing data demonstrate the superior performance of VMPLN.
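The abstract describes the mixture Poisson log-normal model only in words. The Python sketch below illustrates it under an assumed common formulation. Cell i belongs to population k with probability π_k, has latent log-rates X_i ~ N(μ_k, Θ_k^{-1}), and has observed counts Y_ij | X_ij ~ Poisson(exp(X_ij)). Each population's network is read off the nonzero off-diagonal entries of its precision matrix Θ_k. Size factors and covariates are omitted. All function names (`sample_mpln`, `edges`, etc.) and parameter values are hypothetical. The sketch only simulates data: it does not implement the lasso-penalized estimator or VMPLN.

```python
import math
import random

# Hypothetical sketch: this model form (no size factors or covariates) is an
# assumption for illustration, not the paper's exact specification.


def cholesky(a):
    """Return lower-triangular L with a = L L^T (a symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def solve_lower_transpose(L, b):
    """Solve L^T x = b by back substitution (L lower triangular)."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(L[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (b[i] - s) / L[i][i]
    return x


def sample_gaussian_from_precision(mean, chol_theta, rng):
    """Draw x ~ N(mean, Theta^{-1}) given Theta = L L^T, via x = mean + L^{-T} eps."""
    eps = [rng.gauss(0.0, 1.0) for _ in mean]
    z = solve_lower_transpose(chol_theta, eps)
    return [m + zi for m, zi in zip(mean, z)]


def poisson(lam, rng):
    """Knuth's multiplication method; adequate for the moderate rates used here."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample_mpln(n, weights, means, precisions, seed=0):
    """Simulate n count vectors from a K-component mixture Poisson log-normal model."""
    rng = random.Random(seed)
    chols = [cholesky(theta) for theta in precisions]
    labels, counts = [], []
    for _ in range(n):
        k = rng.choices(range(len(weights)), weights=weights)[0]  # latent population
        x = sample_gaussian_from_precision(means[k], chols[k], rng)  # latent log-rates
        counts.append([poisson(math.exp(xj), rng) for xj in x])  # observed counts
        labels.append(k)
    return labels, counts


def edges(theta, tol=1e-12):
    """Network of a population: nonzero off-diagonal entries of its precision matrix."""
    p = len(theta)
    return [(i, j) for i in range(p) for j in range(i + 1, p) if abs(theta[i][j]) > tol]


if __name__ == "__main__":
    # Two hypothetical populations over 3 genes with different sparse networks.
    theta1 = [[1.0, 0.4, 0.0], [0.4, 1.0, 0.0], [0.0, 0.0, 1.0]]
    theta2 = [[1.0, 0.0, 0.0], [0.0, 1.0, -0.4], [0.0, -0.4, 1.0]]
    labels, counts = sample_mpln(
        500, [0.5, 0.5], [[1.0, 1.0, 1.0], [1.5, 0.5, 1.0]], [theta1, theta2]
    )
    print("population 1 edges:", edges(theta1))
    print("population 2 edges:", edges(theta2))
    print("first 3 cells:", list(zip(labels[:3], counts[:3])))
```

In this toy setup, pooling all cells and estimating a single network would blend edge (0, 1) of the first population with edge (1, 2) of the second. A joint mixture model is meant to handle exactly this situation.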