Paper title
Score matching for compositional distributions
Paper authors
Abstract
Compositional data and multivariate count data with known totals are challenging to analyse because of the non-negativity and sum-to-one constraints on the sample space. Often, many of the compositional components are highly right-skewed, with large numbers of zeros. A major limitation of currently available estimators for compositional models is that they either cannot handle many zeros in the data or are not computationally feasible in moderate to high dimensions. We derive a novel set of score matching estimators applicable to distributions on a Riemannian manifold with boundary, of which the standard simplex is a special case. The score matching method is applied to estimate the parameters in a new flexible truncation model for compositional data, and we show that the estimators are scalable and available in closed form. Through extensive simulation studies, the score matching methodology is shown to work well for estimating the parameters in the new truncation model and also in the Dirichlet distribution. We apply the new model and estimators to real microbiome compositional data and show that the model provides a good fit to the data.
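For orientation, the following is a sketch of the standard Euclidean score matching objective, usually attributed to Hyvärinen, and of why it can give closed-form estimators. It is not the manifold-with-boundary estimator derived in the paper, which the abstract does not spell out. The exponential-family form and the symbols $t(x)$, $b(x)$, $Z(\theta)$, $W$ and $d$ are assumptions introduced here for illustration only.

```latex
% Standard Euclidean score matching objective (illustrative; not the paper's estimator).
% The normalising constant Z(\theta) drops out because only x-derivatives appear.
J(\theta) = \mathbb{E}_{x \sim p_{\mathrm{data}}}
  \left[ \tfrac{1}{2}\,\bigl\lVert \nabla_x \log p_\theta(x) \bigr\rVert^2
         + \Delta_x \log p_\theta(x) \right]

% Assumed model form: \log p_\theta(x) = \theta^\top t(x) + b(x) - \log Z(\theta),
% with \nabla_x t(x) the k x p matrix of partial derivatives of t (k parameters, p coordinates)
% and \Delta_x t(x) the componentwise Laplacian of t.
W = \mathbb{E}\bigl[ \nabla_x t(x)\, \nabla_x t(x)^\top \bigr], \qquad
d = -\,\mathbb{E}\bigl[ \nabla_x t(x)\, \nabla_x b(x) + \Delta_x t(x) \bigr]

% The objective is then quadratic in \theta, so its minimiser is available in closed form
% (assuming W is invertible):
J(\theta) = \tfrac{1}{2}\,\theta^\top W \theta - \theta^\top d + \text{const}, \qquad
\hat{\theta} = W^{-1} d
```

In practice the expectations would be replaced by sample averages. On a bounded space such as the simplex, this unweighted form generally needs modification to account for the boundary, which is the setting the abstract says the new estimators address.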