Paper Title

Bayesian Nonparametric Inference for "Species-sampling" Problems

Authors

Cecilia Balocchi, Stefano Favaro, Zacharie Naulet

Abstract

Given an observed sample from a population of individuals belonging to species, "species-sampling" problems (SSPs) call for estimating some features of the unknown species composition of additional unobservable samples from the same population. Within SSPs, the problems of estimating coverage probabilities, the number of unseen species and coverages of prevalences have emerged in the past three decades for being the subject of numerous methodological and applied works, mostly in biological sciences but also in statistical machine learning, electrical engineering, theoretical computer science, information theory and forensic statistics. In this paper, we focus on these popular SSPs, and present an overview of their Bayesian nonparametric (BNP) analysis under the Pitman-Yor process (PYP) prior. While reviewing the literature, we improve on computation and interpretability of existing posterior inferences, typically expressed through complicated combinatorial numbers, by establishing novel posterior representations in terms of simple compound Binomial and Hypergeometric distributions. We also consider the problem of estimating the discount and scale parameters of the PYP prior, showing a property of Bayesian consistency with respect to estimation through the hierarchical Bayes and empirical Bayes approaches, that is: the discount parameter can be estimated consistently, whereas the scale parameter cannot be estimated consistently, thus advising caution in posterior inference. We conclude our work by discussing some generalizations of SSPs, mostly in the field of biological sciences, which deal with "feature-sampling", multiple populations of individuals sharing species and classes of Markov chains.
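
As an illustration of the setting described in the abstract, the following is a minimal sketch of the standard predictive (generalized Chinese restaurant) rule for the Pitman-Yor process, together with the resulting probability that the next individual belongs to a species not yet observed. It is not code from the paper: the function names `new_species_probability` and `sample_pyp_species`, and the parameter names `discount` and `scale`, are hypothetical choices made for this example, and the compound Binomial and Hypergeometric representations established by the authors are not reproduced here.

```python
import random


def new_species_probability(n, k, discount, scale):
    """Probability that individual n+1 is a new species under a PYP prior,
    given n observed individuals spread over k distinct species.
    Standard PYP predictive rule: (scale + k * discount) / (scale + n)."""
    return (scale + k * discount) / (scale + n)


def sample_pyp_species(n, discount, scale, seed=None):
    """Sequentially draw species labels for n individuals from the PYP
    predictive rule. Returns (labels, counts), where counts[j] is the
    number of individuals assigned to species j."""
    # Usual PYP parameter constraints (assumed here): 0 <= discount < 1
    # and scale > -discount.
    if not (0 <= discount < 1) or scale <= -discount:
        raise ValueError("need 0 <= discount < 1 and scale > -discount")
    rng = random.Random(seed)
    labels, counts = [], []
    for i in range(n):
        k = len(counts)
        # The first individual always founds a new species.
        if i == 0 or rng.random() < new_species_probability(i, k, discount, scale):
            counts.append(1)
            labels.append(k)
        else:
            # An existing species j is chosen with weight counts[j] - discount.
            weights = [c - discount for c in counts]
            j = rng.choices(range(k), weights=weights)[0]
            counts[j] += 1
            labels.append(j)
    return labels, counts


if __name__ == "__main__":
    labels, counts = sample_pyp_species(1000, discount=0.5, scale=1.0, seed=0)
    k = len(counts)
    print("distinct species observed:", k)
    print("P(next individual is a new species):",
          new_species_probability(len(labels), k, 0.5, 1.0))
```

In this sketch, a larger `discount` makes new species appear faster as the sample grows, while `scale` mainly shifts the early rate of discovery.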
