Another view of sequential sampling in the birth process with immigration

Paper title

Another view of sequential sampling in the birth process with immigration

Authors

da Silva, Poly H., Jamshidpey, Arash, Tavaré, Simon

Abstract

Models of counts-of-counts data have been extensively used in the biological sciences, for example in cancer, population genetics, sampling theory and ecology. In this paper we explore properties of one model that is embedded into a continuous-time process and can describe the appearance of certain biological data such as covid DNA sequences in a database. More specifically, we consider an evolving model of counts-of-counts data that arises as the family size counts of samples taken sequentially from a Birth process with Immigration (BI). Here, each family represents a type or species, and the family size counts represent the type or species frequency spectrum in the population. We study the correlation of $S(a,b)$ and $S(c,d)$, the number of families observed in two disjoint time intervals $(a,b)$ and $(c,d)$. We find the expected sample variance and its asymptotics for $p$ consecutive sequential samples $\mathbf{S}_p:=(S(t_0,t_1),\dots, S(t_{p-1},t_p))$, for any given $0=t_0<t_1<\dots<t_p$. By conditioning on the sizes of the samples, we provide a connection between $\mathbf{S}_p$ and $p$ sequential samples of sizes $n_1,n_2,\dots,n_p$, drawn from a single run of a Chinese Restaurant Process. The properties of the latter were studied in da Silva et al. (2022). We show how the continuous-time framework helps to make asymptotic calculations easier than its discrete-time counterpart. As an application, for a specific choice of $t_1,t_2,\dots, t_p$, we revisit Fisher's 1943 multi-sampling problem and give another explanation of what Fisher's model could have meant in the world of sequential samples drawn from a BI process.
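
The sequential-sampling setup in the abstract can be illustrated with a small simulation. This is a minimal sketch under stated assumptions and is not taken from the paper: new families immigrate at a hypothetical rate `theta`, every existing individual gives birth at rate 1, and $S(a,b)$ is read as the number of distinct families represented among the individuals arriving in $(a,b)$. The function names `simulate_bi` and `families_in_interval` are illustrative only.

```python
import random


def simulate_bi(theta, t_max, seed=0):
    """Simulate arrivals of a birth process with immigration on (0, t_max].

    Assumptions (not from the source): immigration rate `theta` > 0,
    per-capita birth rate 1. Returns a list of (arrival_time, family_id).
    """
    rng = random.Random(seed)
    t = 0.0
    family_sizes = []  # family_sizes[k] is the current size of family k
    arrivals = []
    n = 0  # current total population
    while True:
        rate = theta + n  # total event rate: immigration plus all births
        t += rng.expovariate(rate)
        if t > t_max:
            break
        if rng.random() * rate < theta:
            # Immigration: a new family of size 1 is founded.
            family_sizes.append(1)
            fam = len(family_sizes) - 1
        else:
            # Birth: the parent family is chosen proportionally to its size.
            fam = rng.choices(range(len(family_sizes)), weights=family_sizes)[0]
            family_sizes[fam] += 1
        n += 1
        arrivals.append((t, fam))
    return arrivals


def families_in_interval(arrivals, a, b):
    """Count distinct families with at least one arrival in the open interval (a, b)."""
    return len({fam for (t, fam) in arrivals if a < t < b})


if __name__ == "__main__":
    arrivals = simulate_bi(theta=2.0, t_max=4.0, seed=1)
    print("S(0,2) =", families_in_interval(arrivals, 0.0, 2.0))
    print("S(2,4) =", families_in_interval(arrivals, 2.0, 4.0))
```

Repeating the simulation over many seeds and recording the two counts would give an empirical estimate of the correlation between the numbers of families seen in two disjoint intervals, which is the quantity the abstract says the paper studies analytically.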
