Paper Title

Tropical Density Estimation of Phylogenetic Trees

Authors

Ruriko Yoshida, David Barnhill, Keiji Miura, Daniel Howe

Abstract

Much evidence from biological theory and empirical data indicates that gene trees, that is, phylogenetic trees reconstructed from different genes (loci), need not have exactly the same tree topology. Such incongruence between gene trees might be caused by "unusual" evolutionary events, such as meiotic sexual recombination in eukaryotes or horizontal transfer of genetic material in prokaryotes. However, most gene trees are constrained by the topology of their species tree, that is, the phylogenetic tree of the given species following their evolutionary history. To discover "outlying" gene trees that do not follow the "main distribution(s)" of trees, we apply the "tropical metric," a well-defined metric over the space of phylogenetic trees under the max-plus algebra from tropical geometry, to non-parametric estimation of the distribution of gene trees over the tree space. The kernel density estimator (KDE) is one of the most popular non-parametric estimators of a distribution from a given sample, and we propose an analogue of the classical KDE in the setting of tropical geometry, with the tropical metric, which measures the length of an intrinsic geodesic between trees over the tree space. We estimate the probability of an observed tree by the empirical frequencies of nearby trees, with the level of influence determined by the tropical metric. Then, with simulated data generated from the multispecies coalescent model, we show that non-parametric estimation of the gene tree distribution using the tropical metric performs better, in terms of computational time and accuracy, than the one using the Billera-Holmes-Vogtmann (BHV) metric developed by Weyenberg et al. We then apply it to Apicomplexa data.
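The idea of scoring each tree by the influence of nearby trees under the tropical metric can be illustrated with a small sketch. It is not the authors' estimator. It assumes that each tree is encoded as a vector of pairwise leaf-to-leaf distances, uses the standard tropical metric d(u, v) = max_i(u_i - v_i) - min_i(u_i - v_i), and substitutes an unnormalized Gaussian-shaped kernel with a fixed bandwidth. The function names, the kernel form, and the bandwidth are all hypothetical choices made for illustration.

```python
import numpy as np


def tropical_distance(u, v):
    """Tropical metric between two vectors: max(u - v) - min(u - v)."""
    diff = np.asarray(u, dtype=float) - np.asarray(v, dtype=float)
    return float(diff.max() - diff.min())


def tropical_kde_scores(trees, bandwidth=1.0):
    """Leave-one-out, unnormalized kernel scores under the tropical metric.

    `trees` has shape (n_trees, n_pairs); each row is one tree's vector of
    pairwise leaf distances (an assumed encoding). The Gaussian-shaped
    kernel and fixed bandwidth are illustrative assumptions, not the
    paper's choices. Low scores flag candidate outlying trees.
    """
    trees = np.asarray(trees, dtype=float)
    n = trees.shape[0]
    # Coordinate-wise differences for every pair of trees: (n, n, n_pairs).
    diffs = trees[:, None, :] - trees[None, :, :]
    # Tropical distance matrix between all pairs of trees.
    dist = diffs.max(axis=2) - diffs.min(axis=2)
    kernel = np.exp(-(dist / bandwidth) ** 2)
    # Exclude each tree's contribution to its own score.
    np.fill_diagonal(kernel, 0.0)
    return kernel.sum(axis=1) / (n - 1)


# Toy data: three similar trees on three leaves and one that differs.
toy_trees = [
    [2.0, 2.0, 1.0],
    [2.0, 2.0, 1.1],
    [2.0, 2.0, 0.9],
    [1.0, 2.0, 2.0],
]
toy_scores = tropical_kde_scores(toy_trees, bandwidth=1.0)
outlier_index = int(np.argmin(toy_scores))
```

In this toy input the last tree is far from the other three in the tropical metric, so it receives the smallest score. A practical estimator would also need a principled bandwidth choice and a properly normalized kernel, both of which this sketch leaves out.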
