Paper Title
Efficient Reconstruction of Stochastic Pedigrees: Some Steps From Theory to Practice
Authors
Abstract
In an extant population, how much information do extant individuals provide about the pedigree of their ancestors? Recent work by Kim, Mossel, Ramnarayan and Turner (2020) studied this question under a number of simplifying assumptions, including random mating, fixed-length inheritance blocks and a sufficiently large founding population. They showed that, under these conditions, if the average number of offspring is a sufficiently large constant, then a large fraction of the pedigree structure and genetic content can be recovered by an algorithm they named REC-GEN. We are interested in studying the performance of REC-GEN on simulated data generated according to the model. As a first step, we improve the running time of the algorithm. However, we observe that even the faster version of the algorithm does not perform well in any of our simulations at recovering the pedigree beyond 2 generations. We claim that this is due to the inbreeding present in any setting where the algorithm can be run, even on simulated data. To support the claim, we show that a main step of the algorithm, called ancestral reconstruction, performs accurately in an idealized setting with no inbreeding but performs poorly in random mating populations. To overcome the poor behavior of REC-GEN, we introduce a Belief-Propagation based heuristic that accounts for the inbreeding and performs much better in our simulations.
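The claim that inbreeding appears in any finite random-mating population can be illustrated with a toy simulation. The sketch below is not REC-GEN and not the paper's simulation code. It assumes a simplified model (monogamous couples formed by random shuffling, a fixed number of children per couple, sexes ignored), and the function name `simulate_inbred_fraction` and its parameters are hypothetical. For each generation it reports the fraction of individuals whose two parents share at least one ancestor.

```python
import random


def simulate_inbred_fraction(n_founders, n_generations, children_per_couple=3, seed=0):
    """Per generation, the fraction of individuals whose parents share an ancestor.

    Toy model (an assumption, not the paper's exact setup): individuals are
    paired uniformly at random into monogamous couples, sexes are ignored,
    and every couple has the same number of children. Needs n_founders >= 2.
    """
    rng = random.Random(seed)
    # Represent each individual by the set of founder ids among its ancestors.
    # Two individuals share an ancestor exactly when these sets intersect.
    population = [frozenset([i]) for i in range(n_founders)]
    fractions = []
    for _ in range(n_generations):
        rng.shuffle(population)
        next_population = []
        inbred = 0
        # Pair adjacent individuals after shuffling; an odd one out stays unpaired.
        for parent_a, parent_b in zip(population[0::2], population[1::2]):
            shares_ancestor = not parent_a.isdisjoint(parent_b)
            for _ in range(children_per_couple):
                next_population.append(parent_a | parent_b)
                inbred += shares_ancestor
        fractions.append(inbred / len(next_population))
        population = next_population
    return fractions


if __name__ == "__main__":
    for generation, fraction in enumerate(simulate_inbred_fraction(500, 6), start=1):
        print(f"generation {generation}: inbred fraction = {fraction:.3f}")
```

The first generation is never inbred in this model because founders are unrelated. In later generations the fraction depends on the population size and the number of children per couple, which can be varied to explore how quickly shared ancestry appears.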