Paper title
Mutational paths with sequence-based models of proteins: from sampling to mean-field characterisation
Paper authors
Paper abstract
Identifying and characterizing mutational paths is an important issue in evolutionary biology and in bioengineering. Here we introduce a generic description of mutational paths in terms of the goodness of sequences and of the mutational dynamics (how sequences change) along the path. We first propose an algorithm to sample mutational paths, which we benchmark on exactly solvable in silico models of proteins and then apply to data-driven models of natural proteins learned from sequence data with Restricted Boltzmann Machines. We then use mean-field theory to characterize the properties of mutational paths for different mutational dynamics of interest, and show how it can be used to extend Kimura's estimate of evolutionary distances to sequence-based epistatic models of selection.