Paper title
Graph Coloring via Neural Networks for Haplotype Assembly and Viral Quasispecies Reconstruction
Paper authors
Paper abstract
Understanding genetic variation in organisms, e.g., through mutations, is crucial to unraveling their effects on the environment and human health. A fundamental characterization can be obtained by solving the haplotype assembly problem, which yields the variation across multiple copies of chromosomes. Variations among fast-evolving viruses that lead to different strains (called quasispecies) are also deciphered with similar approaches. In both cases, high-throughput sequencing technologies that provide oversampled mixtures of large noisy fragments (reads) of genomes are used to infer the constituent components (haplotypes or quasispecies). The problem is harder for polyploid species, which have more than two copies of each chromosome. State-of-the-art neural approaches to this NP-hard problem do not adequately model the relations among reads that are important for deconvolving the input signal. We address this problem by developing a new method, called NeurHap, that combines graph representation learning with combinatorial optimization. Our experiments demonstrate substantially better performance of NeurHap on real and synthetic datasets compared to competing approaches.