Paper Title

HelixFold: An Efficient Implementation of AlphaFold2 using PaddlePaddle

Authors

Guoxia Wang, Xiaomin Fang, Zhihua Wu, Yiqun Liu, Yang Xue, Yingfei Xiang, Dianhai Yu, Fan Wang, Yanjun Ma

Abstract

Accurate protein structure prediction can significantly accelerate the development of life science. The accuracy of AlphaFold2, a frontier end-to-end structure prediction system, is already close to that of experimental determination techniques. Because of its complex model architecture and large memory consumption, training AlphaFold2 from scratch and running inference with it require substantial computational resources and time. Running the original AlphaFold2 is too expensive for most individuals and institutions, so reducing this cost could accelerate the development of life science. We implement AlphaFold2 using PaddlePaddle, as HelixFold, to improve training and inference speed and reduce memory consumption. Performance is improved by operator fusion, tensor fusion, and hybrid parallelism computation, while memory is optimized through Recompute, BFloat16, and in-place memory read/write. Compared with the original AlphaFold2 (implemented with JAX) and OpenFold (implemented with PyTorch), HelixFold needs only 7.5 days to complete the full end-to-end training, and only 5.3 days when using hybrid parallelism, while both AlphaFold2 and OpenFold take about 11 days. With hybrid parallelism, HelixFold therefore roughly halves the training time. We verified that HelixFold's accuracy is on par with AlphaFold2 on the CASP14 and CAMEO datasets. HelixFold's code is freely available on GitHub: https://github.com/PaddlePaddle/PaddleHelix/tree/dev/apps/protein_folding/helixfold, and we also provide a stable web service at https://paddlehelix.baidu.com/app/drug/protein/forecast.
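
The training-time comparison in the abstract can be checked with a few lines of arithmetic. The following is a minimal sketch that uses only the day counts quoted in the abstract; the function and constant names are illustrative assumptions and are not part of HelixFold.

```python
# Training times in days, as quoted in the abstract.
BASELINE_DAYS = 11.0         # AlphaFold2 (JAX) and OpenFold (PyTorch): "about 11 days"
HELIXFOLD_DAYS = 7.5         # HelixFold without hybrid parallelism
HELIXFOLD_HYBRID_DAYS = 5.3  # HelixFold with hybrid parallelism


def speedup(baseline_days: float, new_days: float) -> float:
    """Return how many times faster the new run is than the baseline."""
    return baseline_days / new_days


def time_saved_fraction(baseline_days: float, new_days: float) -> float:
    """Return the fraction of the baseline training time that is saved."""
    return 1.0 - new_days / baseline_days


if __name__ == "__main__":
    runs = [
        ("HelixFold", HELIXFOLD_DAYS),
        ("HelixFold + hybrid parallelism", HELIXFOLD_HYBRID_DAYS),
    ]
    for label, days in runs:
        print(
            f"{label}: {speedup(BASELINE_DAYS, days):.2f}x faster, "
            f"{time_saved_fraction(BASELINE_DAYS, days):.0%} less training time"
        )
```

Under these figures, the 7.5-day run is about 1.47x faster than the 11-day baseline (about 32% less time), and the 5.3-day run is about 2.08x faster (about 52% less time). Because the baseline is only stated as "about 11 days", these ratios are approximate.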
