Paper title

LAPIS is a fast web API for massive open virus sequencing databases

Authors

Chaoran Chen, Alexander Taepper, Fabian Engelniederhammer, Jonas Kellerer, Cornelius Roemer, Tanja Stadler

Abstract

Background: Recent epidemic outbreaks such as the SARS-CoV-2 pandemic and the 2022 mpox outbreak have demonstrated the value of genomic sequencing data for tracking the origin and spread of pathogens. Laboratories around the globe generated new sequences at unprecedented speed and volume, and bioinformaticians developed new tools and dashboards to analyze this wealth of data. However, a major remaining challenge is the lack of simple and efficient approaches for accessing and processing sequencing data.

Results: The Lightweight API for Sequences (LAPIS) facilitates rapid retrieval and analysis of genomic sequencing data through a REST API. It supports complex mutation- and metadata-based queries and can perform aggregation operations on massive datasets. LAPIS is optimized for typical questions relevant to genomic epidemiology. Using a newly developed in-memory database engine, it achieves high speed and throughput: between 25 January and 4 February 2023, the SARS-CoV-2 instance of LAPIS, which contains 14.5 million sequences, processed over 20 million requests with a mean response time of 411 ms and a median response time of 1 ms. LAPIS is the core engine behind our dashboards on genspectrum.org, and we currently maintain public LAPIS instances for SARS-CoV-2 and mpox.

Conclusions: Powered by an optimized database engine and available through a web API, LAPIS enhances the accessibility of genomic sequencing data. It is designed to serve as a common backend for dashboards and analyses, with the potential to be integrated into common database platforms such as GenBank.
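To make "mutation- and metadata-based queries" combined with "aggregation operations" concrete, here is a minimal sketch of what such a query computes: keep only sequences that match a metadata filter (here, country) and carry all required nucleotide mutations, then count the survivors grouped by one field. This is not the LAPIS API or its in-memory engine. The `Sequence` record, its field names, the `aggregate` function, and the toy records are hypothetical illustrations. A real LAPIS instance answers this kind of question over HTTP, using the optimized engine described above rather than a linear scan.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Sequence:
    """Hypothetical, simplified sequence record: metadata plus a set of nucleotide mutations."""
    country: str
    date: str  # ISO date string, e.g. "2023-01-25"
    lineage: str
    nuc_mutations: frozenset


def aggregate(sequences, group_by, country=None, required_mutations=()):
    """Count sequences that pass the metadata and mutation filters, grouped by one field.

    Illustrative linear scan only; an optimized engine would use indexes instead.
    """
    required = set(required_mutations)
    counts = Counter()
    for seq in sequences:
        # Metadata filter: skip records from other countries, if a country is given.
        if country is not None and seq.country != country:
            continue
        # Mutation filter: the record must carry every required mutation.
        if not required <= seq.nuc_mutations:
            continue
        counts[getattr(seq, group_by)] += 1
    return dict(counts)


# Toy records (hypothetical data, not taken from any real database).
seqs = [
    Sequence("Switzerland", "2023-01-25", "XBB.1.5", frozenset({"C241T", "A23403G"})),
    Sequence("Switzerland", "2023-01-26", "BQ.1.1", frozenset({"C241T", "A23403G"})),
    Sequence("Switzerland", "2023-01-27", "XBB.1.5", frozenset({"A23403G"})),
    Sequence("Germany", "2023-01-25", "XBB.1.5", frozenset({"C241T", "A23403G"})),
]

print(aggregate(seqs, "lineage", country="Switzerland", required_mutations=["C241T"]))
```

For example, `aggregate(seqs, "country", required_mutations=["A23403G"])` counts all four toy records, three from Switzerland and one from Germany. Scanning every record is simple but scales linearly with the dataset, which becomes costly at the scale of 14.5 million sequences. Serving such aggregations quickly at that scale is the workload the abstract's optimized engine is built for.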