Paper title

Extraction of long k-mers using spaced seeds

Authors

Miika Leinonen, Leena Salmela

Abstract

Extracting k-mers from sequencing reads is an important task in many bioinformatics applications, including all DNA sequence analysis methods based on de Bruijn graphs. These methods tend to be more accurate when the k-mers used are unique in the analyzed DNA, so longer k-mers are preferred. As the read lengths of short-read sequencing technologies increase, the error rate becomes the determining factor for the largest possible value of k. Here we propose LoMeX, which uses spaced seeds to extract long k-mers accurately even in the presence of sequencing errors. Our experiments show that LoMeX can extract long k-mers from current Illumina reads with a higher recall than a standard k-mer counting tool. Furthermore, our experiments on simulated data show that when the read length increases further, the performance of standard k-mer counters declines, whereas LoMeX still extracts long k-mers successfully.
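
To make the two ideas in the abstract concrete, the following is a minimal Python sketch, not the LoMeX algorithm itself. It assumes independent per-base errors (a simplification) to show why long contiguous k-mers are rarely error-free, and it uses a hypothetical seed pattern written as a string of `1` (care) and `0` (don't-care) positions to show how a spaced seed lets two windows match even when they differ at a don't-care position. All function names and the example pattern are illustrative assumptions.

```python
from collections import defaultdict


def error_free_probability(k: int, error_rate: float) -> float:
    """Probability that k consecutive bases are all correct.

    Assumes each base is wrong independently with probability
    error_rate, which is a simplification of real sequencing errors.
    """
    return (1.0 - error_rate) ** k


def apply_spaced_seed(window: str, pattern: str) -> str:
    """Keep only the bases at '1' (care) positions of the pattern."""
    if len(window) != len(pattern):
        raise ValueError("window and pattern must have the same length")
    return "".join(base for base, p in zip(window, pattern) if p == "1")


def group_windows_by_seed(reads, pattern):
    """Group every length-len(pattern) window of the reads by its seed key.

    Windows that agree at all care positions land in the same group,
    even if they differ (for example, due to an error) elsewhere.
    """
    k = len(pattern)
    groups = defaultdict(list)
    for read in reads:
        for start in range(len(read) - k + 1):
            window = read[start:start + k]
            groups[apply_spaced_seed(window, pattern)].append(window)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical pattern: care about the first two and last two bases.
    pattern = "110011"
    reads = ["ACGTAC", "ACTTAC"]  # differ only at a don't-care position
    print(group_windows_by_seed(reads, pattern))
    print(error_free_probability(100, 0.01))
```

Under these assumptions, both example windows map to the same seed key `ACAC`, whereas an exact k-mer counter would treat them as two different 6-mers. With a 1% per-base error rate, a 100-base window is error-free only about 37% of the time, which illustrates why the error rate limits the usable k for exact counting.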
