论文标题

关于GSACA后缀阵列构建算法的优化

On the Optimisation of the GSACA Suffix Array Construction Algorithm

论文作者

Olbrich, Jannik, Ohlebusch, Enno, Büchler, Thomas

论文摘要

后缀阵列可以说是序列分析中最重要的数据结构之一,因此存在多种后缀排序算法。但是,到目前为止,2015年推出的GSACA算法是唯一已知的非恢复线性后缀阵列构建算法(SACA)。尽管具有有趣的理论属性,但在改善该算法的Subpar Real-World性能方面几乎没有努力。有一种超级线性算法DSH,它依赖于相同的排序原则,并且比Divsufsort更快,Divsufsort是十多年来最快的SACA。本文涉及分析GSACA和DSH中使用的分类原理并利用其属性以提供优化的线性时间算法。我们由此产生的算法不仅要比GSACA快得多,而且表现优于Divsufsort和DSH。

The suffix array is arguably one of the most important data structures in sequence analysis and consequently there is a multitude of suffix sorting algorithms. However, to this date the GSACA algorithm introduced in 2015 is the only known non-recursive linear-time suffix array construction algorithm (SACA). Despite its interesting theoretical properties, there has been little effort in improving the algorithm's subpar real-world performance. There is a super-linear algorithm DSH which relies on the same sorting principle and is faster than DivSufSort, the fastest SACA for over a decade. This paper is concerned with analysing the sorting principle used in GSACA and DSH and exploiting its properties in order to give an optimised linear-time algorithm. Our resulting algorithm is not only significantly faster than GSACA but also outperforms DivSufSort and DSH.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源