论文标题

时间空间的权衡寻找长期常见的子字符串

Time-Space Tradeoffs for Finding a Long Common Substring

论文作者

Ben-Nun, Stav, Golan, Shay, Kociumaka, Tomasz, Kraus, Matan

论文摘要

我们考虑了发现的问题,鉴于两个总长度$ n $的文档,这是两个文档的子字符串出现的最长字符串。这个问题被称为最长的常见子字符串(LCS)问题,具有经典的$ O(n)$ - 时间解决方案可以追溯到后缀树的发现(Weiner,1973)及其在整数字母的有效构造(Farach-Colton,1997)。但是,这些解决方案需要$θ(n)$空间,这在许多应用中都很高。为了解决这个问题,Starikovskaya和Vildhøj(CPM 2013)表明,对于$ n^{2/3} \ le s \ le n^{1-o(1)} $,LCS问题可以解决在$ O(S)$ Space和$ O(\ FRAC o(\ frac {n^2}}})$时。 Kociumaka等。 (ESA 2014)将这一权衡推广到$ 1 \ leq s \ leq n $,从而提供了从恒定到线性空间的平稳时空权衡。在本文中,我们可以为所寻求的LCS的长度$ l $较大的实例获得大幅加速。对于$ 1 \ leq S \ leq n $,我们表明可以在$ o(s)$ space和$ \ tilde {o}(\ frac {n^2} {l \ cdot s}+n)$ time中解决LCS问题。结果是基于源自与不匹配问题的LCS的技术(Flouri et al。,2015; Charalampopoulos等人,CPM,2018年),在当地的空间一致的解析上(Birenzwige等,SODA 2020),以及最大重复的结构(SODA)在Input文档中的结构。

We consider the problem of finding, given two documents of total length $n$, a longest string occurring as a substring of both documents. This problem, known as the Longest Common Substring (LCS) problem, has a classic $O(n)$-time solution dating back to the discovery of suffix trees (Weiner, 1973) and their efficient construction for integer alphabets (Farach-Colton, 1997). However, these solutions require $Θ(n)$ space, which is prohibitive in many applications. To address this issue, Starikovskaya and Vildhøj (CPM 2013) showed that for $n^{2/3} \le s \le n^{1-o(1)}$, the LCS problem can be solved in $O(s)$ space and $O(\frac{n^2}{s})$ time. Kociumaka et al. (ESA 2014) generalized this tradeoff to $1 \leq s \leq n$, thus providing a smooth time-space tradeoff from constant to linear space. In this paper, we obtain a significant speed-up for instances where the length $L$ of the sought LCS is large. For $1 \leq s \leq n$, we show that the LCS problem can be solved in $O(s)$ space and $\tilde{O}(\frac{n^2}{L\cdot s}+n)$ time. The result is based on techniques originating from the LCS with Mismatches problem (Flouri et al., 2015; Charalampopoulos et al., CPM 2018), on space-efficient locally consistent parsing (Birenzwige et al., SODA 2020), and on the structure of maximal repetitions (runs) in the input documents.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源