时间空间的权衡寻找长期常见的子字符串

论文标题

时间空间的权衡寻找长期常见的子字符串

Time-Space Tradeoffs for Finding a Long Common Substring

论文作者

Ben-Nun, Stav, Golan, Shay, Kociumaka, Tomasz, Kraus, Matan

论文摘要

我们考虑了发现的问题，鉴于两个总长度$ n $的文档，这是两个文档的子字符串出现的最长字符串。这个问题被称为最长的常见子字符串（LCS）问题，具有经典的$ O（n）$ - 时间解决方案可以追溯到后缀树的发现（Weiner，1973）及其在整数字母的有效构造（Farach-Colton，1997）。但是，这些解决方案需要$θ（n）$空间，这在许多应用中都很高。为了解决这个问题，Starikovskaya和Vildhøj（CPM 2013）表明，对于$ n^{2/3} \ le s \ le n^{1-o（1）} $，LCS问题可以解决在$ O（S）$ Space和$ O（\ FRAC o（\ frac {n^2}}}）$时。 Kociumaka等。（ESA 2014）将这一权衡推广到$ 1 \ leq s \ leq n $，从而提供了从恒定到线性空间的平稳时空权衡。在本文中，我们可以为所寻求的LCS的长度$ l $较大的实例获得大幅加速。对于$ 1 \ leq S \ leq n $，我们表明可以在$ o（s）$ space和$ \ tilde {o}（\ frac {n^2} {l \ cdot s}+n）$ time中解决LCS问题。结果是基于源自与不匹配问题的LCS的技术（Flouri et al。，2015; Charalampopoulos等人，CPM，2018年），在当地的空间一致的解析上（Birenzwige等，SODA 2020），以及最大重复的结构（SODA）在Input文档中的结构。

We consider the problem of finding, given two documents of total length $n$, a longest string occurring as a substring of both documents. This problem, known as the Longest Common Substring (LCS) problem, has a classic $O(n)$-time solution dating back to the discovery of suffix trees (Weiner, 1973) and their efficient construction for integer alphabets (Farach-Colton, 1997). However, these solutions require $Θ(n)$ space, which is prohibitive in many applications. To address this issue, Starikovskaya and Vildhøj (CPM 2013) showed that for $n^{2/3} \le s \le n^{1-o(1)}$, the LCS problem can be solved in $O(s)$ space and $O(\frac{n^2}{s})$ time. Kociumaka et al. (ESA 2014) generalized this tradeoff to $1 \leq s \leq n$, thus providing a smooth time-space tradeoff from constant to linear space. In this paper, we obtain a significant speed-up for instances where the length $L$ of the sought LCS is large. For $1 \leq s \leq n$, we show that the LCS problem can be solved in $O(s)$ space and $\tilde{O}(\frac{n^2}{L\cdot s}+n)$ time. The result is based on techniques originating from the LCS with Mismatches problem (Flouri et al., 2015; Charalampopoulos et al., CPM 2018), on space-efficient locally consistent parsing (Birenzwige et al., SODA 2020), and on the structure of maximal repetitions (runs) in the input documents.

下载PDF全文

下载文献需遵守相关版权规定

论文标题