论文标题
PALFM索引:palindrome模式匹配的FM索引
PalFM-index: FM-index for Palindrome Pattern Matching
论文作者
论文摘要
回文图案匹配(PAL匹配)是一种广义的模式匹配,其中两个字符串$ x $和$ y $相同长度的长度匹配(pal-Match),如果它们具有相同的palindromic结构,即任何可能的$ 1 \ le i <j \ j \ le | x | x | x | x | = | y | $,$ x [i..j] $是一个回文时,并且仅当$ y [i..j] $是palindrome时。匹配问题是在文本中搜索与模式匹配的子字符串的出现的问题。 Given a text $T$ of length $n$ over an alphabet of size $σ$, an index for pal-matching is to support, given a pattern $P$ of length $m$, the counting queries that compute the number $\mathsf{occ}$ of occurrences of $P$ and the locating queries that compute the occurrences of $P$. 〜[i等人,理论中的作者。计算。 Sci。,2013年]提出了一个$ O(n \ lg n)$ - 位数据结构,以支持$ O(m \lgσ)$时间的计数查询和$ o(m \lgσ+ \ mathsf {occ})中的定位查询。在本文中,我们为PAL匹配问题提出了一个FM索引类型索引,我们称之为PALFM索引,该索引占用了$ 2N \ lg \ min(σ,\ lg n) + 2n + 2n + o(n)$(n)$位,并支持$ O(o(m)$时间的计数查询。 palfm-indexes可以通过添加$ \ frac {n}δ\ lg n + n + n + n + n + n + o(n)$位的空间来支持$ O(m +δ\ Mathsf {occ})$时间中的定位查询,其中$δ$是从$ \ \ \ \ {1,2,n vers pred pred prep的$δ$,其中$δ$是一个参数。
The palindrome pattern matching (pal-matching) is a kind of generalized pattern matching, in which two strings $x$ and $y$ of same length are considered to match (pal-match) if they have the same palindromic structures, i.e., for any possible $1 \le i < j \le |x| = |y|$, $x[i..j]$ is a palindrome if and only if $y[i..j]$ is a palindrome. The pal-matching problem is the problem of searching for, in a text, the occurrences of the substrings that pal-match with a pattern. Given a text $T$ of length $n$ over an alphabet of size $σ$, an index for pal-matching is to support, given a pattern $P$ of length $m$, the counting queries that compute the number $\mathsf{occ}$ of occurrences of $P$ and the locating queries that compute the occurrences of $P$. The authors in~[I et al., Theor. Comput. Sci., 2013] proposed an $O(n \lg n)$-bit data structure to support the counting queries in $O(m \lg σ)$ time and the locating queries in $O(m \lg σ+ \mathsf{occ})$ time. In this paper, we propose an FM-index type index for the pal-matching problem, which we call the PalFM-index, that occupies $2n \lg \min(σ, \lg n) + 2n + o(n)$ bits of space and supports the counting queries in $O(m)$ time. The PalFM-indexes can support the locating queries in $O(m + Δ\mathsf{occ})$ time by adding $\frac{n}Δ \lg n + n + o(n)$ bits of space, where $Δ$ is a parameter chosen from $\{1, 2, \dots, n\}$ in the preprocessing phase.