Paper Title

L-systems for Measuring Repetitiveness

Authors

Gonzalo Navarro, Cristian Urbina

Abstract

An L-system (for lossless compression) is a CPD0L-system extended with two parameters $d$ and $n$, which determines unambiguously a string $w = τ(φ^d(s))[1:n]$, where $φ$ is the morphism of the system, $s$ is its axiom, and $τ$ is its coding. The length of the shortest description of an L-system generating $w$ is known as $\ell$, and is arguably a relevant measure of repetitiveness that builds on the self-similarities that arise in the sequence. In this paper we deepen the study of the measure $\ell$ and its relation with $δ$, a better established lower bound that builds on substring complexity. Our results show that $\ell$ and $δ$ are largely orthogonal, in the sense that one can be much larger than the other depending on the case. This suggests that both sources of repetitiveness are mostly unrelated. We also show that the recently introduced NU-systems, which combine the capabilities of L-systems with bidirectional macro-schemes, can be asymptotically strictly smaller than both mechanisms, which makes the size $ν$ of the smallest NU-system the unique smallest reachable repetitiveness measure to date.
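
The definition $w = τ(φ^d(s))[1:n]$ can be made concrete with a small sketch. The function name `generate_prefix`, the dictionary representation of $φ$ and $τ$, and the Fibonacci-style morphism used in the example are all assumptions made for illustration; none of them comes from the paper. The sketch also assumes the morphism is non-erasing (every symbol maps to a non-empty string), which is what allows the intermediate word to be truncated to $n$ symbols at each step.

```python
def generate_prefix(morphism, coding, axiom, d, n):
    """Return tau(phi^d(axiom))[1:n] for a non-erasing morphism phi.

    morphism: dict mapping each symbol to its image string (phi)
    coding:   dict mapping each symbol to a single output symbol (tau)
    axiom:    starting string (s)
    d:        number of times phi is applied
    n:        length of the prefix to keep
    """
    # Assumption: phi is non-erasing, so the first n symbols of phi(w)
    # depend only on the first n symbols of w.
    assert all(len(image) >= 1 for image in morphism.values())

    word = axiom[:n]
    for _ in range(d):
        # Apply phi symbol by symbol, then keep only the first n symbols.
        word = "".join(morphism[c] for c in word)[:n]
    # Apply the coding tau (a letter-to-letter map) to the result.
    return "".join(coding[c] for c in word)


# Hypothetical example: phi(a) = ab, phi(b) = a; tau(a) = 0, tau(b) = 1.
phi = {"a": "ab", "b": "a"}
tau = {"a": "0", "b": "1"}
print(generate_prefix(phi, tau, "a", d=5, n=10))  # 0100101001
```

In this toy example the description consists of two short rules, a two-entry coding, a one-symbol axiom, and the integers $d$ and $n$, while the generated string can grow with each additional iteration. That gap between description size and output length is the kind of self-similarity the measure $\ell$ is meant to capture.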
