Paper Title
Capacity-Approaching Constrained Codes with Error Correction for DNA-Based Data Storage
Paper Authors
Paper Abstract
We propose coding techniques that limit the length of homopolymer runs, enforce the GC-content constraint, and correct a single edit error in strands of nucleotides in DNA-based data storage systems. In particular, for given $\ell, \epsilon > 0$, we propose simple and efficient encoders/decoders that transform binary sequences into DNA base sequences (codewords), namely sequences of the symbols A, T, C and G, that satisfy the following properties: (i) Runlength constraint: the maximum homopolymer run in each codeword is at most $\ell$, (ii) GC-content constraint: the GC-content of each codeword is within $[0.5-\epsilon, 0.5+\epsilon]$, (iii) Error-correction: each codeword can be recovered after a single deletion, a single insertion, or a single substitution error. For practical values of $\ell$ and $\epsilon$, we show that our encoders achieve much higher rates than existing results in the literature and approach the capacity. Our methods have low encoding/decoding complexity and limited error propagation.
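To make properties (i) and (ii) concrete, the following is a minimal sketch of a checker that tests whether a given DNA strand satisfies the runlength and GC-content constraints. It is not the encoder or decoder proposed in the paper, and it does not address property (iii). The function names `max_homopolymer_run`, `gc_content`, and `satisfies_constraints`, as well as the parameter names `ell` and `eps`, are assumptions chosen for illustration.

```python
from itertools import groupby


def max_homopolymer_run(strand: str) -> int:
    """Return the length of the longest run of identical consecutive symbols."""
    # groupby splits the strand into maximal runs of equal symbols.
    return max((len(list(run)) for _, run in groupby(strand)), default=0)


def gc_content(strand: str) -> float:
    """Return the fraction of symbols that are G or C (0.0 for an empty strand)."""
    if not strand:
        return 0.0
    return sum(base in "GC" for base in strand) / len(strand)


def satisfies_constraints(strand: str, ell: int, eps: float) -> bool:
    """Check property (i), runs of length at most ell, and property (ii),
    GC-content within [0.5 - eps, 0.5 + eps]. Hypothetical helper, not from the paper."""
    runlength_ok = max_homopolymer_run(strand) <= ell
    gc_ok = abs(gc_content(strand) - 0.5) <= eps
    return runlength_ok and gc_ok
```

For instance, under these definitions `satisfies_constraints("ACGTACGT", 1, 0.0)` holds because every run has length 1 and exactly half of the symbols are G or C, whereas a strand such as `"AAAACGT"` fails the runlength check for `ell = 3`.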