Capacity-Approaching Constrained Codes with Error Correction for DNA-Based Data Storage

Paper Title

Capacity-Approaching Constrained Codes with Error Correction for DNA-Based Data Storage

Authors

Nguyen, Tuan Thanh, Cai, Kui, Immink, Kees A. Schouhamer, Kiah, Han Mao

Abstract

We propose coding techniques that limit the length of homopolymer runs, ensure the GC-content constraint, and are capable of correcting a single edit error in strands of nucleotides in DNA-based data storage systems. In particular, for given $\ell, ε > 0$, we propose simple and efficient encoders/decoders that transform binary sequences into DNA base sequences (codewords), namely sequences of the symbols A, T, C and G, that satisfy the following properties: (i) Runlength constraint: the maximum homopolymer run in each codeword is at most $\ell$, (ii) GC-content constraint: the GC-content of each codeword is within $[0.5-ε, 0.5+ε]$, (iii) Error-correction: each codeword is capable of correcting a single deletion, or single insertion, or single substitution error. For practical values of $\ell$ and $ε$, we show that our encoders achieve much higher rates than existing results in the literature and approach the capacity. Our methods have low encoding/decoding complexity and limited error propagation.
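To make constraints (i) and (ii) concrete, the following is a minimal sketch of a checker that tests whether a given strand satisfies the runlength and GC-content constraints. It is not the paper's encoder or decoder and does not address the error-correction property (iii). The function names `max_homopolymer_run`, `gc_content`, and `satisfies_constraints` are hypothetical, chosen here for illustration only.

```python
from itertools import groupby


def max_homopolymer_run(strand: str) -> int:
    """Return the length of the longest run of identical consecutive bases."""
    return max((len(list(group)) for _, group in groupby(strand)), default=0)


def gc_content(strand: str) -> float:
    """Return the fraction of bases in the strand that are G or C."""
    if not strand:
        raise ValueError("strand must be non-empty")
    return sum(base in "GC" for base in strand) / len(strand)


def satisfies_constraints(strand: str, ell: int, eps: float) -> bool:
    """Check constraints (i) and (ii) from the abstract (illustrative only).

    (i)  the longest homopolymer run is at most `ell`;
    (ii) the GC-content lies within [0.5 - eps, 0.5 + eps].
    """
    return (
        max_homopolymer_run(strand) <= ell
        and abs(gc_content(strand) - 0.5) <= eps
    )
```

For example, with $\ell = 3$ and $ε = 0.1$, the strand `ACGTACGT` passes both checks, while `AAAACGCG` violates the runlength constraint (a run of four A's) and `ATATATAT` violates the GC-content constraint (no G or C at all).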
