Paper Title

NFL: Robust Learned Index via Distribution Transformation

Authors

Shangyu Wu, Yufei Cui, Jinghuan Yu, Xuan Sun, Tei-Wei Kuo, Chun Jason Xue

Abstract

Recent works on learned indexes open a new direction for the indexing field. The key insight of a learned index is to approximate the mapping between keys and positions with piece-wise linear functions. Such methods require partitioning the key space for a better approximation. Although many heuristics have been proposed to improve the approximation quality, the bottleneck is that segmentation overheads can hinder overall performance. This paper tackles the approximation problem by applying a distribution transformation to the keys before constructing the learned index. A two-stage Normalizing-Flow-based Learned index framework (NFL) is proposed, which first transforms the original complex key distribution into a near-uniform distribution and then builds a learned index leveraging the transformed keys. For effective distribution transformation, we propose a Numerical Normalizing Flow (Numerical NF). Based on the characteristics of the transformed keys, we propose a robust After-Flow Learned Index (AFLI). To validate the performance, comprehensive evaluations are conducted on both synthetic and real-world workloads, which show that the proposed NFL produces the highest throughput and the lowest tail latency compared to state-of-the-art learned indexes.
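To make the two-stage idea concrete, below is a minimal illustrative sketch (not the paper's implementation): a monotone empirical-CDF interpolation stands in for the paper's learned Numerical NF transform, and a single error-bounded linear model stands in for AFLI. All names (TwoStageIndexSketch, num_anchors, etc.) are hypothetical and chosen only for this example.

```python
import bisect
import numpy as np

class TwoStageIndexSketch:
    """Conceptual sketch of the NFL-style pipeline:
    stage 1: a monotone transform maps raw keys to a near-uniform range;
    stage 2: a simple learned model predicts positions over transformed keys,
    corrected by a bounded local search.
    NOTE: this substitutes an empirical-CDF interpolation for the paper's
    Numerical Normalizing Flow and one linear model for AFLI."""

    def __init__(self, keys, num_anchors=256):
        self.keys = np.sort(np.asarray(keys, dtype=np.float64))
        n = len(self.keys)
        # Stage 1: sample anchor keys; interpolating their empirical CDF gives
        # a monotone map from the raw key space onto roughly uniform [0, 1].
        idx = np.linspace(0, n - 1, num_anchors).astype(int)
        self.anchor_keys = self.keys[idx]
        self.anchor_cdf = idx / (n - 1)
        # Stage 2: fit one linear model position ~ a * u + b on transformed keys.
        u = self._transform(self.keys)
        pos = np.arange(n, dtype=np.float64)
        self.a, self.b = np.polyfit(u, pos, deg=1)
        # Record the worst-case prediction error to bound the local search.
        self.err = int(np.ceil(np.max(np.abs(self.a * u + self.b - pos))))

    def _transform(self, k):
        # Piecewise-linear interpolation of the empirical CDF (monotone).
        return np.interp(k, self.anchor_keys, self.anchor_cdf)

    def lookup(self, key):
        u = self._transform(np.float64(key))
        guess = int(round(self.a * u + self.b))
        lo = max(0, guess - self.err)
        hi = min(len(self.keys), guess + self.err + 1)
        # Bounded search inside the error window around the prediction.
        i = lo + bisect.bisect_left(self.keys[lo:hi].tolist(), key)
        return i if i < len(self.keys) and self.keys[i] == key else None
```

The point of the sketch is the division of labor: once the transform makes the key distribution near-uniform, even a single linear model approximates the key-to-position mapping with a small, bounded error, which is what the paper exploits with its more sophisticated Numerical NF and AFLI components.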
