Paper Title

Efficient Representation of Large-Alphabet Probability Distributions

Paper Authors

Aviv Adler, Jennifer Tang, Yury Polyanskiy

Paper Abstract

A number of engineering and scientific problems require representing and manipulating probability distributions over large alphabets, which we may think of as long vectors of reals summing to $1$. In some cases it is required to represent such a vector with only $b$ bits per entry. A natural choice is to partition the interval $[0,1]$ into $2^b$ uniform bins and quantize entries to each bin independently. We show that a minor modification of this procedure -- applying an entrywise non-linear function (compander) $f(x)$ prior to quantization -- yields an extremely effective quantization method. For example, for $b=8 (16)$ and $10^5$-sized alphabets, the quality of representation improves from a loss (under KL divergence) of $0.5 (0.1)$ bits/entry to $10^{-4} (10^{-9})$ bits/entry. Compared to floating point representations, our compander method improves the loss from $10^{-1}(10^{-6})$ to $10^{-4}(10^{-9})$ bits/entry. These numbers hold for both real-world data (word frequencies in books and DNA $k$-mer counts) and for synthetic randomly generated distributions. Theoretically, we set up a minimax optimality criterion and show that the compander $f(x) ~\propto~ \mathrm{ArcSinh}(\sqrt{(1/2) (K \log K) x})$ achieves near-optimal performance, attaining a KL-quantization loss of $\asymp 2^{-2b} \log^2 K$ for a $K$-letter alphabet and $b\to \infty$. Interestingly, a similar minimax criterion for the quadratic loss on the hypercube shows optimality of the standard uniform quantizer. This suggests that the $\mathrm{ArcSinh}$ quantizer is as fundamental for KL-distortion as the uniform quantizer for quadratic distortion.
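To make the scheme described in the abstract concrete, here is a minimal Python sketch (not the authors' code) of compander-based quantization: apply the normalized ArcSinh compander entrywise, round each value to one of $2^b$ uniform bins, map the bin midpoint back through the inverse compander, and renormalize. The normalization $f(1)=1$, the midpoint reconstruction rule, and all function names are illustrative assumptions.

```python
import numpy as np

def arcsinh_compander(x, K):
    """Entrywise compander f(x) proportional to ArcSinh(sqrt((1/2) K log K * x)),
    scaled here so that f maps [0, 1] onto [0, 1] (an assumed normalization)."""
    c = 0.5 * K * np.log(K)
    return np.arcsinh(np.sqrt(c * x)) / np.arcsinh(np.sqrt(c))

def arcsinh_expander(y, K):
    """Inverse of the compander above: maps companded values back to [0, 1]."""
    c = 0.5 * K * np.log(K)
    return np.sinh(y * np.arcsinh(np.sqrt(c))) ** 2 / c

def compand_quantize(p, b):
    """Compand, quantize each entry to one of 2^b uniform bins
    (midpoint reconstruction), expand, and renormalize to a distribution."""
    K = len(p)
    y = arcsinh_compander(p, K)
    idx = np.clip(np.floor(y * 2 ** b), 0, 2 ** b - 1)
    q = arcsinh_expander((idx + 0.5) / 2 ** b, K)
    return q / q.sum()

def kl_bits(p, q):
    """KL(p || q) in bits, with the 0 * log 0 = 0 convention."""
    m = p > 0
    return float(np.sum(p[m] * np.log2(p[m] / q[m])))

# Example: a synthetic distribution over a 10^5-letter alphabet at b = 8 bits.
rng = np.random.default_rng(0)
K, b = 10 ** 5, 8
p = rng.dirichlet(np.ones(K))
loss = kl_bits(p, compand_quantize(p, b))
print(f"KL quantization loss: {loss:.3e} bits ({loss / K:.3e} bits/entry)")
```

Midpoint reconstruction is the simplest decoder; a centroid (conditional-mean) decoder would typically lower the KL loss further. The exact figures reported in the abstract depend on the authors' own quantizer, so this sketch is for intuition only.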
