Sequential algorithms for testing identity and closeness of distributions

Title

Sequential algorithms for testing identity and closeness of distributions

Authors

Fawzi, Omar, Flammarion, Nicolas, Garivier, Aurélien, Oufkir, Aadil

Abstract

What advantage do \emph{sequential} procedures provide over batch algorithms for testing properties of unknown distributions? Focusing on the problem of testing whether two distributions $\mathcal{D}_1$ and $\mathcal{D}_2$ on $\{1,\dots, n\}$ are equal or $ε$-far, we give several answers to this question. We show that for a small alphabet size $n$, there is a sequential algorithm that outperforms any batch algorithm by a factor of at least $4$ in terms of sample complexity. For a general alphabet size $n$, we give a sequential algorithm that uses no more samples than its batch counterpart, and possibly fewer if the actual distance $TV(\mathcal{D}_1, \mathcal{D}_2)$ between $\mathcal{D}_1$ and $\mathcal{D}_2$ is larger than $ε$. As a corollary, letting $ε$ go to $0$, we obtain a sequential algorithm for testing closeness when no a priori bound on $TV(\mathcal{D}_1, \mathcal{D}_2)$ is given; its sample complexity is $\tilde{\mathcal{O}}(\frac{n^{2/3}}{TV(\mathcal{D}_1, \mathcal{D}_2)^{4/3}})$. This improves over the $\tilde{\mathcal{O}}(\frac{n/\log n}{TV(\mathcal{D}_1, \mathcal{D}_2)^{2}})$ tester of \cite{daskalakis2017optimal} and is optimal up to multiplicative constants. We also establish limitations of sequential algorithms for the problem of testing identity and closeness: they can improve the worst-case number of samples by at most a constant factor.
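To make the size of the improvement in the corollary concrete, one can divide the earlier bound by the new one. This is a rough, back-of-the-envelope comparison that is not taken from the paper: it ignores the constants and polylogarithmic factors hidden by $\tilde{\mathcal{O}}$, and $\Delta = TV(\mathcal{D}_1, \mathcal{D}_2)$ is shorthand introduced here only for readability.

```latex
\frac{(n/\log n)\,/\,\Delta^{2}}{n^{2/3}\,/\,\Delta^{4/3}}
  = \frac{n}{\log n}\cdot\frac{1}{n^{2/3}}\cdot\frac{\Delta^{4/3}}{\Delta^{2}}
  = \frac{n^{1/3}}{\log n}\cdot\Delta^{-2/3}
```

Since $\Delta \le 1$, the factor $\Delta^{-2/3}$ is at least $1$, so under this rough comparison the gain grows at least like $n^{1/3}/\log n$ and becomes larger as the two distributions get closer to each other.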
