Paper title
Sequential algorithms for testing identity and closeness of distributions
Authors
Abstract
What advantage do \emph{sequential} procedures provide over batch algorithms for testing properties of unknown distributions? Focusing on the problem of testing whether two distributions $\mathcal{D}_1$ and $\mathcal{D}_2$ on $\{1,\dots, n\}$ are equal or $ε$-far, we give several answers to this question. We show that for a small alphabet size $n$, there is a sequential algorithm that outperforms any batch algorithm by a factor of at least $4$ in terms of sample complexity. For a general alphabet size $n$, we give a sequential algorithm that uses no more samples than its batch counterpart, and possibly fewer if the actual distance $TV(\mathcal{D}_1, \mathcal{D}_2)$ between $\mathcal{D}_1$ and $\mathcal{D}_2$ is larger than $ε$. As a corollary, letting $ε$ go to $0$, we obtain a sequential algorithm for testing closeness, when no a priori bound on $TV(\mathcal{D}_1, \mathcal{D}_2)$ is given, with sample complexity $\tilde{\mathcal{O}}(\frac{n^{2/3}}{TV(\mathcal{D}_1, \mathcal{D}_2)^{4/3}})$: this improves over the $\tilde{\mathcal{O}}(\frac{n/\log n}{TV(\mathcal{D}_1, \mathcal{D}_2)^{2} })$ tester of \cite{daskalakis2017optimal} and is optimal up to multiplicative constants. We also establish limitations of sequential algorithms for the problem of testing identity and closeness: they can improve the worst-case number of samples by at most a constant factor.