论文标题
基于K-NN模式估计的查询复杂性
Query Complexity of k-NN based Mode Estimation
论文作者
论文摘要
由未知多元概率密度函数的模式估计问题的动机,我们研究了n个点的给定数据集的最小k-最近邻居距离识别该点的问题。我们研究了成对距离是未知的情况,但是我们可以访问甲骨文,我们可以查询以获取有关任何一对点之间距离的嘈杂信息。对于两个天然的Oracle模型,我们根据置信区间的概念设计了一个顺序学习算法,该算法可以自适应地决定要发送到Oracle的查询,并能够以很高的可能性正确解决问题。我们在我们提出的方案的查询复杂性上得出了实例依赖性上限,并且通过广泛的数值评估对其他基准的性能显示出显着改善。
Motivated by the mode estimation problem of an unknown multivariate probability density function, we study the problem of identifying the point with the minimum k-th nearest neighbor distance for a given dataset of n points. We study the case where the pairwise distances are apriori unknown, but we have access to an oracle which we can query to get noisy information about the distance between any pair of points. For two natural oracle models, we design a sequential learning algorithm, based on the idea of confidence intervals, which adaptively decides which queries to send to the oracle and is able to correctly solve the problem with high probability. We derive instance-dependent upper bounds on the query complexity of our proposed scheme and also demonstrate significant improvement over the performance of other baselines via extensive numerical evaluations.