Paper Title

Memory-Efficient RkNN Retrieval by Nonlinear k-Distance Approximation

Authors

Sandra Obermeier, Max Berrendorf, Peer Kröger

Abstract

The reverse k-nearest neighbor (RkNN) query is an established query type with various applications, ranging from identifying highly influential objects and incrementally updating kNN graphs to optimizing sensor communication and outlier detection. State-of-the-art solutions exploit the observation that k-distances in real-world datasets often follow a power-law distribution, and bound them with straight lines in log-log space. In this work, we investigate this assumption and uncover that it is violated in regions of changing density, which we show are typical for real-life datasets. Towards a generic solution, we pose the estimation of k-distances as a regression problem. Thereby, we can harness the abundance of available machine learning models and profit from their advancement. We propose a flexible approach that allows steering the performance-memory trade-off, and in particular finding good solutions under a fixed memory budget, which is crucial in the context of edge computing. Moreover, we show how to obtain and improve the guaranteed bounds essential for exact query processing. In experiments on real-world datasets, we demonstrate how this framework can significantly reduce index memory consumption and strongly reduce candidate set sizes. We publish our code at https://github.com/sobermeier/nonlinear-kdist.
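The core idea can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the two-cluster toy dataset, the rank-based 1-D ordering, and the degree-5 polynomial (standing in for an arbitrary ML regressor) are all assumptions made for illustration. It compares a straight-line fit of k-distances in log-log space (the power-law assumption) against a nonlinear fit, and shifts the prediction by the worst residual to obtain a guaranteed upper bound:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy 2-D dataset with two regions of different density, mimicking the
# density changes the paper identifies as problematic for linear bounds.
dense = rng.normal(0.0, 0.1, size=(200, 2))
sparse = rng.normal(5.0, 1.0, size=(100, 2))
X = np.vstack([dense, sparse])

k = 5
# Brute-force k-distances: distance from each point to its k-th nearest
# neighbor (index k after sorting, since index 0 is the point itself).
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
kdist = np.sort(D, axis=1)[:, k]

# Regression targets: log k-distance over a 1-D ordering of the points
# (here simply the k-distance rank, a stand-in for the ordering a real
# index structure would use).
y = np.log(np.sort(kdist))
x = np.log(np.arange(1, len(y) + 1, dtype=float))

# Linear fit in log-log space (the power-law assumption) ...
lin = np.polyfit(x, y, deg=1)
res_lin = y - np.polyval(lin, x)

# ... versus a higher-capacity nonlinear fit (degree-5 polynomial as a
# simple placeholder for any ML regression model).
poly = np.polyfit(x, y, deg=5)
res_poly = y - np.polyval(poly, x)

sse_linear = float(np.sum(res_lin ** 2))
sse_nonlinear = float(np.sum(res_poly ** 2))

# Guaranteed upper bound: shift the nonlinear prediction up by its worst
# residual so the bound dominates every true (log) k-distance.
upper_bound = np.polyval(poly, x) + res_poly.max()
```

On such data the nonlinear model attains a lower squared error than the straight line, and the shifted prediction is a valid upper bound by construction; a tighter bound directly translates into smaller candidate sets during query processing.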
