Paper Title

mmLSH: A Practical and Efficient Technique for Processing Approximate Nearest Neighbor Queries on Multimedia Data

Authors

Omid Jafari, Parth Nagarkar, Jonathan Montaño

Abstract

Many large multimedia applications require efficient processing of nearest neighbor queries. Often, multimedia data are represented as a collection of important high-dimensional feature vectors. Existing Locality Sensitive Hashing (LSH) techniques require users to find top-k similar feature vectors for each of the feature vectors that represent the query object. This leads to wasted and redundant work due to two main reasons: 1) not all feature vectors may contribute equally in finding the top-k similar multimedia objects, and 2) feature vectors are treated independently during query processing. Additionally, there is no theoretical guarantee on the returned multimedia results. In this work, we propose a practical and efficient indexing approach for finding top-k approximate nearest neighbors for multimedia data using LSH called mmLSH, which can provide theoretical guarantees on the returned multimedia results. Additionally, we present a buffer-conscious strategy to speed up the query processing. Experimental evaluation shows significant gains in performance time and accuracy for different real multimedia datasets when compared against state-of-the-art LSH techniques.
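
As a point of reference for the baseline that the abstract criticizes, the following is a minimal Python sketch of a generic p-stable (Gaussian projection) LSH index. Each feature vector of a query object is probed independently, and candidate objects are ranked by how often they collide with the query. It is not mmLSH and is not taken from the paper: the class name `LSHIndex`, the parameters `num_tables`, `hashes_per_table`, and `width`, and the vote-based ranking are all assumptions chosen for illustration.

```python
import random
from collections import Counter, defaultdict


def make_hash(dim, width, rng):
    """Build one p-stable LSH function h(v) = floor((a . v + b) / width)."""
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # random Gaussian direction
    b = rng.uniform(0.0, width)                    # random offset in [0, width]

    def h(v):
        dot = sum(x * y for x, y in zip(a, v))
        return int((dot + b) // width)

    return h


class LSHIndex:
    """Hypothetical illustrative index; not the data structure from the paper."""

    def __init__(self, dim, num_tables=8, hashes_per_table=4, width=4.0, seed=0):
        rng = random.Random(seed)
        # Each table pairs a list of hash functions with its bucket map.
        self.tables = [
            ([make_hash(dim, width, rng) for _ in range(hashes_per_table)],
             defaultdict(list))
            for _ in range(num_tables)
        ]

    @staticmethod
    def _key(hashes, v):
        # Concatenate the individual hash values into one bucket key.
        return tuple(h(v) for h in hashes)

    def add(self, object_id, vector):
        """Index one feature vector under the object it belongs to."""
        for hashes, buckets in self.tables:
            buckets[self._key(hashes, vector)].append(object_id)

    def query_object(self, feature_vectors, k):
        """Rank objects by collision votes, probing each vector independently."""
        votes = Counter()
        for v in feature_vectors:
            for hashes, buckets in self.tables:
                for object_id in buckets.get(self._key(hashes, v), ()):
                    votes[object_id] += 1
        return [object_id for object_id, _ in votes.most_common(k)]


# Usage: two toy "multimedia objects", each described by two 2-D feature vectors.
index = LSHIndex(dim=2)
for vec in ([1.0, 0.0], [0.0, 1.0]):
    index.add("a", vec)
for vec in ([100.0, 100.0], [120.0, 90.0]):
    index.add("b", vec)
print(index.query_object([[1.0, 0.0], [0.0, 1.0]], k=2))
```

In this sketch every query feature vector costs a full round of table lookups regardless of how much it contributes to the final ranking, which is the kind of redundant work the abstract describes.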
