Hamming Distributions of Popular Perceptual Hashing Techniques

Paper Title

Hamming Distributions of Popular Perceptual Hashing Techniques

Authors

McKeown, Sean; Buchanan, William J.

Abstract

Content-based file matching has been widely deployed for decades, largely for the detection of sources of copyright infringement, extremist materials, and abusive sexual media. Perceptual hashes, such as Microsoft's PhotoDNA, are one automated mechanism for facilitating detection, allowing machines to approximately match visual features of an image or video in a robust manner. However, there does not appear to be much public evaluation of such approaches, particularly when it comes to how effective they are against content-preserving modifications to media files. In this paper, we present a million-image scale evaluation of several perceptual hashing archetypes for popular algorithms (including Facebook's PDQ, Apple's NeuralHash, and the popular pHash library) against seven image variants. The focal point is the distribution of Hamming distance scores between both unrelated images and image variants to better understand the problems faced by each approach.
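
For readers unfamiliar with the metric named in the abstract, the following is a minimal sketch of how a Hamming distance between two perceptual hashes can be computed. It is not taken from the paper: the function name `hamming_distance` and the 64-bit hex digests are hypothetical values chosen for illustration, and real hashes from PDQ, NeuralHash, or pHash have their own lengths and encodings.

```python
def hamming_distance(hash_a: str, hash_b: str) -> int:
    """Count the differing bits between two equal-length hex digests."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    # XOR leaves a 1 bit wherever the two hashes disagree.
    return bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")


# Hypothetical 64-bit hashes, written as 16 hex characters.
original = "f0e1d2c3b4a59687"
variant = "f0e1d2c3b4a59686"    # same as original except the lowest bit
unrelated = "0f1e2d3c4b5a6978"  # bitwise complement of original

bits = len(original) * 4
d_variant = hamming_distance(original, variant)
d_unrelated = hamming_distance(original, unrelated)

# Normalising by hash length makes scores comparable across hash sizes.
print(d_variant, d_variant / bits)
print(d_unrelated, d_unrelated / bits)
```

A small distance suggests the two images are variants of each other, while a large one suggests they are unrelated. Where to place the threshold between the two depends on how these distances are distributed, which is the question the abstract describes.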
