Paper title
Compact Deep Aggregation for Set Retrieval
Paper authors
Paper abstract
The objective of this work is to learn a compact embedding of a set of descriptors that is suitable for efficient retrieval and ranking, whilst maintaining discriminability of the individual descriptors. We focus on a specific example of this general problem: retrieving images containing multiple faces from a large-scale dataset of images. Here the set consists of the face descriptors in each image and, given a query for multiple identities, the goal is to retrieve, in order, images which contain all the identities, all but one, etc. To this end, we make the following contributions: first, we propose a CNN architecture, SetNet, to achieve the objective: it learns face descriptors and their aggregation over a set to produce a compact fixed-length descriptor designed for set retrieval, and the score of an image is a count of the number of identities that match the query; second, we show that this compact descriptor has minimal loss of discriminability up to two faces per image, and degrades slowly after that, far exceeding a number of baselines; third, we explore the speed vs. retrieval quality trade-off for set retrieval using this compact descriptor; and, finally, we collect and annotate a large dataset of images containing varying numbers of celebrities, which we use for evaluation and release publicly.
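The scoring rule described in the abstract (an image's score is the number of query identities it contains, and images are returned in decreasing order of that count) can be illustrated with a small sketch. This is not the SetNet architecture: it assumes idealised one-hot face descriptors that are exactly orthogonal across identities, and it assumes sum pooling as the aggregation step. Under those assumptions the dot product of two aggregated set descriptors equals the number of shared identities exactly. The function names and the toy image collection are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def one_hot(identity: int, dim: int) -> List[float]:
    """Idealised face descriptor: orthonormal across identities (assumption)."""
    vec = [0.0] * dim
    vec[identity] = 1.0
    return vec


def aggregate(identities: Set[int], dim: int) -> List[float]:
    """Sum-pool per-face descriptors into one fixed-length set descriptor."""
    total = [0.0] * dim
    for identity in identities:
        for i, value in enumerate(one_hot(identity, dim)):
            total[i] += value
    return total


def score(query_vec: List[float], image_vec: List[float]) -> float:
    """Dot product; with orthonormal descriptors this counts shared identities."""
    return sum(q * x for q, x in zip(query_vec, image_vec))


def rank_images(
    query: Set[int], images: Dict[str, Set[int]], dim: int
) -> List[Tuple[str, float]]:
    """Rank images by score, highest first; ties are broken by image name."""
    query_vec = aggregate(query, dim)
    scores = {
        name: score(query_vec, aggregate(ids, dim)) for name, ids in images.items()
    }
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical toy collection: image name -> identities present in the image.
images = {
    "img_a": {0, 1, 2},
    "img_b": {0, 3},
    "img_c": {4},
    "img_d": {1, 0},
}

# Query for identities 0 and 1: images containing both come first,
# then images containing one of them, then images containing neither.
ranking = rank_images({0, 1}, images, dim=5)
print(ranking)
```

With learned descriptors, different identities are only approximately orthogonal, so the dot product would approximate the count rather than equal it; this is one plausible reading of why discriminability degrades as the number of faces per image grows.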