Paper Title
Describing Sets of Images with Textual-PCA
Paper Authors
Paper Abstract
We seek to semantically describe a set of images, capturing both the attributes of single images and the variations within the set. Our procedure is analogous to Principal Component Analysis, in which the role of projection vectors is replaced with generated phrases. First, a centroid phrase is generated that has the largest average semantic similarity to the images in the set, where both the similarity computation and the generation are based on pretrained vision-language models. Then, using the same models, the phrase that induces the highest variance among the similarity scores is generated. The next phrase maximizes the variance subject to being orthogonal, in the latent space, to the highest-variance phrase, and the process continues. Our experiments show that our method convincingly captures the essence of image sets and describes the individual elements in a semantically meaningful way within the context of the entire set. Our code is available at: https://github.com/OdedH/textual-pca.
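The selection loop the abstract describes can be sketched as follows. This is a minimal, retrieval-style simplification: it assumes image and candidate-phrase embeddings have already been computed and unit-normalized by some pretrained vision-language model (e.g., a CLIP-like encoder), and it *selects* from a fixed candidate pool rather than *generating* phrases as the paper does. The function name and data layout are illustrative, not the authors' API.

```python
import numpy as np

def textual_pca(image_embs, phrase_embs, k=2):
    """Pick a centroid phrase plus k variance-maximizing, mutually
    orthogonalized phrases from a candidate pool.

    image_embs:  (n_images, d)  unit-normalized image embeddings
    phrase_embs: (n_phrases, d) unit-normalized candidate phrase embeddings
    Returns (centroid_index, [principal phrase indices]).
    """
    # Cosine similarity of every candidate phrase to every image.
    sims = phrase_embs @ image_embs.T            # (n_phrases, n_images)

    # Centroid phrase: largest average similarity across the set.
    centroid = int(np.argmax(sims.mean(axis=1)))

    chosen = []
    residual = phrase_embs.copy()
    for _ in range(k):
        # Variance of similarity scores for each (residual) phrase.
        res_sims = residual @ image_embs.T
        variances = res_sims.var(axis=1)
        variances[[centroid] + chosen] = -np.inf  # no repeats
        idx = int(np.argmax(variances))
        chosen.append(idx)

        # Enforce orthogonality in latent space: project the chosen
        # direction out of all remaining candidates.
        v = residual[idx]
        v = v / np.linalg.norm(v)
        residual = residual - np.outer(residual @ v, v)
    return centroid, chosen
```

In the paper the candidates are produced by a generative language model guided by these same objectives; the projection step above mirrors the orthogonality constraint that makes the procedure PCA-like.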