Paper Title
ZSCRGAN: A GAN-based Expectation Maximization Model for Zero-Shot Retrieval of Images from Textual Descriptions
Paper Authors
Paper Abstract
Most existing algorithms for cross-modal Information Retrieval are based on a supervised train-test setup, where a model learns to align the modality of the query (e.g., text) with the modality of the documents (e.g., images) from a given training set. Such a setup assumes that the training set contains an exhaustive representation of all possible classes of queries. In reality, a retrieval model may need to be deployed on previously unseen classes, which implies a zero-shot IR setup. In this paper, we propose a novel GAN-based model for zero-shot text-to-image retrieval. Given a textual description as the query, our model can retrieve relevant images in a zero-shot setting. The proposed model is trained using an Expectation-Maximization framework. Experiments on multiple benchmark datasets show that our proposed model comfortably outperforms several state-of-the-art zero-shot text-to-image retrieval models, as well as zero-shot classification and hashing models suitably adapted for retrieval.
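The retrieval mechanism the abstract describes (a GAN generator that maps a textual description into the image-feature space, so that relevant images of unseen classes can be ranked against it) can be sketched as follows. This is a minimal, hypothetical PyTorch illustration, not the paper's actual implementation: the `Generator` architecture, the embedding dimensions, the noise input, and the cosine-similarity ranking are all assumptions standing in for ZSCRGAN's actual components, and the EM training procedure described in the paper is omitted entirely.

```python
# Hypothetical sketch of zero-shot text-to-image retrieval with a
# conditional generator, assuming precomputed text and image embeddings.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Generator(nn.Module):
    """Maps a text embedding (plus noise) into the image-feature space.
    Dimensions are illustrative assumptions, not the paper's values."""
    def __init__(self, text_dim=300, noise_dim=100, img_dim=2048):
        super().__init__()
        self.noise_dim = noise_dim
        self.net = nn.Sequential(
            nn.Linear(text_dim + noise_dim, 1024),
            nn.LeakyReLU(0.2),
            nn.Linear(1024, img_dim),
        )

    def forward(self, text_emb, noise):
        return self.net(torch.cat([text_emb, noise], dim=-1))

def retrieve(generator, query_text_emb, image_feats, top_k=5):
    """Rank gallery images by cosine similarity to the image-space
    embedding generated from the text query; return top-k indices."""
    noise = torch.randn(query_text_emb.size(0), generator.noise_dim)
    with torch.no_grad():
        query_img_emb = generator(query_text_emb, noise)
    sims = F.cosine_similarity(
        query_img_emb.unsqueeze(1), image_feats.unsqueeze(0), dim=-1
    )
    return sims.topk(top_k, dim=-1).indices

# Usage: one query against a gallery of 1000 (unseen-class) image features.
gen = Generator()
query = torch.randn(1, 300)        # e.g., an encoded textual description
gallery = torch.randn(1000, 2048)  # e.g., CNN features of candidate images
print(retrieve(gen, query, gallery))
```

Because the gallery images belong to classes never seen during training, the ranking step relies entirely on the generator producing a plausible image-space embedding for the query; this is what makes the setup zero-shot rather than standard supervised retrieval.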