Paper Title
ZSCRGAN: A GAN-based Expectation Maximization Model for Zero-Shot Retrieval of Images from Textual Descriptions
Paper Authors
Paper Abstract
Most existing algorithms for cross-modal Information Retrieval are based on a supervised train-test setup, where a model learns to align the modality of the query (e.g., text) with the modality of the documents (e.g., images) from a given training set. Such a setup assumes that the training set contains an exhaustive representation of all possible classes of queries. In reality, a retrieval model may need to be deployed on previously unseen classes, which implies a zero-shot IR setup. In this paper, we propose a novel GAN-based model for zero-shot text-to-image retrieval. Given a textual description as the query, our model can retrieve relevant images in a zero-shot setting. The proposed model is trained using an Expectation-Maximization framework. Experiments on multiple benchmark datasets show that our proposed model comfortably outperforms several state-of-the-art zero-shot text-to-image retrieval models, as well as zero-shot classification and hashing models suitably adapted for retrieval.
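The retrieval mechanism the abstract describes (a GAN generator that maps a textual description into the image-feature space, so that relevant images of unseen classes can be ranked against it) can be sketched as follows. This is a minimal, hypothetical PyTorch illustration, not the paper's actual implementation: the `Generator` architecture, the embedding dimensions, the noise input, and the cosine-similarity ranking are all assumptions standing in for ZSCRGAN's actual components, and the EM training procedure described in the paper is omitted entirely.

```python
# Hypothetical sketch of zero-shot text-to-image retrieval with a
# conditional generator, assuming precomputed text and image embeddings.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Generator(nn.Module):
    """Maps a text embedding (plus noise) into the image-feature space.
    Dimensions are illustrative assumptions, not the paper's values."""
    def __init__(self, text_dim=300, noise_dim=100, img_dim=2048):
        super().__init__()
        self.noise_dim = noise_dim
        self.net = nn.Sequential(
            nn.Linear(text_dim + noise_dim, 1024),
            nn.LeakyReLU(0.2),
            nn.Linear(1024, img_dim),
        )

    def forward(self, text_emb, noise):
        return self.net(torch.cat([text_emb, noise], dim=-1))

def retrieve(generator, query_text_emb, image_feats, top_k=5):
    """Rank gallery images by cosine similarity to the image-space
    embedding generated from the text query; return top-k indices."""
    noise = torch.randn(query_text_emb.size(0), generator.noise_dim)
    with torch.no_grad():
        query_img_emb = generator(query_text_emb, noise)
    sims = F.cosine_similarity(
        query_img_emb.unsqueeze(1), image_feats.unsqueeze(0), dim=-1
    )
    return sims.topk(top_k, dim=-1).indices

# Usage: one query against a gallery of 1000 (unseen-class) image features.
gen = Generator()
query = torch.randn(1, 300)        # e.g., an encoded textual description
gallery = torch.randn(1000, 2048)  # e.g., CNN features of candidate images
print(retrieve(gen, query, gallery))
```

Because the gallery images belong to classes never seen during training, the ranking step relies entirely on the generator producing a plausible image-space embedding for the query; this is what makes the setup zero-shot rather than standard supervised retrieval.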