Paper Title
CommerceMM: Large-Scale Commerce MultiModal Representation Learning with Omni Retrieval
Paper Authors
Paper Abstract
We introduce CommerceMM - a multimodal model capable of providing a diverse and granular understanding of commerce topics associated with a given piece of content (image, text, or image+text), with the capability to generalize to a wide range of tasks, including Multimodal Categorization, Image-Text Retrieval, Query-to-Product Retrieval, Image-to-Product Retrieval, etc. We follow the pre-training + fine-tuning regime and present 5 effective pre-training tasks on image-text pairs. To embrace more common and diverse commerce data with text-to-multimodal, image-to-multimodal, and multimodal-to-multimodal mappings, we propose another 9 novel cross-modal and cross-pair retrieval tasks, called Omni-Retrieval pre-training. The pre-training is conducted in an efficient manner, with only two forward/backward updates for the combined 14 tasks. Extensive experiments and analysis show the effectiveness of each task. When combining all pre-training tasks, our model achieves state-of-the-art performance on 7 commerce-related downstream tasks after fine-tuning. Additionally, we propose a novel approach of modality randomization to dynamically adjust our model under different efficiency constraints.
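The abstract describes combining many cross-modal retrieval objectives into a single efficient update. As a rough illustration only, here is a minimal NumPy sketch of a symmetric InfoNCE contrastive loss summed over a few hypothetical modality pairs (text, image, fused multimodal). The pair selection, loss form, and all names are assumptions for exposition, not the authors' actual implementation of the 14 tasks.

```python
import numpy as np

def info_nce(a, b, temperature=0.07):
    """Symmetric InfoNCE loss between two batches of embeddings.

    Matching rows of `a` and `b` (same index) are positives; all
    other rows in the batch serve as in-batch negatives.
    """
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / temperature            # (N, N) similarity matrix
    idx = np.arange(len(a))

    def xent(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[idx, idx].mean()

    # cross-entropy in both retrieval directions (a->b and b->a)
    return 0.5 * (xent(logits) + xent(logits.T))

# Toy embeddings standing in for three "modalities".
rng = np.random.default_rng(0)
N, D = 8, 16
text, image, mm = (rng.normal(size=(N, D)) for _ in range(3))

# Omni-style objective: sum contrastive losses over several
# cross-modal / cross-pair directions (a subset, for illustration).
pairs = [(text, image), (text, mm), (image, mm)]
total_loss = sum(info_nce(a, b) for a, b in pairs)
print(float(total_loss))
```

In practice, grouping such objectives so that each encoder runs a forward pass once and all task losses share it is what allows many tasks to be trained with only a couple of forward/backward updates, as the abstract claims.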