Paper Title
Cross-Modal Alignment Learning of Vision-Language Conceptual Systems
Paper Authors
Paper Abstract
Human infants learn the names of objects and develop their own conceptual systems without explicit supervision. In this study, we propose methods for learning aligned vision-language conceptual systems inspired by infants' word-learning mechanisms. The proposed model learns associations between visual objects and words online and gradually constructs cross-modal relational graph networks. We also propose an aligned cross-modal representation learning method that learns semantic representations of visual objects and words in a self-supervised manner, based on the cross-modal relational graph networks. This allows entities from different modalities that share the same conceptual meaning to have similar semantic representation vectors. We evaluate our method quantitatively and qualitatively on object-to-word mapping and zero-shot learning tasks, showing that the proposed model significantly outperforms the baselines and that the two conceptual systems are topologically aligned.
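To make the idea of online object-word association learning concrete, the following is a minimal Python sketch, not the authors' actual model: it builds a simple co-occurrence-weighted cross-modal graph from (objects, utterance) pairs. The class `CrossModalGraph` and its methods are hypothetical names introduced here purely for illustration; the paper's model additionally learns aligned representation vectors on top of such a graph, which this sketch omits.

```python
import itertools
from collections import defaultdict


class CrossModalGraph:
    """Toy cross-modal graph linking visual-object nodes and word nodes.

    Edge weights count how often an object and a word co-occur in the
    same scene/utterance pair -- a crude stand-in for the online
    association learning described in the abstract.
    """

    def __init__(self):
        # weights[(object_node, word_node)] -> co-occurrence count
        self.weights = defaultdict(int)

    def observe(self, objects, words):
        """Update the graph with one scene (its objects) and its utterance."""
        for obj, word in itertools.product(objects, words):
            self.weights[(f"obj:{obj}", f"word:{word}")] += 1

    def map_object_to_word(self, obj):
        """Return the word node most strongly associated with an object."""
        key = f"obj:{obj}"
        candidates = {b: w for (a, b), w in self.weights.items() if a == key}
        return max(candidates, key=candidates.get) if candidates else None


graph = CrossModalGraph()
graph.observe(objects=["dog", "ball"], words=["the", "dog", "chases", "ball"])
graph.observe(objects=["dog"], words=["a", "dog", "barks"])
print(graph.map_object_to_word("dog"))  # -> "word:dog" (co-occurs twice)
```

Under this toy scheme, repeated exposure strengthens the correct object-word edge relative to incidental pairings, which is the intuition behind cross-situational word learning that the paper builds on.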