VQA and Visual Reasoning: An Overview of Recent Datasets, Methods and Challenges

Paper Title

VQA and Visual Reasoning: An Overview of Recent Datasets, Methods and Challenges

Authors

Zakari, Rufai Yusuf; Owusu, Jim Wilson; Wang, Hailin; Qin, Ke; Lawal, Zaharaddeen Karami; Dong, Yuezhou

Abstract

Artificial Intelligence (AI) and its applications have sparked extraordinary interest in recent years. This success can be ascribed in part to advances in AI subfields, including Machine Learning (ML), Computer Vision (CV), and Natural Language Processing (NLP). Deep learning, a subfield of machine learning that employs artificial neural networks, has enabled the most rapid growth in these domains. As a result, the integration of vision and language has attracted considerable attention, and tasks in this area have been designed so that they properly exemplify the concepts of deep learning. In this review paper, we provide a thorough and extensive review of state-of-the-art approaches and key model design principles, and discuss existing datasets, methods, problem formulations, and evaluation measures for VQA and visual reasoning tasks, with the aim of understanding vision-and-language representation learning. We also present some potential future directions in this field of research, in the hope that our study may generate new ideas and novel approaches to handle existing difficulties and develop new applications.
