Generalization Differences between End-to-End and Neuro-Symbolic Vision-Language Reasoning Systems

Paper Title

Generalization Differences between End-to-End and Neuro-Symbolic Vision-Language Reasoning Systems

Authors

Wang Zhu, Jesse Thomason, Robin Jia

Abstract

For vision-and-language reasoning tasks, both fully connectionist, end-to-end methods and hybrid, neuro-symbolic methods have achieved high in-distribution performance. In which out-of-distribution settings does each paradigm excel? We investigate this question on both single-image and multi-image visual question-answering through four types of generalization tests: a novel segment-combine test for multi-image queries, contrast sets, compositional generalization, and cross-benchmark transfer. Vision-and-language end-to-end trained systems exhibit sizeable performance drops across all these tests. Neuro-symbolic methods suffer even more on cross-benchmark transfer from GQA to VQA, but they show smaller accuracy drops on the other generalization tests, and their performance quickly improves with few-shot training. Overall, our results demonstrate the complementary benefits of these two paradigms and emphasize the importance of using a diverse suite of generalization tests to fully characterize model robustness to distribution shift.
