Learning Contextual Causality from Time-consecutive Images

Paper Title

Learning Contextual Causality from Time-consecutive Images

Authors

Hongming Zhang, Yintong Huo, Xinran Zhao, Yangqiu Song, Dan Roth

Abstract

Causality knowledge is crucial for many artificial intelligence systems. Conventional text-based causality knowledge acquisition methods typically require laborious and expensive human annotation. As a result, their scale is often limited. Moreover, because no context is provided during annotation, the resulting causality knowledge records (e.g., ConceptNet) typically do not take context into consideration. To explore a more scalable way of acquiring causality knowledge, in this paper we step outside the textual domain and investigate the possibility of learning contextual causality from the visual signal. Compared with purely text-based approaches, learning causality from the visual signal has the following advantages: (1) causality knowledge is a form of commonsense knowledge, which is rarely expressed in text but abundant in videos; (2) most events in a video are naturally time-ordered, which provides a rich resource from which to mine causality knowledge; (3) all the objects in a video can be used as context to study the contextual property of causal relations. Specifically, we first propose a high-quality dataset, Vis-Causal, and then conduct experiments to demonstrate that, with good language and visual representation models as well as enough training signals, it is possible to automatically discover meaningful causal knowledge from videos. Further analysis also shows that the contextual property of causal relations indeed exists, that taking it into consideration might be crucial if we want to use causality knowledge in real applications, and that the visual signal could serve as a good resource for learning such contextual causality.
