Paper Title
Automatic Concept Extraction for Concept Bottleneck-based Video Classification
Authors
Abstract
Recent efforts in interpretable deep learning models have shown that concept-based explanation methods achieve competitive accuracy with standard end-to-end models and enable reasoning and intervention about extracted high-level visual concepts from images, e.g., identifying the wing color and beak length for bird-species classification. However, these concept bottleneck models rely on a necessary and sufficient set of predefined concepts, which is intractable for complex tasks such as video classification. For complex tasks, the labels and the relationship between visual elements span many frames, e.g., identifying a bird flying or catching prey, necessitating concepts with various levels of abstraction. To this end, we present CoDEx, an automatic Concept Discovery and Extraction module that rigorously composes a necessary and sufficient set of concept abstractions for concept-based video classification. CoDEx identifies a rich set of complex concept abstractions from natural language explanations of videos, obviating the need to predefine the amorphous set of concepts. To demonstrate our method's viability, we construct two new public datasets that combine existing complex video classification datasets with short, crowd-sourced natural language explanations for their labels. Our method elicits inherent complex concept abstractions in natural language to generalize concept-bottleneck methods to complex tasks.