Paper title
Learn to explain yourself, when you can: Equipping Concept Bottleneck Models with the ability to abstain on their concept predictions
Paper authors
Paper abstract
The Concept Bottleneck Models (CBMs) of Koh et al. [2020] provide a means to ensure that a neural network-based classifier bases its predictions solely on human-understandable concepts. The concept labels, or rationales as we refer to them, are learned by the concept labeling component of the CBM. Another component learns to predict the target classification label from these predicted concept labels. Unfortunately, these models are heavily reliant on human-provided concept labels for each datapoint. To enable CBMs to behave robustly when these labels are not readily available, we show how to equip them with the ability to abstain from predicting concepts when the concept labeling component is uncertain. In other words, our model learns to provide rationales for its predictions, but only when it is sure the rationale is correct.
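The abstract describes a two-stage architecture: a concept labeling component that may abstain when uncertain, and a label predictor that works from the (possibly abstained-on) concepts. The sketch below is a minimal illustration, not the paper's implementation: it assumes abstention is decided by a per-concept confidence threshold, and all class names, layer sizes, and the threshold value are hypothetical.

```python
# Minimal sketch of a Concept Bottleneck Model whose concept predictor can
# abstain on low-confidence concepts (illustrative only, not the paper's method).
import torch
import torch.nn as nn

class AbstainingCBM(nn.Module):
    def __init__(self, in_dim, n_concepts, n_classes, threshold=0.9):
        super().__init__()
        # Concept labeling component: predicts a probability per concept.
        self.concept_net = nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, n_concepts)
        )
        # Label predictor sees each concept prediction plus an abstention flag.
        self.label_net = nn.Linear(2 * n_concepts, n_classes)
        self.threshold = threshold  # assumed confidence threshold for abstaining

    def forward(self, x):
        probs = torch.sigmoid(self.concept_net(x))            # concept probabilities
        confident = (probs - 0.5).abs() * 2 >= self.threshold  # per-concept confidence test
        abstain = (~confident).float()                          # 1.0 where the model abstains
        # Use the hard concept label where confident, a neutral 0.5 where abstaining.
        concepts = torch.where(confident, (probs > 0.5).float(),
                               torch.full_like(probs, 0.5))
        logits = self.label_net(torch.cat([concepts, abstain], dim=-1))
        return logits, concepts, abstain

# Usage example with random inputs.
model = AbstainingCBM(in_dim=32, n_concepts=10, n_classes=5)
x = torch.randn(4, 32)
logits, concepts, abstain = model(x)
print(logits.shape, abstain.mean().item())  # class logits and the abstention rate
```

Exposing the abstention flags to the label predictor lets the downstream component learn to cope with missing rationales, which is one plausible way to realize the behavior the abstract describes.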